sparse matrix solvers: Topics by Science.gov

Sample records for sparse matrix solvers

User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.

NASA Technical Reports Server (NTRS)

Reddy, C. J.

2000-01-01

PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and uses real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is then converted back to complex numbers. Though this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to acquaint the user with the installation and operation of the code. Driver routines are given to help users integrate PCSMS routines into their own codes.
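
The complex-to-real conversion described in this abstract can be illustrated with the standard equivalent real formulation, in which A x = b with A = Ar + i Ai becomes the block system [[Ar, -Ai], [Ai, Ar]] [xr; xi] = [br; bi]. The sketch below is not taken from PCSMS: the function names are hypothetical, and a small dense Gaussian elimination stands in for the real sparse direct solver that the manual refers to.

```python
def to_real_system(A, b):
    """Build the real 2n x 2n system equivalent to the complex n x n system A x = b."""
    n = len(A)
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    rhs = [0.0] * (2 * n)
    for i in range(n):
        for j in range(n):
            a = complex(A[i][j])
            M[i][j] = a.real           # top-left block: Re(A)
            M[i][j + n] = -a.imag      # top-right block: -Im(A)
            M[i + n][j] = a.imag       # bottom-left block: Im(A)
            M[i + n][j + n] = a.real   # bottom-right block: Re(A)
        rhs[i] = complex(b[i]).real
        rhs[i + n] = complex(b[i]).imag
    return M, rhs


def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting (dense stand-in for a real sparse solver)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= f * aug[k][col]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def solve_complex(A, b):
    """Solve a complex system by converting to real form and back again."""
    M, rhs = to_real_system(A, b)
    y = solve_dense(M, rhs)
    n = len(A)
    return [complex(y[i], y[i + n]) for i in range(n)]


# Small illustrative system with a known solution.
A = [[2 + 1j, 0], [1j, 3 - 1j]]
x_true = [1 + 1j, 2 - 1j]
b = [sum(A[i][j] * x_true[j] for j in range(2)) for i in range(2)]
x = solve_complex(A, b)
```

The real system has twice the dimension of the complex one, which is the price paid for reusing a real-only solver.
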
Using a multifrontal sparse solver in a high performance, finite element code

NASA Technical Reports Server (NTRS)

King, Scott D.; Lucas, Robert; Raefsky, Arthur

1990-01-01

We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
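
To illustrate why a minimum-degree reordering reduces factorization work, the sketch below counts the fill-in produced by symbolic elimination on the graph of a small "arrow" matrix under two orderings. It uses a plain greedy minimum-degree rule, not the Multiple-Minimum Degree variant named in the abstract, and all function names are hypothetical.

```python
def fill_in(adj, order):
    """Count fill edges created by eliminating graph vertices in the given order."""
    g = {v: set(nb) for v, nb in adj.items()}
    fill = 0
    for v in order:
        nbrs = list(g.pop(v))
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v makes its remaining neighbours pairwise adjacent.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


def minimum_degree_order(adj):
    """Greedy ordering: always eliminate a vertex of smallest current degree."""
    g = {v: set(nb) for v, nb in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    g[a].add(b)
        order.append(v)
    return order


# Arrow matrix pattern: vertex 0 is coupled to every other vertex.
arrow = {0: set(range(1, 6))}
arrow.update({i: {0} for i in range(1, 6)})

natural_fill = fill_in(arrow, list(range(6)))                # hub eliminated first
reordered_fill = fill_in(arrow, minimum_degree_order(arrow))  # hub eliminated late
```

Eliminating the hub first couples all remaining vertices, while the greedy ordering defers the hub and creates no fill for this pattern.
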
The Use of Sparse Direct Solver in Vector Finite Element Modeling for Calculating Two Dimensional (2-D) Magnetotelluric Responses in Transverse Electric (TE) Mode

NASA Astrophysics Data System (ADS)

Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.

2018-04-01

In earlier research, the linear systems arising in vector finite element modeling of two-dimensional (2-D) magnetotelluric (MT) responses in TE mode were solved with a non-sparse direct solver. However, that approach has weaknesses that still need to be addressed, in particular the accuracy at low frequencies (10^-3 Hz to 10^-5 Hz), which has not yet been achieved, and the high computational cost on dense meshes. In this work, a sparse direct solver is used instead of the non-sparse direct solver to overcome these weaknesses. A sparse direct solver is advantageous for the linear systems of the vector finite element method because the matrices are symmetric and sparse. The sparse direct solver has been validated against analytical solutions for a homogeneous half-space model and a vertical contact model. The validation results show that the sparse direct solver is more stable than the non-sparse direct solver for the linear problems of the vector finite element method, especially at low frequencies. As a result, accurate 2-D MT response modeling at low frequencies (10^-3 Hz to 10^-5 Hz) has been achieved with efficient array memory allocation and less computation time.
Solution of matrix equations using sparse techniques

NASA Technical Reports Server (NTRS)

Baddourah, Majdi

1994-01-01

The solution of large systems of matrix equations is key to the solution of a large number of scientific and engineering problems. This talk describes the sparse matrix solver developed at Langley which can routinely solve in excess of 263,000 equations in 40 seconds on one Cray C-90 processor. It appears that for large scale structural analysis applications, sparse matrix methods have a significant performance advantage over other methods.
Comparing direct and iterative equation solvers in a large structural analysis software system

NASA Technical Reports Server (NTRS)

Poole, E. L.

1991-01-01

Two direct Choleski equation solvers and two iterative preconditioned conjugate gradient (PCG) equation solvers used in a large structural analysis software system are described. The two direct solvers are implementations of the Choleski method for variable-band matrix storage and sparse matrix storage. The two iterative PCG solvers include the Jacobi conjugate gradient method and an incomplete Choleski conjugate gradient method. The performance of the direct and iterative solvers is compared by solving several representative structural analysis problems. Some key factors affecting the performance of the iterative solvers relative to the direct solvers are identified.
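
As a minimal illustration of the simpler of the two iterative methods named above, the sketch below applies a Jacobi (diagonal) preconditioned conjugate gradient iteration to a small symmetric positive definite matrix stored in compressed sparse row (CSR) form. The function names and the 1-D Laplacian test matrix are assumptions made for the example and are not drawn from the report.

```python
from math import sqrt


def csr_matvec(data, indices, indptr, x):
    """y = A x for a matrix stored in CSR form."""
    y = []
    for i in range(len(indptr) - 1):
        y.append(sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1])))
    return y


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients preconditioned with the inverse of the matrix diagonal."""
    n = len(b)
    diag = [1.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    x = [0.0] * n
    r = b[:]                                   # residual for the zero initial guess
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        Ap = csr_matvec(data, indices, indptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d_csr(n):
    """CSR arrays for the tridiagonal matrix with stencil (-1, 2, -1)."""
    data, indices, indptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


data, indices, indptr = laplacian_1d_csr(5)
b = csr_matvec(data, indices, indptr, [1.0] * 5)   # right-hand side for x = all ones
x, iterations = jacobi_pcg(data, indices, indptr, b)
```

An incomplete Choleski preconditioner would replace the division by `diag` with two sparse triangular solves, which costs more per iteration but typically needs fewer iterations.
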
Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics

NASA Technical Reports Server (NTRS)

Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.

2001-01-01

An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward/backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring for domain decomposition formulation to achieve parallel computation, where different substructures are handled by different parallel processors.
Finite difference method accelerated with sparse solvers for structural analysis of the metal-organic complexes

NASA Astrophysics Data System (ADS)

Guda, A. A.; Guda, S. A.; Soldatov, M. A.; Lomachenko, K. A.; Bugaev, A. L.; Lamberti, C.; Gawelda, W.; Bressler, C.; Smolentsev, G.; Soldatov, A. V.; Joly, Y.

2016-05-01

The finite difference method (FDM) implemented in the FDMNES software [Phys. Rev. B, 2001, 63, 125120] was revised. A thorough analysis shows that the calculated diagonal in the FDM matrix consists of about 96% zero elements. Thus a sparse solver is more suitable for the problem than traditional Gaussian elimination for the diagonal neighbourhood. We tried several iterative sparse solvers, and the direct MUMPS solver with METIS ordering turned out to be the best. Compared to the Gaussian solver, the present method is up to 40 times faster and allows XANES simulations for complex systems on personal computers. We show the applicability of the software to the metal-organic [Fe(bpy)3]2+ complex in both the low spin and high spin states populated after laser excitation.
Exploring Deep Learning and Sparse Matrix Format Selection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Y.; Liao, C.; Shen, X.

We proposed to explore the use of Deep Neural Networks (DNN) for addressing the longstanding barriers. The recent rapid progress of DNN technology has had a large impact in many fields and has significantly improved prediction accuracy over traditional machine learning techniques in image classification, speech recognition, machine translation, and so on. To some degree, these tasks resemble the decision making in many HPC tasks, including the aforementioned format selection for SpMV and linear solver selection. For instance, sparse matrix format selection is akin to image classification, such as telling whether an image contains a dog or a cat; in both problems, the right decisions are primarily determined by the spatial patterns of the elements in an input. For image classification, the patterns are of pixels, and for sparse matrix format selection, they are of non-zero elements. DNN could be naturally applied if we regard a sparse matrix as an image and the format selection or solver selection as classification problems.
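
One way to read "regard a sparse matrix as an image" is to downsample the non-zero pattern onto a fixed-size grid whose cells hold the fraction of non-zeros that fall in them. The sketch below shows only that preprocessing idea; the function name, the coordinate-list input, and the normalization are assumptions for illustration, not the representation used by the authors.

```python
def sparsity_image(rows, cols, shape, size):
    """Map the non-zero coordinates of a matrix onto a size x size density grid."""
    n_rows, n_cols = shape
    counts = [[0] * size for _ in range(size)]
    for i, j in zip(rows, cols):
        # Integer scaling assigns each matrix entry to one grid cell.
        counts[i * size // n_rows][j * size // n_cols] += 1
    total = max(1, len(rows))
    return [[c / total for c in row] for row in counts]


# A 4 x 4 identity pattern reduced to a 2 x 2 "image".
image = sparsity_image(rows=[0, 1, 2, 3], cols=[0, 1, 2, 3], shape=(4, 4), size=2)
```

Because the grid size is fixed, matrices of any dimension yield inputs of the same shape, which is what a classification network would need.
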
Amesos2 and Belos: Direct and Iterative Solvers for Large Sparse Linear Systems

DOE PAGES

Bavier, Eric; Hoemmen, Mark; Rajamanickam, Sivasankaran; ...

2012-01-01

Solvers for large sparse linear systems come in two categories: direct and iterative. Amesos2, a package in the Trilinos software project, provides direct methods, and Belos, another Trilinos package, provides iterative methods. Amesos2 offers a common interface to many different sparse matrix factorization codes, and can handle any implementation of sparse matrices and vectors, via an easy-to-extend C++ traits interface. It can also factor matrices whose entries have arbitrary “Scalar” type, enabling extended-precision and mixed-precision algorithms. Belos includes many different iterative methods for solving large sparse linear systems and least-squares problems. Unlike competing iterative solver libraries, Belos completely decouples the algorithms from the implementations of the underlying linear algebra objects. This lets Belos exploit the latest hardware without changes to the code. Belos favors algorithms that solve higher-level problems, such as multiple simultaneous linear systems and sequences of related linear systems, faster than standard algorithms. The package also supports extended-precision and mixed-precision algorithms. Together, Amesos2 and Belos form a complete suite of sparse linear solvers.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS

PubMed Central

Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

2014-01-01

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \gg n$ or $m \ll n$, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size $\lceil \gamma \min(m, n) \rceil \times \min(m, n)$, where $\gamma$ is moderately larger than 1, e.g., $\gamma = 2$. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
Laplace-domain waveform modeling and inversion for the 3D acoustic-elastic coupled media

NASA Astrophysics Data System (ADS)

Shin, Jungkyun; Shin, Changsoo; Calandra, Henri

2016-06-01

Laplace-domain waveform inversion reconstructs long-wavelength subsurface models by using the zero-frequency component of damped seismic signals. Despite the computational advantages of Laplace-domain waveform inversion over conventional frequency-domain waveform inversion, an acoustic assumption and an iterative matrix solver have been used to invert 3D marine datasets to mitigate the intensive computing cost. In this study, we develop a Laplace-domain waveform modeling and inversion algorithm for 3D acoustic-elastic coupled media by using a parallel sparse direct solver library (MUltifrontal Massively Parallel Solver, MUMPS). We precisely simulate a real marine environment by coupling the 3D acoustic and elastic wave equations with the proper boundary condition at the fluid-solid interface. In addition, we can extract the elastic properties of the Earth below the sea bottom from the recorded acoustic pressure datasets. As a matrix solver, the parallel sparse direct solver is used to factorize the non-symmetric impedance matrix in a distributed memory architecture and rapidly solve the wave field for a number of shots by using the lower and upper matrix factors. Using both synthetic datasets and real datasets obtained by a 3D wide azimuth survey, the long-wavelength components of the P-wave and S-wave velocity models are reconstructed and the proposed modeling and inversion algorithms are verified. A cluster of 80 CPU cores is used for this study.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver

NASA Astrophysics Data System (ADS)

Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

2010-09-01

A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
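
The cyclic reduction idea can be sketched for the scalar case, where each block is a single number. The recursive function below eliminates the even-indexed unknowns, solves the half-size system for the odd-indexed ones, and then back-substitutes; it accepts sizes that are not powers of two. It is a serial simplification written for this listing, not the BCYCLIC code, and unlike BCYCLIC it does not store the factors for reuse with later right-hand sides.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction.

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]                        # eliminates x[i-1] from row i
        gamma = c[i] / b[i + 1] if has_next else 0.0   # eliminates x[i+1] from row i
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)         # odd-indexed unknowns
    for i in range(0, n, 2):                           # back-substitute the even ones
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Tridiagonal (-1, 2, -1) system of size 7, which is not a power of two.
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
d = [
    (a[i] * x_true[i - 1] if i > 0 else 0.0)
    + b[i] * x_true[i]
    + (c[i] * x_true[i + 1] if i + 1 < n else 0.0)
    for i in range(n)
]
x = cyclic_reduction(a, b, c, d)
```

Within one reduction level every odd row is updated independently of the others, which is the property that makes the method easy to parallelize.
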
High-performance equation solvers and their impact on finite element analysis

NASA Technical Reports Server (NTRS)

Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.

1990-01-01

The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
High-performance equation solvers and their impact on finite element analysis

NASA Technical Reports Server (NTRS)

Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.

1992-01-01

The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Addressing the computational cost of large EIT solutions.

PubMed

Boyle, Alistair; Borsic, Andrea; Adler, Andy

2012-05-01

Electrical impedance tomography (EIT) is a soft field tomography modality based on the application of electric current to a body and measurement of voltages through electrodes at the boundary. The interior conductivity is reconstructed on a discrete representation of the domain using a finite-element method (FEM) mesh and a parametrization of that domain. The reconstruction requires a sequence of numerically intensive calculations. There is strong interest in reducing the cost of these calculations. An improvement in the compute time for current problems would encourage further exploration of computationally challenging problems such as the incorporation of time series data, wide-spread adoption of three-dimensional simulations and correlation of other modalities such as CT and ultrasound. Multicore processors offer an opportunity to reduce EIT computation times but may require some restructuring of the underlying algorithms to maximize the use of available resources. This work profiles two EIT software packages (EIDORS and NDRM) to experimentally determine where the computational costs arise in EIT as problems scale. Sparse matrix solvers, a key component for the FEM forward problem and sensitivity estimates in the inverse problem, are shown to take a considerable portion of the total compute time in these packages. A sparse matrix solver performance measurement tool, Meagre-Crowd, is developed to interface with a variety of solvers and compare their performance over a range of two- and three-dimensional problems of increasing node density. Results show that distributed sparse matrix solvers that operate on multiple cores are advantageous up to a limit that increases as the node density increases. We recommend a selection procedure to find a solver and hardware arrangement matched to the problem and provide guidance and tools to perform that selection.
Three-Dimensional Inverse Transport Solver Based on Compressive Sensing Technique

NASA Astrophysics Data System (ADS)

Cheng, Yuxiong; Wu, Hongchun; Cao, Liangzhi; Zheng, Youqi

2013-09-01

Based on direct exposure measurements from flash radiographic images, a compressive sensing-based method for the three-dimensional inverse transport problem is presented. The linear absorption coefficients and interface locations of objects are reconstructed directly at the same time. It is always very expensive to obtain enough measurements. With limited measurements, the compressive sensing sparse reconstruction technique orthogonal matching pursuit is applied to obtain the sparse coefficients by solving an optimization problem. A three-dimensional inverse transport solver is developed based on a compressive sensing-based technique. There are three features in this solver: (1) AutoCAD is employed as a geometry preprocessor because of its powerful graphics capabilities. (2) The forward projection matrix, rather than a Gauss matrix, is constructed by the visualization tool generator. (3) The Fourier transform and the Daubechies wavelet transform are adopted to convert an underdetermined system to a well-posed system in the algorithm. Simulations are performed, and the numerical results of the compressive sensing-based solver for the pseudo-sine absorption problem, the two-cube problem and the two-cylinder problem agree well with the reference values.
Time integration algorithms for the two-dimensional Euler equations on unstructured meshes

NASA Technical Reports Server (NTRS)

Slack, David C.; Whitaker, D. L.; Walters, Robert W.

1994-01-01

Explicit and implicit time integration algorithms for the two-dimensional Euler equations on unstructured grids are presented. Both cell-centered and cell-vertex finite volume upwind schemes utilizing Roe's approximate Riemann solver are developed. For the cell-vertex scheme, a four-stage Runge-Kutta time integration, a four-stage Runge-Kutta time integration with implicit residual averaging, a point Jacobi method, a symmetric point Gauss-Seidel method and two methods utilizing preconditioned sparse matrix solvers are presented. For the cell-centered scheme, a Runge-Kutta scheme, an implicit tridiagonal relaxation scheme modeled after line Gauss-Seidel, a fully implicit lower-upper (LU) decomposition, and a hybrid scheme utilizing both Runge-Kutta and LU methods are presented. A reverse Cuthill-McKee renumbering scheme is employed for the direct solver to decrease CPU time by reducing the fill of the Jacobian matrix. A comparison of the various time integration schemes is made for both first-order and higher order accurate solutions using several mesh sizes. Higher order accuracy is achieved by using multidimensional monotone linear reconstruction procedures. The results obtained for a transonic flow over a circular arc suggest that the preconditioned sparse matrix solvers perform better than the other methods as the number of elements in the mesh increases.
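
The reverse Cuthill-McKee renumbering mentioned above can be sketched as a breadth-first traversal that visits low-degree neighbours first and then reverses the visiting order. The example below is a generic textbook version on an adjacency list, not the renumbering code used in the report, and the small scrambled path graph is invented for illustration.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return a vertex ordering that tends to reduce matrix bandwidth."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Start each connected component from a vertex of smallest degree.
    for start in sorted(range(n), key=lambda v: (len(adj[v]), v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in sorted(adj[v], key=lambda w: (len(adj[w]), w)):
                if not visited[u]:
                    visited[u] = True
                    queue.append(u)
    return order[::-1]


def bandwidth(adj, order):
    """Largest distance from the diagonal of any non-zero under the given ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[u]) for v in range(len(adj)) for u in adj[v]), default=0)


# A path graph 0-3-1-4-2 whose labels are scrambled.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
before = bandwidth(adj, list(range(5)))
rcm = reverse_cuthill_mckee(adj)
after = bandwidth(adj, rcm)
```

A smaller bandwidth confines the fill produced by a direct factorization to a narrower band around the diagonal, which is how the renumbering reduces CPU time.
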
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE PAGES

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

2017-06-01

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
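
For reference, the two kernels named in this abstract can be written in a few lines for the baseline CSR format. The sketch below multiplies a CSR matrix, and its transpose, by a block of vectors stored as a list of rows. It illustrates only the CSR baseline, not the CSB format the authors propose, and the function names are hypothetical.

```python
def csr_spmm(data, indices, indptr, X):
    """Y = A X, with A in CSR form and X a dense block of column vectors."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        y_row = Y[i]
        for p in range(indptr[i], indptr[i + 1]):
            a, x_row = data[p], X[indices[p]]
            for j in range(k):
                y_row[j] += a * x_row[j]
    return Y


def csr_spmm_transpose(data, indices, indptr, X, n_cols):
    """Y = A^T X without forming A^T: row i of A scatters into rows of Y."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_cols)]
    for i in range(len(indptr) - 1):
        x_row = X[i]
        for p in range(indptr[i], indptr[i + 1]):
            y_row = Y[indices[p]]
            for j in range(k):
                y_row[j] += data[p] * x_row[j]
    return Y


# A = [[1, 0, 2],
#      [0, 3, 0]] stored in CSR form.
data, indices, indptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
AX = csr_spmm(data, indices, indptr, [[1, 2], [3, 4], [5, 6]])
ATI = csr_spmm_transpose(data, indices, indptr, [[1, 0], [0, 1]], n_cols=3)
```

The transpose kernel writes to scattered output rows, so a row-parallel version would need synchronization; this asymmetry between the two operations is one motivation for block formats.
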
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
New algorithms for field-theoretic block copolymer simulations: Progress on using adaptive-mesh refinement and sparse matrix solvers in SCFT calculations

NASA Astrophysics Data System (ADS)

Sides, Scott; Jamroz, Ben; Crockett, Robert; Pletzer, Alexander

2012-02-01

Self-consistent field theory (SCFT) for dense polymer melts has been highly successful in describing complex morphologies in block copolymers. Field-theoretic simulations such as these are able to access large length and time scales that are difficult or impossible for particle-based simulations such as molecular dynamics. The modified diffusion equations that arise as a consequence of the coarse-graining procedure in the SCF theory can be efficiently solved with a pseudo-spectral (PS) method that uses fast Fourier transforms on uniform Cartesian grids. However, PS methods can be difficult to apply in many block copolymer SCFT simulations (e.g., confinement, interface adsorption) in which small spatial regions might require finer resolution than most of the simulation grid. Progress on using new solver algorithms to address these problems will be presented. The Tech-X Chompst project aims at marrying the best of adaptive mesh refinement with linear matrix solver algorithms. The Tech-X code PolySwift++ is an SCFT simulation platform that leverages ongoing development in coupling Chombo, a package for solving PDEs via block-structured AMR calculations and embedded boundaries, with PETSc, a toolkit that includes a large assortment of sparse linear solvers.

Acceleration of GPU-based Krylov solvers via data transfer reduction

DOE PAGES

Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...

2015-04-08

Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communication through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to use the graphics processing unit’s computing power more efficiently. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector products, are crucial for the subsequent development of high-performance Krylov subspace iterative methods accelerated by graphics processing units.
LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators

NASA Astrophysics Data System (ADS)

Gonzalez, Juan; Núñez, Rafael C.

2009-07-01

We present LAPACKrc, a family of FPGA-based linear algebra solvers able to achieve more than 100x speedup per commodity processor on certain problems. LAPACKrc subsumes some of the LAPACK and ScaLAPACK functionalities, and it also incorporates sparse direct and iterative matrix solvers. Current LAPACKrc prototypes demonstrate speedups of 40x to 150x compared against top-of-the-line hardware/software systems. A technology roadmap is in place to validate current performance of LAPACKrc in HPC applications, and to increase the computational throughput by factors of hundreds within the next few years.
A comparison of SuperLU solvers on the Intel MIC architecture

NASA Astrophysics Data System (ADS)

Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.

2016-10-01

In many science and engineering applications, problems reduce to solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using the offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU gained up to a 45% performance improvement from the offload programming, depending on the sparse matrix type and the size of the transferred and processed data.
Three-dimensional unstructured grid Euler computations using a fully-implicit, upwind method

NASA Technical Reports Server (NTRS)

Whitaker, David L.

1993-01-01

A method has been developed to solve the Euler equations on a three-dimensional unstructured grid composed of tetrahedra. The method uses an upwind flow solver with a linearized, backward-Euler time integration scheme. Each time step results in a sparse linear system of equations which is solved by an iterative, sparse matrix solver. Local-time stepping, switched evolution relaxation (SER), preconditioning and reuse of the Jacobian are employed to accelerate the convergence rate. Implicit boundary conditions were found to be extremely important for fast convergence. Numerical experiments have shown that convergence rates comparable to that of a multigrid, central-difference scheme are achievable on the same mesh. Results are presented for several grids about an ONERA M6 wing.
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

DOE PAGES

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

2016-10-27

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
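The idea of compressing an off-diagonal block by randomized sampling can be sketched in a few lines of NumPy. This is not the STRUMPACK algorithm: it swaps in a plain Gaussian sketch followed by a QR factorization in place of the interpolative decompositions mentioned in the abstract, and the kernel matrix, rank and oversampling values are assumptions chosen for illustration.

```python
import numpy as np


def randomized_low_rank(B, rank, oversample=5, seed=0):
    """Return Q, C with B ~= Q @ C, built from a Gaussian sketch of range(B)."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((B.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(B @ omega)   # orthonormal basis for the sampled range
    return Q, Q.T @ B                # projection of B onto that basis


# Off-diagonal block of a smooth kernel matrix (hypothetical test matrix).
n = 256
idx = np.arange(n)
K = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
B = K[: n // 2, n // 2:]

Q, C = randomized_low_rank(B, rank=10)
rel_err = np.linalg.norm(B - Q @ C) / np.linalg.norm(B)
print(f"stored {Q.shape[1]} columns instead of {B.shape[1]}, "
      f"relative error {rel_err:.2e}")
```

Only products of the block with random vectors are needed to build the basis, which is what makes this style of compression attractive inside a multifrontal solver.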
Matrix decomposition graphics processing unit solver for Poisson image editing

NASA Astrophysics Data System (ADS)

Lei, Zhao; Wei, Li

2012-10-01

In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computationally and memory-intensive task, which makes it unsuitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to address the problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS takes full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (it combines direct and iterative techniques) and has a two-level architecture. These enable MDGS to generate solutions identical to those of the common Poisson methods and to achieve a high convergence rate in most cases. This approach is advantageous in terms of parallelizability, real-time image processing capability, low memory use and breadth of application.
Convergence Speed of a Dynamical System for Sparse Recovery

NASA Astrophysics Data System (ADS)

Balavoine, Aurele; Rozell, Christopher J.; Romberg, Justin

2013-09-01

This paper studies the convergence rate of a continuous-time dynamical system for L1-minimization, known as the Locally Competitive Algorithm (LCA). Solving L1-minimization problems efficiently and rapidly is of great interest to the signal processing community, as these programs have been shown to recover sparse solutions to underdetermined systems of linear equations and come with strong performance guarantees. The LCA under study differs from the typical L1 solver in that it operates in continuous time: instead of being specified by discrete iterations, it evolves according to a system of nonlinear ordinary differential equations. The LCA is constructed from simple components, giving it the potential to be implemented as a large-scale analog circuit. The goal of this paper is to give guarantees on the convergence time of the LCA system. To do so, we analyze how the LCA evolves as it is recovering a sparse signal from underdetermined measurements. We show that under appropriate conditions on the measurement matrix and the problem parameters, the path the LCA follows can be described as a sequence of linear differential equations, each with a small number of active variables. This allows us to relate the convergence time of the system to the restricted isometry constant of the matrix. Interesting parallels to sparse-recovery digital solvers emerge from this study. Our analysis covers both the noisy and noiseless settings and is supported by simulation results.
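A minimal numerical sketch of the LCA dynamics may help make the description concrete. It assumes the commonly cited form of the system, tau * du/dt = Phi^T y - u - (Phi^T Phi - I) a with a = soft_threshold(u, lambda), and integrates it with forward Euler steps. The step size, threshold and problem dimensions are illustrative assumptions, and a discrete simulation only approximates the continuous-time system analyzed in the paper.

```python
import numpy as np


def soft_threshold(u, lam):
    """Activation function: shrink toward zero by lam."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def lca(Phi, y, lam, tau=1.0, dt=0.05, steps=4000):
    """Forward-Euler simulation of the LCA ordinary differential equations."""
    n = Phi.shape[1]
    u = np.zeros(n)                    # internal node states
    b = Phi.T @ y                      # constant driving input
    G = Phi.T @ Phi - np.eye(n)        # lateral inhibition weights
    for _ in range(steps):
        a = soft_threshold(u, lam)     # currently active outputs
        u = u + (dt / tau) * (b - u - G @ a)
    return soft_threshold(u, lam)


# Hypothetical test problem: 3-sparse signal, 60 measurements, 100 unknowns.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((60, 100))
Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm columns
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
y = Phi @ x_true

a = lca(Phi, y, lam=0.05)
print("largest recovered entries at indices:", np.sort(np.argsort(-np.abs(a))[:3]))
```

With an orthonormal measurement matrix the inhibition term vanishes and the steady state reduces to a soft threshold of the measurements, which gives a convenient sanity check for the integration.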
Overcoming Challenges in Kinetic Modeling of Magnetized Plasmas and Vacuum Electronic Devices

NASA Astrophysics Data System (ADS)

Omelchenko, Yuri; Na, Dong-Yeop; Teixeira, Fernando

2017-10-01

We transform the state of the art of plasma modeling by taking advantage of novel computational techniques for fast and robust integration of multiscale hybrid (full particle ions, fluid electrons, no displacement current) and full-PIC models. These models are implemented in 3D HYPERS and axisymmetric full-PIC CONPIC codes. HYPERS is a massively parallel, asynchronous code. The HYPERS solver does not step fields and particles synchronously in time but instead executes local variable updates (events) at their self-adaptive rates while preserving fundamental conservation laws. The charge-conserving CONPIC code has a matrix-free explicit finite-element (FE) solver based on a sparse-approximate inverse (SPAI) algorithm. This explicit solver approximates the inverse FE system matrix ("mass" matrix) using successive sparsity pattern orders of the original matrix. It does not reduce the set of Maxwell's equations to a vector-wave (curl-curl) equation of second order but instead utilizes the standard coupled first-order Maxwell's system. We discuss the ability of our codes to accurately and efficiently account for multiscale physical phenomena in 3D magnetized space and laboratory plasmas and axisymmetric vacuum electronic devices.
Development of a steady potential solver for use with linearized, unsteady aerodynamic analyses

NASA Technical Reports Server (NTRS)

Hoyniak, Daniel; Verdon, Joseph M.

1991-01-01

A full potential steady flow solver (SFLOW) developed explicitly for use with an inviscid unsteady aerodynamic analysis (LINFLO) is described. The steady solver uses the nonconservative form of the nonlinear potential flow equations together with an implicit, least squares, finite difference approximation to solve for the steady flow field. The difference equations were developed on a composite mesh which consists of a C grid embedded in a rectilinear (H grid) cascade mesh. The composite mesh is capable of resolving blade to blade and far field phenomena on the H grid, while accurately resolving local phenomena on the C grid. The resulting system of algebraic equations is arranged in matrix form using a sparse matrix package and solved by Newton's method. Steady and unsteady results are presented for two cascade configurations: a high speed compressor and a turbine with high exit Mach number.
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

DOE PAGES

Azad, Ariful; Ballard, Grey; Buluc, Aydin; ...

2016-11-08

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
A manual for PARTI runtime primitives

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel

1990-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Performance issues for iterative solvers in device simulation

NASA Technical Reports Server (NTRS)

Fan, Qing; Forsyth, P. A.; Mcmacken, J. R. F.; Tang, Wei-Pai

1994-01-01

Due to memory limitations, iterative methods have become the method of choice for large scale semiconductor device simulation. However, it is well known that these methods still suffer from reliability problems. The linear systems which appear in numerical simulation of semiconductor devices are notoriously ill-conditioned. In order to produce robust algorithms for practical problems, careful attention must be given to many implementation issues. This paper concentrates on strategies for developing robust preconditioners. In addition, effective data structures and convergence check issues are also discussed. These algorithms are compared with a standard direct sparse matrix solver on a variety of problems.
Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

PubMed

Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

2017-07-11

Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have emerged as widely used in modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPU) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that the single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
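The combination highlighted in this abstract, a Jacobi-preconditioned conjugate gradient solver acting on a matrix stored by diagonals, can be sketched on the CPU with SciPy. This is not the authors' CUDA implementation: the 7-point stencil, the random positive diagonal term standing in for the ionic-strength contribution, and the grid size are all assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 12                                            # grid points per axis (assumption)
t = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
eye = sp.identity(n)
lap = (sp.kron(sp.kron(t, eye), eye)
       + sp.kron(sp.kron(eye, t), eye)
       + sp.kron(sp.kron(eye, eye), t))           # 7-point finite-difference stencil

rng = np.random.default_rng(0)
kappa = rng.uniform(0.1, 5.0, size=n ** 3)        # hypothetical positive diagonal term
A = (lap + sp.diags(kappa, 0)).todia()            # diagonal (DIA) storage format
b = rng.standard_normal(n ** 3)

inv_diag = 1.0 / A.diagonal()
jacobi = LinearOperator(A.shape, matvec=lambda r: inv_diag * r, dtype=A.dtype)

iterations = {"plain": 0, "jacobi": 0}


def counter(key):
    def callback(_xk):
        iterations[key] += 1
    return callback


x_plain, info_plain = cg(A, b, callback=counter("plain"))
x_jacobi, info_jacobi = cg(A, b, M=jacobi, callback=counter("jacobi"))
rel_res = np.linalg.norm(A @ x_jacobi - b) / np.linalg.norm(b)
print(iterations, f"relative residual {rel_res:.2e}")
```

The DIA format stores each of the seven diagonals as a contiguous array, which is why it suits banded finite-difference matrices; the Jacobi preconditioner only needs the main diagonal.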
A manual for PARTI runtime primitives, revision 1

NASA Technical Reports Server (NTRS)

Das, Raja; Saltz, Joel; Berryman, Harry

1991-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, processor architectures and computer system organization have changed dramatically in recent years. Because of this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. 
Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization.

PubMed

Lu, Canyi; Lin, Zhouchen; Yan, Shuicheng

2015-02-01

This paper presents a general framework for solving the low-rank and/or sparse matrix minimization problems, which may involve multiple nonsmooth terms. The iteratively reweighted least squares (IRLS) method is a fast solver, which smooths the objective function and minimizes it by alternately updating the variables and their weights. However, the traditional IRLS can only solve a sparse-only or low-rank-only minimization problem with squared loss or an affine constraint. This paper generalizes IRLS to solve joint/mixed low-rank and sparse minimization problems, which are essential formulations for many tasks. As a concrete example, we solve the Schatten-p norm and l2,q-norm regularized low-rank representation problem by IRLS, and theoretically prove that the derived solution is a stationary point (globally optimal if p,q ≥ 1). Our convergence proof of IRLS is more general than the previous one, which depends on the special properties of the Schatten-p norm and l2,q-norm. Extensive experiments on both synthetic and real data sets demonstrate that our IRLS is much more efficient.
Methods for design and evaluation of integrated hardware-software systems for concurrent computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PISCES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensional problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.

Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

PubMed

Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

2017-10-10

We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
A fast time-difference inverse solver for 3D EIT with application to lung imaging.

PubMed

Javaherian, Ashkan; Soleimani, Manuchehr; Moeller, Knut

2016-08-01

A class of sparse optimization techniques that require solely matrix-vector products, rather than explicit access to the forward matrix and its transpose, has received much attention over the past decade for dealing with large-scale inverse problems. This study tailors application of the so-called Gradient Projection for Sparse Reconstruction (GPSR) to large-scale time-difference three-dimensional electrical impedance tomography (3D EIT). 3D EIT typically suffers from the need for a large number of voxels to cover the whole domain, so its application to real-time imaging, for example monitoring of lung function, remains scarce since the large number of degrees of freedom of the problem greatly increases storage space and reconstruction time. This study shows the great potential of the GPSR for large-size time-difference 3D EIT. Further studies are needed to improve its accuracy for imaging small-size anomalies.
A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter

In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

DOE PAGES

Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...

2016-06-30

In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
Solvers for $\mathcal{O}(N)$ Electronic Structure in the Strong Scaling Limit

DOE PAGES

Bock, Nicolas; Challacombe, William M.; Kale, Laxmikant

2016-01-26

Here we present a hybrid OpenMP/Charm++ framework for solving the $\mathcal{O}(N)$ self-consistent-field eigenvalue problem with parallelism in the strong scaling regime, $P \gg N$, where $P$ is the number of cores, and $N$ is a measure of system size, i.e., the number of matrix rows/columns, basis functions, atoms, molecules, etc. This result is achieved with a nested approach to spectral projection and the sparse approximate matrix multiply [Bock and Challacombe, SIAM J. Sci. Comput., 35 (2013), pp. C72--C98], and involves a recursive, task-parallel algorithm, often employed by generalized $N$-Body solvers, to occlusion and culling of negligible products in the case of matrices with decay. Lastly, employing classic technologies associated with generalized $N$-Body solvers, including overdecomposition, recursive task parallelism, orderings that preserve locality, and persistence-based load balancing, we obtain scaling beyond hundreds of cores per molecule for small water clusters ([H$_2$O]$_N$, $N \in \{30, 90, 150\}$, $P/N \approx \{819, 273, 164\}$) and find support for an increasingly strong scalability with increasing system size $N$.
A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

NASA Astrophysics Data System (ADS)

Ma, Sangback

In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large mesh sizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering and ILU(0) in the Wavefront ordering outperform the other methods, but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
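The reason a multi-color ordering exposes parallelism can be seen on a small 5-point matrix. The sketch below uses two colors (a red-black ordering) on a 6 x 6 grid, both of which are assumptions for illustration: after permuting unknowns by color, each same-color diagonal block is itself diagonal, so all unknowns of one color can be updated independently in a relaxation or incomplete-factorization sweep.

```python
import numpy as np
import scipy.sparse as sp

n = 6                                             # grid points per axis (assumption)
t = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
eye = sp.identity(n)
A = (sp.kron(eye, t) + sp.kron(t, eye)).tocsr()   # 5-point matrix, natural ordering

# Color grid point (i, j) by the parity of i + j, then list red points first.
i, j = np.divmod(np.arange(n * n), n)
color = (i + j) % 2
perm = np.argsort(color, kind="stable")
P = A[perm][:, perm]                              # symmetrically permuted matrix

n_red = int(np.count_nonzero(color == 0))
red_block = P[:n_red, :n_red]
black_block = P[n_red:, n_red:]


def is_diagonal(B):
    """True if the sparse matrix B has no nonzero off-diagonal entries."""
    return (B - sp.diags(B.diagonal(), 0)).count_nonzero() == 0


print("red block diagonal:", is_diagonal(red_block))
print("black block diagonal:", is_diagonal(black_block))
```

The coupling between colors moves entirely into the off-diagonal blocks, which is also why the convergence behavior of ILU(0) can differ from that in the natural ordering.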
Exploiting Data Sparsity in Parallel Matrix Powers Computations

DTIC Science & Technology

2013-05-03

...matrices of the form A = D + USV^H, where D is sparse and USV^H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial differential equation solvers, and preconditioned iterative methods. If A has this form, our algorithm enables a communication
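The benefit of the A = D + USV^H structure for matrix powers can be sketched sequentially: each product applies the sparse part and the low-rank factors separately, so the dense matrix is never formed. The sizes, the tridiagonal choice of D and the random factors below are assumptions, and the communication-avoiding parallel algorithm of the report is not reproduced.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, r = 300, 4                                    # matrix order and rank (assumptions)
D = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
U = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
V = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
S = np.diag(rng.uniform(0.5, 2.0, size=r))


def apply_structured(x):
    """Compute (D + U S V^H) x in O(nnz(D) + n r) operations."""
    return D @ x + U @ (S @ (V.conj().T @ x))


def matrix_powers(x, k):
    """Return the vectors [x, A x, A^2 x, ..., A^k x] using the structured product."""
    basis = [x]
    for _ in range(k):
        basis.append(apply_structured(basis[-1]))
    return basis


x = rng.standard_normal(n)
basis = matrix_powers(x, 3)
print("computed", len(basis), "basis vectors of length", basis[-1].shape[0])
```

Such a sequence of vectors is the basic kernel of Krylov subspace methods, which is where the matrix powers computation typically appears.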
GPU-accelerated element-free reverse-time migration with Gauss points partition

NASA Astrophysics Data System (ADS)

Zhou, Zhen; Jia, Xiaofeng; Qiang, Xiaodong

2018-06-01

An element-free method (EFM) has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems. We present the theory of EFM and its numerical applications in seismic modelling and reverse time migration (RTM). Compared with the finite difference method and the finite element method, the EFM has unique advantages: (1) independence of grids in computation and (2) lower expense and more flexibility (because only the information of the nodes and the boundary of the concerned area is required). However, in EFM, due to improper computation and storage of some large sparse matrices, such as the mass matrix and the stiffness matrix, the method is difficult to apply to seismic modelling and RTM for a large velocity model. To solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition and utilise the graphics processing unit to improve the computational efficiency. We employ the compressed sparse row format to compress the intermediate large sparse matrices and attempt to simplify the operations by solving the linear equations with CULA solver. To improve the computation efficiency further, we introduce the concept of the lumped mass matrix. Numerical experiments indicate that the proposed method is accurate and more efficient than the regular EFM.
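Two of the ingredients named above, compressed sparse row (CSR) storage and a lumped mass matrix, can be illustrated with SciPy. The example assembles a 1D linear finite element mass matrix and lumps it by row sums; both the finite element setting and the row-sum rule are assumptions chosen for a compact illustration, since the abstract does not state how the element-free mass matrix is lumped.

```python
import numpy as np
import scipy.sparse as sp

n_el = 8                                  # number of elements (assumption)
h = 1.0 / n_el
local = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])   # element mass matrix

rows, cols, vals = [], [], []
for e in range(n_el):                     # assemble element contributions
    for a in range(2):
        for b in range(2):
            rows.append(e + a)
            cols.append(e + b)
            vals.append(local[a, b])

# Converting from COO to CSR sums the duplicate entries at shared nodes.
M = sp.coo_matrix((vals, (rows, cols)), shape=(n_el + 1, n_el + 1)).tocsr()
print("indptr :", M.indptr)               # row i occupies data[indptr[i]:indptr[i+1]]
print("indices:", M.indices[:5])          # column index of each stored value

# Row-sum lumping replaces M by a diagonal matrix with the same total mass.
lumped = np.asarray(M.sum(axis=1)).ravel()
rhs = np.ones(n_el + 1)
x = rhs / lumped                          # solving with the lumped matrix is a division
```

With a diagonal mass matrix no linear solve is needed at each time step, which is the usual motivation for lumping in explicit wave propagation codes.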
Coupled Modeling of Hydrodynamics and Sound in Coastal Ocean for Renewable Ocean Energy Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, Wen; Jung, Ki Won; Yang, Zhaoqing

An underwater sound model was developed to simulate sound propagation from marine and hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite difference methods were developed to solve the 3D Helmholtz equation for sound propagation in the coastal environment. A 3D sparse matrix solver with complex coefficients was formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method was applied to solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model was then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generated by human activities, such as construction of OSW turbines or tidal stream turbine operations, in a range-dependent setting. As a proof of concept, initial validation of the solver is presented for two coastal wedge problems. This sound model can be useful for evaluating impacts on marine mammals due to deployment of MHK devices and OSW energy platforms.
A Shifted Block Lanczos Algorithm 1: The Block Recurrence

NASA Technical Reports Server (NTRS)

Grimes, Roger G.; Lewis, John G.; Simon, Horst D.

1990-01-01

In this paper we describe a block Lanczos algorithm that is used as the key building block of a software package for the extraction of eigenvalues and eigenvectors of large sparse symmetric generalized eigenproblems. The software package comprises: a version of the block Lanczos algorithm specialized for spectrally transformed eigenproblems; an adaptive strategy for choosing shifts, and efficient codes for factoring large sparse symmetric indefinite matrices. This paper describes the algorithmic details of our block Lanczos recurrence. This uses a novel combination of block generalizations of several features that have only been investigated independently in the past. In particular new forms of partial reorthogonalization, selective reorthogonalization and local reorthogonalization are used, as is a new algorithm for obtaining the M-orthogonal factorization of a matrix. The heuristic shifting strategy, the integration with sparse linear equation solvers and numerical experience with the code are described in a companion paper.
Solving lattice QCD systems of equations using mixed precision solvers on GPUs

NASA Astrophysics Data System (ADS)

Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C.

2010-09-01

Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
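For orientation, the "usual defect-correction approach" that the abstract above compares against can be sketched as follows: residuals are accumulated in double precision while the correction is computed in single precision. This is not the authors' reliable-update scheme, and the inner solve here is a dense direct solve standing in for a Krylov solver; the test matrix, function name, and tolerances are assumptions chosen for illustration.

```python
import numpy as np


def defect_correction_solve(A, b, tol=1e-12, max_outer=50):
    """Mixed-precision defect correction: float64 residuals, float32 corrections."""
    A32 = A.astype(np.float32)           # low-precision copy used for the bulk work
    x = np.zeros_like(b, dtype=np.float64)
    b_norm = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - A @ x                    # residual evaluated in double precision
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Inner solve in single precision (stand-in for a single-precision Krylov solve).
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)        # accumulate the correction in double
    return x


# Hypothetical well-conditioned tridiagonal test system.
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = defect_correction_solve(A, b)
```

Each outer pass shrinks the error by roughly the single-precision accuracy of the inner solve, so a handful of passes is typically enough for a well-conditioned system like this one.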
Project APhiD: A Lorenz-gauged A-Φ decomposition for parallelized computation of ultra-broadband electromagnetic induction in a fully heterogeneous Earth

NASA Astrophysics Data System (ADS)

Weiss, Chester J.

2013-08-01

An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10^-2 to 10^10 Hz and over the range 10^-3 to 10^5 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.
Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications

PubMed Central

Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

2016-01-01

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like ’edible’, ’fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, parallelizes it, and produces sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT, by applying it on a Facebook dataset (users, ’friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on the SPARC64 VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
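The idea of iterating on several right-hand sides at once, and dropping columns that have already converged, might look roughly like the sketch below. Plain Jacobi iteration is used only because it is short; the abstract does not say which iterative methods were implemented, and the boolean mask is an assumed stand-in for the authors' control method. The matrix and right-hand sides are hypothetical.

```python
import numpy as np


def jacobi_multi_rhs(A, B, tol=1e-10, max_iter=500):
    """Jacobi iteration on all columns of B at once, skipping converged columns."""
    D = np.diag(A)                       # diagonal of A as a vector
    R_off = A - np.diag(D)               # off-diagonal part
    X = np.zeros_like(B)
    active = np.ones(B.shape[1], dtype=bool)   # columns still being iterated
    b_norms = np.linalg.norm(B, axis=0)
    for _ in range(max_iter):
        if not active.any():
            break
        # One matrix-matrix product updates every active column together.
        X[:, active] = (B[:, active] - R_off @ X[:, active]) / D[:, None]
        res = np.linalg.norm(B[:, active] - A @ X[:, active], axis=0)
        idx = np.flatnonzero(active)
        active[idx[res <= tol * b_norms[idx]]] = False   # retire converged columns
    return X


# Hypothetical diagonally dominant system with four right-hand sides.
n = 30
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
B = rng.standard_normal((n, 4))
X = jacobi_multi_rhs(A, B)
```

Grouping the vectors turns repeated matrix-vector products into a single matrix-matrix product, which reuses each matrix entry across all active columns.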
An Optimization Code for Nonlinear Transient Problems of a Large Scale Multidisciplinary Mathematical Model

NASA Astrophysics Data System (ADS)

Takasaki, Koichi

This paper presents a program for the multidisciplinary optimization and identification problem of the nonlinear model of large aerospace vehicle structures. The program constructs the global matrix of the dynamic system in the time direction by the p-version finite element method (pFEM), and the basic matrix for each pFEM node in the time direction is described by a sparse matrix similarly to the static finite element problem. The algorithm used by the program does not require the Hessian matrix of the objective function and so has low memory requirements. It also has a relatively low computational cost, and is suited to parallel computation. The program was integrated as a solver module of the multidisciplinary analysis system CUMuLOUS (Computational Utility for Multidisciplinary Large scale Optimization of Undense System) which is under development by the Aerospace Research and Development Directorate (ARD) of the Japan Aerospace Exploration Agency (JAXA).
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
A Note on Substructuring Preconditioning for Nonconforming Finite Element Approximations of Second Order Elliptic Problems

NASA Technical Reports Server (NTRS)

Maliassov, Serguei

1996-01-01

In this paper an algebraic substructuring preconditioner is considered for nonconforming finite element approximations of second order elliptic problems in 3D domains with a piecewise constant diffusion coefficient. Using a substructuring idea and a block Gauss elimination, part of the unknowns is eliminated and the Schur complement obtained is preconditioned by a spectrally equivalent very sparse matrix. In the case of a quasiuniform tetrahedral mesh, an appropriate algebraic multigrid solver can be used to solve the problem with this matrix. Explicit estimates of condition numbers and implementation algorithms are established for the constructed preconditioner. It is shown that the condition number of the preconditioned matrix does not depend on either the mesh step size or the jump of the coefficient. Finally, numerical experiments are presented to illustrate the theory being developed.
CPDES3: A preconditioned conjugate gradient solver for linear asymmetric matrix equations arising from coupled partial differential equations in three dimensions

NASA Astrophysics Data System (ADS)

Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.

1988-11-01

Many physical problems require the solution of coupled partial differential equations on three-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES3 allows each spatial operator to have 7, 15, 19, or 27 point stencils and allows for general couplings between all of the component PDE's and it automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations are permitted only limited by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices which is vectorizable on some of the newer scientific computers.
CPDES2: A preconditioned conjugate gradient solver for linear asymmetric matrix equations arising from coupled partial differential equations in two dimensions

NASA Astrophysics Data System (ADS)

Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.

1988-11-01

Many physical problems require the solution of coupled partial differential equations on two-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES2 allows each spatial operator to have 5 or 9 point stencils and allows for general couplings between all of the component PDE's and it automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations are permitted only limited by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices which is vectorizable on some of the newer scientific computers.
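To make the link between a stencil and the resulting sparse matrix concrete, the sketch below generates the non-zero pattern of a single 5-point operator on a small rectangular grid. It is a generic illustration with assumed coefficients (4 on the diagonal, -1 for each neighbour) and a hypothetical function name; it does not reproduce the sub-band representation or the multi-equation coupling used by CPDES2.

```python
def five_point_matrix(nx, ny):
    """Return {(row, col): value} for a 5-point Laplacian on an nx-by-ny grid.

    Grid point (i, j) is numbered row = j * nx + i. Neighbours that fall
    outside the grid are simply dropped (homogeneous Dirichlet boundary).
    """
    entries = {}
    for j in range(ny):
        for i in range(nx):
            row = j * nx + i
            entries[(row, row)] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < nx and 0 <= jj < ny:
                    entries[(row, jj * nx + ii)] = -1.0
    return entries


entries = five_point_matrix(3, 3)
# Count the stored entries in the row of the centre point (index 4).
centre_row_nnz = sum(1 for (r, c) in entries if r == 4)
```

Interior rows carry five non-zeros and boundary rows fewer, so the off-diagonal entries sit at column offsets of 1 and nx from the diagonal, which is the kind of regular band structure a sub-band storage scheme can exploit.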

Unsymmetric ordering using a constrained Markowitz scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, Patrick R.; Li, Xiaoye S.; Pralet, Stephane

2005-01-18

We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of the LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.
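The trade-off between sparsity and numerical quality can be illustrated with the classical threshold Markowitz rule: among entries that are large enough relative to their column, choose the one minimising (r_i - 1)(c_j - 1), where r_i and c_j are the non-zero counts of row i and column j. The sketch below implements that textbook rule on a dense list-of-lists matrix; it is not the constrained Markowitz scheme of the paper, and the function name, threshold value, and test matrix are assumptions.

```python
def markowitz_pivot(A, u=0.1):
    """Pick the entry minimising (r_i - 1) * (c_j - 1) subject to a threshold test.

    An entry a_ij is admissible if |a_ij| >= u * max_k |a_kj| (column-wise test).
    Returns ((i, j), cost), or (None, None) if no entry is admissible.
    """
    n_rows, n_cols = len(A), len(A[0])
    row_counts = [sum(1 for a in row if a != 0.0) for row in A]
    col_counts = [sum(1 for i in range(n_rows) if A[i][j] != 0.0)
                  for j in range(n_cols)]
    col_max = [max(abs(A[i][j]) for i in range(n_rows)) for j in range(n_cols)]
    best, best_cost = None, None
    for i in range(n_rows):
        for j in range(n_cols):
            a = A[i][j]
            if a == 0.0 or abs(a) < u * col_max[j]:
                continue                  # structurally zero or numerically too small
            cost = (row_counts[i] - 1) * (col_counts[j] - 1)
            if best_cost is None or cost < best_cost:
                best, best_cost = (i, j), cost
    return best, best_cost


# Hypothetical matrix: entry (0, 0) is sparse-friendly but numerically tiny.
A = [[1e-3, 2.0, 0.0],
     [4.0, 1.0, 3.0],
     [0.0, 5.0, 6.0]]
with_threshold = markowitz_pivot(A, u=0.1)
without_threshold = markowitz_pivot(A, u=0.0)
```

With the threshold switched off, the tiny entry at (0, 0) wins on sparsity alone; with u = 0.1 it is rejected and the equally cheap but numerically safer entry at (2, 2) is chosen instead.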
Analysis, tuning and comparison of two general sparse solvers for distributed memory computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, P.R.; Duff, I.S.; L'Excellent, J.-Y.

2000-06-30

We describe the work performed in the context of a Franco-Berkeley funded project between NERSC-LBNL located in Berkeley (USA) and CERFACS-ENSEEIHT located in Toulouse (France). We discuss both the tuning and performance analysis of two distributed memory sparse solvers (SuperLU from Berkeley and MUMPS from Toulouse) on the 512 processor Cray T3E from NERSC (Lawrence Berkeley National Laboratory). This project gave us the opportunity to improve the algorithms and add new features to the codes. We then quite extensively analyze and compare the two approaches on a set of large problems from real applications. We further explain the main differences in the behavior of the approaches on artificial regular grid problems. As a conclusion to this activity report, we mention a set of parallel sparse solvers on which this type of study should be extended.
ML 3.0 smoothed aggregation user's guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen

2004-05-01

ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the AZTEC 2.1 and AZTECOO iterative package [15]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and non-symmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
ML 3.1 smoothed aggregation user's guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen

2004-10-01

ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the Aztec 2.1 and AztecOO iterative package [16]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
Three-Dimensional Nacelle Aeroacoustics Code With Application to Impedance Eduction

NASA Technical Reports Server (NTRS)

Watson, Willie R.

2000-01-01

A three-dimensional nacelle acoustics code that accounts for uniform mean flow and variable surface impedance liners is developed. The code is linked to a commercial version of the NASA-developed General Purpose Solver (for solution of linear systems of equations) in order to obtain the capability to study high frequency waves that may require millions of grid points for resolution. Detailed, single-processor statistics for the performance of the solver in rigid and soft-wall ducts are presented. Over the range of frequencies of current interest in nacelle liner research, noise attenuation levels predicted from the code were in excellent agreement with those predicted from mode theory. The equation solver is memory efficient, requiring only a small fraction of the memory available on modern computers. As an application, the code is combined with an optimization algorithm and used to educe the impedance spectrum of a ceramic liner. The primary problem with using the code to perform optimization studies at frequencies above I1kHz is the excessive CPU time (a major portion of which is matrix assembly). The study recommends that research be directed toward development of a rapid sparse assembler and exploitation of the multiprocessor capability of the solver to further reduce CPU time.
A pressure flux-split technique for computation of inlet flow behavior

NASA Technical Reports Server (NTRS)

Pordal, H. S.; Khosla, P. K.; Rubin, S. G.

1991-01-01

A method for calculating the flow field in aircraft engine inlets is presented. The phenomena of inlet unstart and restart are investigated. Solutions of the reduced Navier-Stokes (RNS) equations are obtained with a time consistent direct sparse matrix solver that computes the transient flow field both internal and external to the inlet. Time varying shocks and time varying recirculation regions can be efficiently analyzed. The code is quite general and is suitable for the computation of flow for a wide variety of geometries and over a wide range of Mach and Reynolds numbers.
Wavelet-like bases for thin-wire integral equations in electromagnetics

NASA Astrophysics Data System (ADS)

Francomano, E.; Tortorici, A.; Toscano, E.; Ala, G.; Viola, F.

2005-03-01

In this paper, wavelets are used to solve, by the method of moments, a modified version of the thin-wire electric field integral equation in the frequency domain. The time domain electromagnetic quantities are obtained by using the inverse discrete fast Fourier transform. The retarded scalar electric and vector magnetic potentials are employed in order to obtain the integral formulation. The discretized model, generated by applying the direct method of moments via a point-matching procedure, results in a linear system with a dense matrix which has to be solved for each frequency of the Fourier spectrum of the time domain impressed source. Therefore, an orthogonal wavelet-like basis transform is used to sparsify the moment matrix. In particular, dyadic and M-band wavelet transforms have been adopted, generating different sparse matrix structures. This leads to an efficient solution of the resulting sparse matrix equation. Moreover, a wavelet preconditioner is used to accelerate the convergence rate of the iterative solver employed. These numerical features are used in analyzing the transient behavior of a lightning protection system. In particular, the focus is on the transient performance of the earth termination system of a lightning protection system, or of the earth electrode of an electric power substation, during its operation. The numerical results, obtained for a complex structure, are discussed and the features of the method used are underlined.
Simultaneous analysis of large INTEGRAL/SPI datasets: Optimizing the computation of the solution and its variance using sparse matrix algorithms

NASA Astrophysics Data System (ADS)

Bouchet, L.; Amestoy, P.; Buttari, A.; Rouet, F.-H.; Chauvin, M.

2013-02-01

Nowadays, analyzing and reducing the ever larger astronomical datasets is becoming a crucial challenge, especially for long cumulated observation times. The INTEGRAL/SPI X/γ-ray spectrometer is an instrument for which it is essential to process many exposures at the same time in order to increase the low signal-to-noise ratio of the weakest sources. In this context, the conventional methods for data reduction are inefficient and sometimes not feasible at all. Processing several years of data simultaneously requires computing not only the solution of a large system of equations, but also the associated uncertainties. We aim at reducing the computation time and the memory usage. Since the SPI transfer function is sparse, we have used some popular methods for the solution of large sparse linear systems; we briefly review these methods. We use the Multifrontal Massively Parallel Solver (MUMPS) to compute the solution of the system of equations. We also need to compute the variance of the solution, which amounts to computing selected entries of the inverse of the sparse matrix corresponding to our linear system. This can be achieved through one of the latest features of the MUMPS software that has been partly motivated by this work. In this paper we provide a brief presentation of this feature and evaluate its effectiveness on astrophysical problems requiring the processing of large datasets simultaneously, such as the study of the entire emission of the Galaxy. We used these algorithms to solve the large sparse systems arising from SPI data processing and to obtain both their solutions and the associated variances. In conclusion, thanks to these newly developed tools, processing large datasets arising from SPI is now feasible with both a reasonable execution time and a low memory usage.
A Method for Optimizing Non-Axisymmetric Liners for Multimodal Sound Sources

NASA Technical Reports Server (NTRS)

Watson, W. R.; Jones, M. G.; Parrott, T. L.; Sobieski, J.

2002-01-01

Central processor unit times and memory requirements for a commonly used solver are compared to that of a state-of-the-art, parallel, sparse solver. The sparse solver is then used in conjunction with three constrained optimization methodologies to assess the relative merits of non-axisymmetric versus axisymmetric liner concepts for improving liner acoustic suppression. This assessment is performed with a multimodal noise source (with equal mode amplitudes and phases) in a finite-length rectangular duct without flow. The sparse solver is found to reduce memory requirements by a factor of five and central processing time by a factor of eleven when compared with the commonly used solver. Results show that the optimum impedance of the uniform liner is dominated by the least attenuated mode, whose attenuation is maximized by the Cremer optimum impedance. An optimized, four-segmented liner with impedance segments in a checkerboard arrangement is found to be inferior to an optimized spanwise segmented liner. This optimized spanwise segmented liner is shown to attenuate substantially more sound than the optimized uniform liner and tends to be more effective at the higher frequencies. The most important result of this study is the discovery that when optimized, a spanwise segmented liner with two segments gives attenuations equal to or substantially greater than an optimized axially segmented liner with the same number of segments.
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods for unstructured mesh computations are developed and tested. The approximate system which arises from the Newton-linearization of the nonlinear evolution operator is solved by using the preconditioned generalized minimum residual technique. Three different preconditioners are investigated: the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over-relaxation (SSOR). The preconditioners have been optimized to have good vectorization properties. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also investigated. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R.

PubMed

Pang, Haotian; Liu, Han; Vanderbei, Robert

2014-02-01

We develop an R package fastclime for solving a family of regularized linear programming (LP) problems. Our package efficiently implements the parametric simplex algorithm, which provides a scalable and sophisticated tool for solving large-scale linear programs. As an illustrative example, one use of our LP solver is to implement an important sparse precision matrix estimation method called CLIME (Constrained L1 Minimization Estimator). Compared with existing packages for this problem such as clime and flare, our package has three advantages: (1) it efficiently calculates the full piecewise-linear regularization path; (2) it provides an accurate dual certificate as stopping criterion; (3) it is completely coded in C and is highly portable. This package is designed to be useful to statisticians and machine learning researchers for solving a wide range of problems.
Object-Oriented Design for Sparse Direct Solvers

NASA Technical Reports Server (NTRS)

Dobrian, Florin; Kumfert, Gary; Pothen, Alex

1999-01-01

We discuss the object-oriented design of a software package for solving sparse, symmetric systems of equations (positive definite and indefinite) by direct methods. At the highest layers, we decouple data structure classes from algorithmic classes for flexibility. We describe the important structural and algorithmic classes in our design, and discuss the trade-offs we made for high performance. The kernels at the lower layers were optimized by hand. Our results show no performance loss from our object-oriented design, while providing flexibility, ease of use, and extensibility over solvers using procedural design.
Extending fields in a level set method by solving a biharmonic equation

NASA Astrophysics Data System (ADS)

Moroney, Timothy J.; Lusmore, Dylan R.; McCue, Scott W.; McElwain, D. L. Sean

2017-08-01

We present an approach for computing extensions of velocities or other fields in level set methods by solving a biharmonic equation. The approach differs from other commonly used approaches to velocity extension because it deals with the interface fully implicitly through the level set function. No explicit properties of the interface, such as its location or the velocity on the interface, are required in computing the extension. These features lead to a particularly simple implementation using either a sparse direct solver or a matrix-free conjugate gradient solver. Furthermore, we propose a fast Poisson preconditioner that can be used to accelerate the convergence of the latter. We demonstrate the biharmonic extension on a number of test problems that serve to illustrate its effectiveness at producing smooth and accurate extensions near interfaces. A further feature of the method is the natural way in which it deals with symmetry and periodicity, ensuring through its construction that the extension field also respects these symmetries.
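A matrix-free conjugate gradient solve of a discrete biharmonic problem can be sketched in a few lines, since the operator is just the Laplacian applied twice and never needs to be stored. The example below is a 1D toy problem with unit grid spacing and zero values assumed outside the grid; it does not include the level set coupling or the fast Poisson preconditioner described in the abstract, and all names are hypothetical.

```python
import numpy as np


def neg_laplacian(u):
    """Apply the 1D second-difference operator [-1, 2, -1] (zeros outside the grid)."""
    out = 2.0 * u
    out[1:] -= u[:-1]
    out[:-1] -= u[1:]
    return out


def biharmonic(u):
    """Discrete biharmonic operator: the Laplacian applied twice, never assembled."""
    return neg_laplacian(neg_laplacian(u))


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=500):
    """Unpreconditioned CG that only needs a function computing A @ v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p        # new search direction
        rs = rs_new
    return x


# Hypothetical right-hand side on a 20-point grid.
b = np.ones(20)
x = conjugate_gradient(biharmonic, b)
```

Because the squared Laplacian is symmetric positive definite, plain CG applies, but its condition number grows quickly with grid size, which is the motivation for a preconditioner on larger problems.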
SU-G-TeP1-15: Toward a Novel GPU Accelerated Deterministic Solution to the Linear Boltzmann Transport Equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, R; Fallone, B; Cross Cancer Institute, Edmonton, AB

Purpose: To develop a Graphics Processing Unit (GPU) accelerated deterministic solution to the Linear Boltzmann Transport Equation (LBTE) for accurate dose calculations in radiotherapy (RT). A deterministic solution yields the potential for major speed improvements due to the sparse matrix-vector and vector-vector multiplications and would thus be of benefit to RT. Methods: In order to leverage the massively parallel architecture of GPUs, the first order LBTE was reformulated as a second order self-adjoint equation using the Least Squares Finite Element Method (LSFEM). This produces a symmetric positive-definite matrix which is efficiently solved using a parallelized conjugate gradient (CG) solver. The LSFEM formalism is applied in space, discrete ordinates is applied in angle, and the Multigroup method is applied in energy. The final linear system of equations produced is tightly coupled in space and angle. Our code written in CUDA-C was benchmarked on an Nvidia GeForce TITAN-X GPU against an Intel i7-6700K CPU. A spatial mesh of 30,950 tetrahedral elements was used with an S4 angular approximation. Results: To avoid repeating a full computationally intensive finite element matrix assembly at each Multigroup energy, a novel mapping algorithm was developed which minimized the operations required at each energy. Additionally, a parallelized memory mapping for the Kronecker product between the sparse spatial and angular matrices, including Dirichlet boundary conditions, was created. Atomicity is preserved by graph-coloring overlapping nodes into separate kernel launches. The one-time mapping calculations for matrix assembly, Kronecker product, and boundary condition application took 452±1ms on GPU. Matrix assembly for 16 energy groups took 556±3s on CPU, and 358±2ms on GPU using the mappings developed. The CG solver took 93±1s on CPU, and 468±2ms on GPU. Conclusion: Three computationally intensive subroutines in deterministically solving the LBTE have been formulated on GPU, resulting in two orders of magnitude speedup. Funding support from Natural Sciences and Engineering Research Council and Alberta Innovates Health Solutions. Dr. Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization).
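The Kronecker product between sparse spatial and angular matrices mentioned above can be illustrated with a minimal sketch. This is not the authors' CUDA mapping: the dictionary-of-keys storage, the function name `sparse_kron`, and the tiny example matrices are assumptions made purely for illustration.

```python
def sparse_kron(a, a_shape, b, b_shape):
    """Kronecker product of two sparse matrices stored as {(row, col): value}."""
    p, q = b_shape
    out = {}
    for (i, j), a_ij in a.items():
        for (k, l), b_kl in b.items():
            # Block (i, j) of the result is a_ij * B, so only nonzero pairs are visited.
            out[(i * p + k, j * q + l)] = a_ij * b_kl
    return out, (a_shape[0] * p, a_shape[1] * q)


# Hypothetical 2x2 "spatial" and "angular" matrices, not taken from the abstract.
spatial = {(0, 0): 1.0, (1, 1): 2.0}
angular = {(0, 1): 3.0}
product, shape = sparse_kron(spatial, (2, 2), angular, (2, 2))
```

The number of stored entries in the result is the product of the nonzero counts of the two factors, which is why the coupled system stays sparse.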
Semi-automatic sparse preconditioners for high-order finite element methods on non-uniform meshes

NASA Astrophysics Data System (ADS)

Austin, Travis M.; Brezina, Marian; Jamroz, Ben; Jhurani, Chetan; Manteuffel, Thomas A.; Ruge, John

2012-05-01

High-order finite elements often have a higher accuracy per degree of freedom than the classical low-order finite elements. However, in the context of implicit time-stepping methods, high-order finite elements present challenges to the construction of efficient simulations due to the high cost of inverting the denser finite element matrix. There are many cases where simulations are limited by the memory required to store the matrix and/or the algorithmic components of the linear solver. We are particularly interested in preconditioned Krylov methods for linear systems generated by discretization of elliptic partial differential equations with high-order finite elements. Using a preconditioner like Algebraic Multigrid can be costly in terms of memory due to the need to store matrix information at the various levels. We present a novel method for defining a preconditioner for systems generated by high-order finite elements that is based on a much sparser system than the original high-order finite element system. We investigate the performance for non-uniform meshes on a cube and a cubed sphere mesh, showing that the sparser preconditioner is more efficient and uses significantly less memory. Finally, we explore new methods to construct the sparse preconditioner and examine their effectiveness for non-uniform meshes. We compare results to a direct use of Algebraic Multigrid as a preconditioner and to a two-level additive Schwarz method.
A Fast MoM Solver (GIFFT) for Large Arrays of Microstrip and Cavity-Backed Antennas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fasenfest, B J; Capolino, F; Wilton, D

2005-02-02

A straightforward numerical analysis of large arrays of arbitrary contour (and possibly missing elements) requires large memory storage and long computation times. Several techniques are currently under development to reduce this cost. One such technique is the GIFFT (Green's function interpolation and FFT) method discussed here that belongs to the class of fast solvers for large structures. This method uses a modification of the standard AIM approach [1] that takes into account the reusability properties of matrices that arise from identical array elements. If the array consists of planar conducting bodies, the array elements are meshed using standard subdomain basis functions, such as the RWG basis. The Green's function is then projected onto a sparse regular grid of separable interpolating polynomials. This grid can then be used in a 2D or 3D FFT to accelerate the matrix-vector product used in an iterative solver [2]. The method has been proven to greatly reduce solve time by speeding up the matrix-vector product computation. The GIFFT approach also reduces fill time and memory requirements, since only the near element interactions need to be calculated exactly. The present work extends GIFFT to layered material Green's functions and multiregion interactions via slots in ground planes. In addition, a preconditioner is implemented to greatly reduce the number of iterations required for a solution. The general scheme of the GIFFT method is reported in [2]; this contribution is limited to presenting new results for array antennas made of slot-excited patches and cavity-backed patch antennas.
The Development of a Finite Volume Method for Modeling Sound in Coastal Ocean Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, Wen; Yang, Zhaoqing; Copping, Andrea E.

With the rapid growth of marine renewable energy and offshore wind energy, there have been concerns that the noise generated by construction and operation of the devices may interfere with marine animals’ communication. In this research, an underwater sound model is developed to simulate sound propagation generated by marine-hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite volume and finite difference methods are developed to solve the 3D Helmholtz equation of sound propagation in the coastal environment. For the finite volume method, the grid system consists of triangular grids in the horizontal plane and sigma-layers in the vertical dimension. A 3D sparse matrix solver with complex coefficients is formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method is applied to efficiently solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model is then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generated by human activities in a range-dependent setting, such as offshore wind energy platform constructions and tidal stream turbines. As a proof of concept, initial validation of the finite difference solver is presented for two coastal wedge problems. Validation of the finite volume method will be reported separately.
Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems.

PubMed

Anzt, H; Quintana-Ortí, E S

2014-06-28

While most recent breakthroughs in scientific research rely on complex simulations carried out in large-scale supercomputers, the power draft and energy spent for this purpose is increasingly becoming a limiting factor to this trend. In this paper, we provide an overview of the current status in energy-efficient scientific computing by reviewing different technologies used to monitor power draft as well as power- and energy-saving mechanisms available in commodity hardware. For the particular domain of sparse linear algebra, we analyse the energy efficiency of a broad collection of hardware architectures and investigate how algorithmic and implementation modifications can improve the energy performance of sparse linear system solvers, without negatively impacting their performance. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Computational efficiency improvements for image colorization

NASA Astrophysics Data System (ADS)

Yu, Chao; Sharma, Gaurav; Aly, Hussein

2013-03-01

We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarse-to-fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
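The integral images obtained by dynamic programming in the second innovation can be sketched as follows. This shows only the generic summed-area-table idea, not the paper's colorization iteration; the function names and the small test image are illustrative assumptions.

```python
def integral_image(img):
    """Summed-area table with a zero border: s[r][c] is the sum of img rows < r, cols < c."""
    rows, cols = len(img), len(img[0])
    s = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Dynamic programming: reuse the three neighbouring partial sums.
            s[r + 1][c + 1] = img[r][c] + s[r][c + 1] + s[r + 1][c] - s[r][c]
    return s


def box_sum(s, r0, c0, r1, c1):
    """Sum of img over rows r0..r1-1 and columns c0..c1-1 using four table lookups."""
    return s[r1][c1] - s[r0][c1] - s[r1][c0] + s[r0][c0]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical greyscale values
table = integral_image(image)
window_total = box_sum(table, 1, 1, 3, 3)   # sums the lower-right 2x2 window
```

Once the table is built, every local window sum costs the same four lookups regardless of window size, which is the source of the reduced repetitive computation.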
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.

Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics

NASA Astrophysics Data System (ADS)

Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.

2003-09-01

Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wave-front control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
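A minimal sketch of the preconditioned conjugate gradient idea follows. It assumes a pointwise symmetric Gauss-Seidel sweep on a small dense test matrix, which is a simplification of the block (BSGS) sweep with Cholesky-factored diagonal blocks described in the abstract; all names and the 1D Laplacian test problem are illustrative.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def sgs_apply(a, r):
    """Apply z = M^-1 r with M = (D + L) D^-1 (D + U), one symmetric Gauss-Seidel sweep."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):                      # forward solve (D + L) y = r
        y[i] = (r[i] - sum(a[i][j] * y[j] for j in range(i))) / a[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):            # backward solve (D + U) z = D y
        upper = sum(a[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (a[i][i] * y[i] - upper) / a[i][i]
    return z


def pcg(a, b, tol=1e-10, max_iter=200):
    """Preconditioned CG for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = b[:]
    z = sgs_apply(a, r)
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = sgs_apply(a, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 5
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 1.0]               # chosen so the exact solution is all ones
x, iters = pcg(a, b)
```

The preconditioner is symmetric positive definite whenever the matrix is, which is what allows it to be used inside conjugate gradients.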
GPU-accelerated Modeling and Element-free Reverse-time Migration with Gauss Points Partition

NASA Astrophysics Data System (ADS)

Zhen, Z.; Jia, X.

2014-12-01

The element-free method (EFM) has been applied to seismic modeling and migration. Compared with the finite element method (FEM) and the finite difference method (FDM), it is much cheaper and more flexible because only the information of the nodes and the boundary of the study area are required in computation. In the EFM, the number of Gauss points should be consistent with the number of model nodes; otherwise the accuracy of the intermediate coefficient matrices would be harmed. Thus when we increase the nodes of the velocity model in order to obtain higher resolution, we find that the size of the computer's memory will be a bottleneck. The original EFM can deal with at most 81×81 nodes in the case of 2G memory, as tested by Jia and Hu (2006). In order to solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition (GPP), and utilize the GPUs to improve the computation efficiency. Considering the characteristics of the Gaussian points, the GPP method doesn't influence the propagation of the seismic wave in the velocity model. To overcome the time-consuming computation of the stiffness matrix (K) and the mass matrix (M), we also use the GPUs in our computation program. We employ the compressed sparse row (CSR) format to compress the intermediate sparse matrices and try to simplify the operations by solving the linear equations with the CULA Sparse's Conjugate Gradient (CG) solver instead of the linear sparse solver 'PARDISO'. It is observed that our strategy can significantly reduce the computational time of K and M compared with the algorithm based on CPU. The model tested is the Marmousi model. The length of the model is 7425m and the depth is 2990m. We discretize the model with 595x298 nodes, 300x300 Gauss cells and 3x3 Gauss points in each cell. In contrast to the computational time of the conventional EFM, the GPUs-GPP approach can substantially improve the efficiency. The speedup ratio of the time consumption of computing K and M is 120 and the speedup ratio of the time consumption of RTM is 11.5. At the same time, the accuracy of imaging is not harmed. Another advantage of the GPUs-GPP method is its easy application in other numerical methods such as the FEM. Finally, in the GPUs-GPP method, the arrays require quite limited memory storage, which makes the method promising in dealing with large-scale 3D problems.
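The compressed sparse row (CSR) format mentioned above stores only the nonzero values, their column indices, and one offset per row. A minimal pure-Python sketch, with helper names and a small example matrix that are illustrative assumptions rather than anything from the authors' GPU code:

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))         # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x while touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


dense = [[4, 0, 0], [0, 0, 2], [1, 0, 3]]   # hypothetical example matrix
values, col_idx, row_ptr = dense_to_csr(dense)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

The matrix-vector product is the kernel a conjugate gradient solver calls once per iteration, so its cost is proportional to the number of nonzeros rather than to the square of the dimension.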
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

To date, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver, such as incomplete LU factorizations, are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) is difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
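The "n independent small least squares problems" can be sketched in their simplest form. The example assumes a diagonal-only sparsity pattern for the approximate inverse M, which the abstract does not specify; with that assumption each column problem min ||A m_k - e_k|| has the closed-form solution a_kk / ||A[:, k]||^2. The function name and test matrix are hypothetical.

```python
def diagonal_approximate_inverse(a):
    """Diagonal M minimizing ||A M - I||_F, one independent problem per column."""
    n = len(a)
    m = []
    for k in range(n):                      # each k could be handled by a different processor
        column = [a[i][k] for i in range(n)]
        # Least squares over a single unknown: minimize ||mu * column - e_k||_2.
        m.append(a[k][k] / sum(c * c for c in column))
    return m


a = [[4.0, 1.0], [2.0, 3.0]]                # hypothetical unsymmetric matrix
m = diagonal_approximate_inverse(a)
```

Allowing more nonzeros per column turns each scalar problem into a small dense least squares problem, but the columns remain independent, and applying M is still just a matrix-vector product.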
A new fast direct solver for the boundary element method

NASA Astrophysics Data System (ADS)

Huang, S.; Liu, Y. J.

2017-09-01

A new fast direct linear equation solver for the boundary element method (BEM) is presented in this paper. The idea of the new fast direct solver stems from the concept of the hierarchical off-diagonal low-rank matrix. The hierarchical off-diagonal low-rank matrix can be decomposed into the multiplication of several diagonal block matrices. The inverse of the hierarchical off-diagonal low-rank matrix can be calculated efficiently with the Sherman-Morrison-Woodbury formula. In this paper, a more general and efficient approach to approximate the coefficient matrix of the BEM with the hierarchical off-diagonal low-rank matrix is proposed. Compared to the current fast direct solver based on the hierarchical off-diagonal low-rank matrix, the proposed method is suitable for solving general 3-D boundary element models. Several numerical examples of 3-D potential problems with the total number of unknowns up to above 200,000 are presented. The results show that the new fast direct solver can be applied to solve large 3-D BEM models accurately and with better efficiency compared with the conventional BEM.
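The role of the Sherman-Morrison-Woodbury formula can be illustrated with its rank-1 special case (Sherman-Morrison) applied to a diagonal matrix plus a rank-1 update. This is a minimal sketch of the identity only, not the paper's hierarchical solver; the function name and data are illustrative.

```python
def solve_diag_plus_rank1(d, u, v, b):
    """Solve (D + u v^T) x = b, where D = diag(d), using the Sherman-Morrison formula."""
    dinv_b = [bi / di for bi, di in zip(b, d)]      # D^-1 b
    dinv_u = [ui / di for ui, di in zip(u, d)]      # D^-1 u
    denom = 1.0 + sum(vi * w for vi, w in zip(v, dinv_u))
    scale = sum(vi * w for vi, w in zip(v, dinv_b)) / denom
    # x = D^-1 b - D^-1 u (v^T D^-1 b) / (1 + v^T D^-1 u)
    return [p - scale * q for p, q in zip(dinv_b, dinv_u)]


# Hypothetical data: D + u v^T is the matrix [[3, 1], [1, 5]].
x = solve_diag_plus_rank1([2.0, 4.0], [1.0, 1.0], [1.0, 1.0], [3.0, 5.0])
```

The solve costs a few vector operations instead of a dense factorization, which is the property that low-rank off-diagonal blocks let a hierarchical solver exploit recursively.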
A Matlab-based finite-difference solver for the Poisson problem with mixed Dirichlet-Neumann boundary conditions

NASA Astrophysics Data System (ADS)

Reimer, Ashton S.; Cheviakov, Alexei F.

2013-03-01

A Matlab-based finite-difference numerical solver for the Poisson equation for a rectangle and a disk in two dimensions, and a spherical domain in three dimensions, is presented. The solver is optimized for handling an arbitrary combination of Dirichlet and Neumann boundary conditions, and allows for full user control of mesh refinement. The solver routines utilize effective and parallelized sparse vector and matrix operations. Computations exhibit high speeds, numerical stability with respect to mesh size and mesh refinement, and acceptable error values even on desktop computers. Catalogue identifier: AENQ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AENQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License v3.0 No. of lines in distributed program, including test data, etc.: 102793 No. of bytes in distributed program, including test data, etc.: 369378 Distribution format: tar.gz Programming language: Matlab 2010a. Computer: PC, Macintosh. Operating system: Windows, OSX, Linux. RAM: 8 GB (8,589,934,592 bytes) Classification: 4.3. Nature of problem: To solve the Poisson problem in a standard domain with “patchy surface”-type (strongly heterogeneous) Neumann/Dirichlet boundary conditions. Solution method: Finite difference with mesh refinement. Restrictions: Spherical domain in 3D; rectangular domain or a disk in 2D. Unusual features: Choice between mldivide/iterative solver for the solution of large system of linear algebraic equations that arise. Full user control of Neumann/Dirichlet boundary conditions and mesh refinement. Running time: Depending on the number of points taken and the geometry of the domain, the routine may take from less than a second to several hours to execute.
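How mixed Dirichlet-Neumann conditions enter a finite-difference system can be sketched on a one-dimensional model problem. This is an illustrative reduction written in Python, not the Matlab package: it assumes -u'' = f on [0, 1] with u(0) = 0 and u'(1) = 0, a ghost-node treatment of the Neumann end, and a tridiagonal (Thomas) solve. All names are hypothetical.

```python
def solve_poisson_1d(f, n):
    """Solve -u'' = f on [0, 1] with u(0) = 0 (Dirichlet) and u'(1) = 0 (Neumann)."""
    h = 1.0 / n
    # Unknowns u_1..u_n at x_i = i * h; rows of the tridiagonal system.
    sub = [-1.0] * n
    diag = [2.0] * n
    sup = [-1.0] * n
    rhs = [h * h * f(i * h) for i in range(1, n + 1)]
    sub[-1] = -2.0      # ghost node u_{n+1} = u_{n-1} enforces u'(1) = 0
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return [0.0] + u    # prepend the Dirichlet value u(0) = 0


u = solve_poisson_1d(lambda x: 1.0, 4)
exact = [x - 0.5 * x * x for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

For f = 1 the exact solution x - x^2/2 is quadratic, so the centred differences reproduce it at the grid nodes up to rounding error, which makes this a convenient sanity check.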
Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

NASA Astrophysics Data System (ADS)

Stoykov, S.; Atanassov, E.; Margenov, S.

2016-10-01

Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices. One of the computationally expensive operations in the method is multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.
Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200×

PubMed Central

Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

2015-01-01

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like ‘edible’, ‘fits in hand’)? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we accelerate any CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, by up to 200×, along with an up to 65 fold increase in sparsity, with comparable accuracy to the baseline. We apply TURBO-SMT to BRAINQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. TURBO-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. PMID:26473087
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
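Of the three preconditioners named above, the incomplete LU factorization is the least obvious to picture. The sketch below assumes the zero-fill variant, ILU(0), on a small dense array, which may differ from the fill level and data structures used in the report; the function name and test matrix are illustrative.

```python
def ilu0(a):
    """ILU(0): Gaussian elimination that discards any fill outside the pattern of a.

    Returns one matrix holding L (unit diagonal, strictly lower part) and U (upper part).
    """
    n = len(a)
    lu = [row[:] for row in a]
    pattern = {(i, j) for i in range(n) for j in range(n) if a[i][j] != 0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in pattern:
                continue
            lu[i][k] /= lu[k][k]                    # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in pattern:               # update only existing nonzeros
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


# Hypothetical "arrow" matrix: exact LU would fill positions (1, 2) and (2, 1).
a = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
lu = ilu0(a)
```

Because fill is discarded, the factors keep the sparsity of the original matrix, and the ordering of the unknowns changes which entries get dropped. This is one reason ordering affects convergence.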
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of the underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics.

PubMed

Gilles, Luc; Ellerbroek, Brent L; Vogel, Curtis R

2003-09-10

Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
Approximate method of variational Bayesian matrix factorization/completion with sparse prior

NASA Astrophysics Data System (ADS)

Kawasumi, Ryota; Takeda, Koujin

2018-05-01

We derive the analytical expression of a matrix factorization/completion solution by the variational Bayes method, under the assumption that the observed matrix is originally the product of low-rank, dense and sparse matrices with additive noise. We assume the prior of a sparse matrix is a Laplace distribution by taking matrix sparsity into consideration. Then we use several approximations for the derivation of a matrix factorization/completion solution. By our solution, we also numerically evaluate the performance of a sparse matrix reconstruction in matrix factorization, and completion of a missing matrix element in matrix completion.
An Optimized Multicolor Point-Implicit Solver for Unstructured Grid Applications on Graphics Processing Units

NASA Technical Reports Server (NTRS)

Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana

2016-01-01

In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
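The idea behind a multicolor solver can be sketched with scalar unknowns: unknowns are colored so that no two coupled unknowns share a color, and all unknowns of one color can then be updated independently. The example assumes a greedy coloring, a symmetric nonzero pattern, and a scalar Gauss-Seidel update, whereas the solver in the abstract works on variable-size blocks on a GPU; all names are illustrative.

```python
def greedy_coloring(a):
    """Assign each unknown the smallest color not used by an already-colored neighbour."""
    n = len(a)
    colors = [-1] * n
    for i in range(n):
        used = {colors[j] for j in range(n)
                if j != i and a[i][j] != 0 and colors[j] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return colors


def multicolor_sweep(a, b, x, colors):
    """One Gauss-Seidel sweep, processed color by color (updates x in place)."""
    n = len(x)
    for c in range(max(colors) + 1):
        group = [i for i, ci in enumerate(colors) if ci == c]
        # Unknowns in one group are not coupled to each other, so on parallel
        # hardware these updates could all run at the same time.
        for i in group:
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


n = 4
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0, 0.0, 0.0, 1.0]                    # chosen so the exact solution is all ones
colors = greedy_coloring(a)
x = [0.0] * n
for _ in range(200):
    multicolor_sweep(a, b, x, colors)
```

For this chain-like matrix the coloring alternates between two colors, the familiar red-black ordering. Unstructured grids generally need more colors, and the group sizes vary, which is one source of the variable parallelism noted in the abstract.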
Brief announcement: Hypergraph parititioning for parallel sparse matrix-matrix multiplication

DOE PAGES

Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...

2015-01-01

The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.
Efficient Kriging Algorithms

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess

2011-01-01

More efficient versions of an interpolation method, called kriging, have been introduced in order to reduce its traditionally high computational cost. Written in C++, these approaches were tested on both synthetic and real data. Kriging is a best unbiased linear estimator and suitable for interpolation of scattered data points. Kriging has long been used in the geostatistic and mining communities, but is now being researched for use in the image fusion of remotely sensed data. This allows a combination of data from various locations to be used to fill in any missing data from any single location. To arrive at the faster algorithms, sparse SYMMLQ iterative solver, covariance tapering, Fast Multipole Methods (FMM), and nearest neighbor searching techniques were used. These implementations were used when the coefficient matrix in the linear system is symmetric, but not necessarily positive-definite.
History of the Nuclei Important for Cosmochemistry

NASA Technical Reports Server (NTRS)

Meyer, Bradley S.

2004-01-01

An essential aspect of studying the nuclei important for cosmochemistry is their production in stars. Over the grant period, we have further developed the Clemson/American University of Beirut stellar evolution code. Through use of a biconjugate-gradient matrix solver, we now routinely solve 10^6 x 10^6 sparse matrices on our desktop computers. This has allowed us to couple nucleosynthesis and convection fully in the 1-D star, which, in turn, provides better estimates of nuclear yields when the mixing and nuclear burning timescales are comparable. We also have incorporated radiation transport into our 1-D supernova explosion code. We used the stellar evolution and explosion codes to compute iron abundances in a 25 Solar mass star and compared the results to data from RIMS.
3-D minimum-structure inversion of magnetotelluric data using the finite-element method and tetrahedral grids

NASA Astrophysics Data System (ADS)

Jahandari, H.; Farquharson, C. G.

2017-11-01

Unstructured grids enable representing arbitrary structures more accurately and with fewer cells compared to regular structured grids. These grids also allow more efficient refinements compared to rectilinear meshes. In this study, tetrahedral grids are used for the inversion of magnetotelluric (MT) data, which allows for the direct inclusion of topography in the model, for constraining an inversion using a wireframe-based geological model and for local refinement at the observation stations. A minimum-structure method with an iterative model-space Gauss-Newton algorithm for optimization is used. An iterative solver is employed for solving the normal system of equations at each Gauss-Newton step and the sensitivity matrix-vector products that are required by this solver are calculated using pseudo-forward problems. This method alleviates the need to explicitly form the Hessian or Jacobian matrices which significantly reduces the required computation memory. Forward problems are formulated using an edge-based finite-element approach and a sparse direct solver is used for the solutions. This solver allows saving and re-using the factorization of matrices for similar pseudo-forward problems within a Gauss-Newton iteration which greatly minimizes the computation time. Two examples are presented to show the capability of the algorithm: the first example uses a benchmark model while the second example represents a realistic geological setting with topography and a sulphide deposit. The data that are inverted are the full-tensor impedance and the magnetic transfer function vector. The inversions sufficiently recovered the models and reproduced the data, which shows the effectiveness of unstructured grids for complex and realistic MT inversion scenarios. The first example is also used to demonstrate the computational efficiency of the presented model-space method by comparison with its data-space counterpart.
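The saving from re-using a factorization, mentioned in the abstract above, can be sketched as follows. This is a small dense LU with partial pivoting standing in for a sparse direct solver; the function names and the 3x3 matrix are hypothetical, and the record's own solver is not specified here. The point is only that the expensive step (factorization) runs once, while each additional right-hand side costs two triangular solves.

```python
def lu_factor(a):
    """Factor PA = LU in place on a copy; returns the packed LU and the row permutation."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using a stored factorization (forward then backward substitution)."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


# Factor once, then solve for two different right-hand sides.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
lu, perm = lu_factor(A)
x1 = lu_solve(lu, perm, [6.0, 12.0, 14.0])   # right-hand side built from x = [1, 2, 3]
x2 = lu_solve(lu, perm, [4.0, 0.0, -4.0])    # right-hand side built from x = [1, 0, -1]
```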
An iterative solver for the 3D Helmholtz equation

NASA Astrophysics Data System (ADS)

Belonosov, Mikhail; Dmitriev, Maxim; Kostin, Victor; Neklyudov, Dmitry; Tcheverda, Vladimir

2017-09-01

We develop a frequency-domain iterative solver for numerical simulation of acoustic waves in 3D heterogeneous media. It is based on the application of a unique preconditioner to the Helmholtz equation that ensures convergence for Krylov subspace iteration methods. Effective inversion of the preconditioner involves the Fast Fourier Transform (FFT) and numerical solution of a series of boundary value problems for ordinary differential equations. Matrix-by-vector multiplication for iterative inversion of the preconditioned matrix involves inversion of the preconditioner and pointwise multiplication of grid functions. Our solver has been verified by benchmarking against exact solutions and a time-domain solver.
Simulation of Aerosols and Chemistry with a Unified Global Model

NASA Technical Reports Server (NTRS)

Chin, Mian

2004-01-01

This project is to continue the development of the global simulation capabilities of tropospheric and stratospheric chemistry and aerosols in a unified global model. This is a part of our overall investigation of aerosol-chemistry-climate interaction. In the past year, we have enabled the tropospheric chemistry simulations based on the GEOS-CHEM model, and added stratospheric chemical reactions into the GEOS-CHEM such that a globally unified troposphere-stratosphere chemistry and transport can be simulated consistently without any simplifications. The tropospheric chemical mechanism in the GEOS-CHEM includes 80 species and 150 reactions. 24 tracers are transported, including O3, NOx, total nitrogen (NOy), H2O2, CO, and several types of hydrocarbon. The chemical solver used in the GEOS-CHEM model is a highly accurate sparse-matrix vectorized Gear solver (SMVGEAR). The stratospheric chemical mechanism includes an additional approximately 100 reactions and photolysis processes. Because of the large number of total chemical reactions and photolysis processes and very different photochemical regimes involved in the unified simulation, the model demands significant computer resources that are currently not practical. Therefore, several improvements will be taken, such as massive parallelization, code optimization, or selecting a faster solver. We have also continued aerosol simulation (including sulfate, dust, black carbon, organic carbon, and sea-salt) in the global model to cover most of year 2002. These results have been made available to many groups worldwide and accessible from the website http://code916.gsfc.nasa.gov/People/Chin/aot.html.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
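The two-phase structure and the role of the accumulator can be sketched in a few lines. This is a serial, row-by-row illustration over CSR arrays with a hash-map (dict) accumulator; the function names are hypothetical and this is not the kkSpGEMM implementation. The symbolic phase computes only the nonzero pattern of C = A B, and the numeric phase fills in values, so the pattern can be reused when only the values of A or B change.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx):
    """Phase 1: nonzero pattern of C = A * B (CSR row pointers and column indices)."""
    c_ptr, c_idx = [0], []
    for i in range(len(a_ptr) - 1):
        cols = set()
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        c_idx.extend(sorted(cols))
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx


def spgemm_numeric(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, c_ptr, c_idx):
    """Phase 2: values of C for a pattern computed earlier."""
    c_val = [0.0] * len(c_idx)
    for i in range(len(a_ptr) - 1):
        acc = {}  # hash-map accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a * b_val[q]
        for p in range(c_ptr[i], c_ptr[i + 1]):
            c_val[p] = acc[c_idx[p]]
    return c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form.
a_ptr, a_idx, a_val = [0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0]
b_ptr, b_idx, b_val = [0, 1, 2], [1, 0], [4.0, 5.0]
c_ptr, c_idx = spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx)
c_val = spgemm_numeric(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, c_ptr, c_idx)
# C = [[0, 4], [15, 8]] is stored as c_ptr=[0, 1, 3], c_idx=[1, 0, 1], c_val=[4, 15, 8].
```

A dense array indexed by column is the usual alternative accumulator; which one is faster depends on row density, which is the kind of trade-off the abstract refers to.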

MODFLOW–USG version 1: An unstructured grid version of MODFLOW for simulating groundwater flow and tightly coupled processes using a control volume finite-difference formulation

USGS Publications Warehouse

Panday, Sorab; Langevin, Christian D.; Niswonger, Richard G.; Ibaraki, Motomu; Hughes, Joseph D.

2013-01-01

A new version of MODFLOW, called MODFLOW–USG (for UnStructured Grid), was developed to support a wide variety of structured and unstructured grid types, including nested grids and grids based on prismatic triangles, rectangles, hexagons, and other cell shapes. Flexibility in grid design can be used to focus resolution along rivers and around wells, for example, or to subdiscretize individual layers to better represent hydrostratigraphic units. MODFLOW–USG is based on an underlying control volume finite difference (CVFD) formulation in which a cell can be connected to an arbitrary number of adjacent cells. To improve accuracy of the CVFD formulation for irregular grid-cell geometries or nested grids, a generalized Ghost Node Correction (GNC) Package was developed, which uses interpolated heads in the flow calculation between adjacent connected cells. MODFLOW–USG includes a Groundwater Flow (GWF) Process, based on the GWF Process in MODFLOW–2005, as well as a new Connected Linear Network (CLN) Process to simulate the effects of multi-node wells, karst conduits, and tile drains, for example. The CLN Process is tightly coupled with the GWF Process in that the equations from both processes are formulated into one matrix equation and solved simultaneously. This robustness results from using an unstructured grid with unstructured matrix storage and solution schemes. MODFLOW–USG also contains an optional Newton-Raphson formulation, based on the formulation in MODFLOW–NWT, for improving solution convergence and avoiding problems with the drying and rewetting of cells. Because the existing MODFLOW solvers were developed for structured and symmetric matrices, they were replaced with a new Sparse Matrix Solver (SMS) Package developed specifically for MODFLOW–USG. 
The SMS Package provides several methods for resolving nonlinearities and multiple symmetric and asymmetric linear solution schemes to solve the matrix arising from the flow equations and the Newton-Raphson formulation, respectively.
Hybrid reconstruction of quantum density matrix: when low-rank meets sparsity

NASA Astrophysics Data System (ADS)

Li, Kezhi; Zheng, Kai; Yang, Jingbei; Cong, Shuang; Liu, Xiaomei; Li, Zhaokai

2017-12-01

Both the mathematical theory and experiments have verified that quantum state tomography based on compressive sensing is an efficient framework for the reconstruction of quantum density states. In recent physical experiments, we found that many unknown density matrices of interest are low-rank as well as sparse. Bearing this information in mind, in this paper we propose a reconstruction algorithm that combines the low-rank and the sparsity property of density matrices and further theoretically prove that the solution of the optimization function can be, and only be, the true density matrix satisfying the model with overwhelming probability, as long as a necessary number of measurements are allowed. The solver leverages the fixed-point equation technique in which a step-by-step strategy is developed by utilizing an extended soft threshold operator that copes with complex values. Numerical experiments of the density matrix estimation for real nuclear magnetic resonance devices reveal that the proposed method achieves a better accuracy compared to some existing methods. We believe that the proposed method could be leveraged as a generalized approach and widely implemented in quantum state estimation.
Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves

2001-10-10

We examine the mechanics of the send and receive mechanism of MPI and in particular how we can implement message passing in a robust way so that our performance is not significantly affected by changes to the MPI system. This leads us to using the Isend/Irecv protocol which will entail sometimes significant algorithmic changes. We discuss this within the context of two different algorithms for sparse Gaussian elimination that we have parallelized. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. Both algorithms are difficult to parallelize on distributed memory machines. Our initial strategies were based on simple MPI point-to-point communication primitives. With such approaches, the parallel performance of both codes is very sensitive to the MPI implementation, the way MPI internal buffers are used in particular. We then modified our codes to use more sophisticated nonblocking versions of MPI communication. This significantly improved the performance robustness (independent of the MPI buffering mechanism) and scalability, but at the cost of increased code complexity.
Sparse Matrices in MATLAB: Design and Implementation

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Moler, Cleve; Schreiber, Robert

1992-01-01

The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.
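A storage scheme with the property described above, space proportional to the number of nonzeros (plus one pointer per column), is compressed sparse column (CSC) storage. The sketch below is an illustration in Python with hypothetical function names, not MATLAB's internal code: it builds the three CSC arrays from a dense matrix and multiplies by a vector while touching only stored entries.

```python
def dense_to_csc(dense):
    """Convert a list-of-rows matrix to CSC arrays (col_ptr, row_idx, vals)."""
    nrows = len(dense)
    ncols = len(dense[0]) if dense else 0
    col_ptr, row_idx, vals = [0], [], []
    for j in range(ncols):
        for i in range(nrows):
            if dense[i][j] != 0:
                row_idx.append(i)
                vals.append(dense[i][j])
        col_ptr.append(len(vals))  # entries of column j live in [col_ptr[j], col_ptr[j+1])
    return col_ptr, row_idx, vals


def csc_matvec(nrows, col_ptr, row_idx, vals, x):
    """y = A x; the work is proportional to the number of stored nonzeros."""
    y = [0.0] * nrows
    for j in range(len(col_ptr) - 1):
        for p in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[p]] += vals[p] * x[j]
    return y


dense = [[4, 0, 0],
         [0, 0, 5],
         [1, 0, 6]]
col_ptr, row_idx, vals = dense_to_csc(dense)
y = csc_matvec(3, col_ptr, row_idx, vals, [1, 2, 3])
```

Here the empty middle column shows up only as two equal consecutive entries in `col_ptr`.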
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saad, Yousef

2014-01-16

The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project began with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners for solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version of our parallel solver, called pARMS [new version is version 3]. As part of this we have tested the code in complex settings, including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth. 10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

2005-01-01

A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategies and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate the steps in the formulation and large-scale (1,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a cluster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environments and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis.

PubMed

Kim, Hyunsoo; Park, Haesun

2007-06-15

Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
SPARSKIT: A basic tool kit for sparse matrix computations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1990-01-01

Presented here are the main features of a tool package for manipulating and working with sparse matrices. One of the goals of the package is to provide basic tools to facilitate the exchange of software and data between researchers in sparse matrix computations. The starting point is the Harwell/Boeing collection of matrices for which the authors provide a number of tools. Among other things, the package provides programs for converting data structures, printing simple statistics on a matrix, plotting a matrix profile, and performing linear algebra operations with sparse matrices.
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de

2016-11-15

We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10^2 innermost eigenpairs of a topological insulator matrix with dimension 10^9 derived from quantum physics applications.
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

NASA Astrophysics Data System (ADS)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need only a small part of the matrix spectrum.
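One plausible reading of a coordinate-format variation for real symmetric matrices is to store only the upper triangle and apply each off-diagonal entry twice in the matrix-vector product. The sketch below assumes that layout; the function name and the example matrix are hypothetical, and the codes in the record may differ in detail. This product is the kernel that iterative eigensolvers such as ARPACK call repeatedly.

```python
def sym_coo_matvec(n, rows, cols, vals, x):
    """y = A x for a symmetric A given by its upper triangle in coordinate (COO) form."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            # The mirrored entry A[j][i] is implied by symmetry and never stored.
            y[j] += a * x[i]
    return y


# Upper triangle of A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]].
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 1, 2, 2]
vals = [2.0, -1.0, 2.0, -1.0, 2.0]
y = sym_coo_matvec(3, rows, cols, vals, [1.0, 2.0, 3.0])
```

Storing 5 entries instead of 7 is a small saving here, but for large matrices it roughly halves the memory for off-diagonal values and indices.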
Sparse Matrix Software Catalog, Sparse Matrix Symposium 1982, Fairfield Glade, Tennessee, October 24-27, 1982,

DTIC Science & Technology

1982-10-27

are buried within * a much larger, special purpose package. We regret such omissions, but to have reached the practitioners in each of the diverse...sparse matrix (form PAQ ) 4. Method of solution: Distribution count sort 5. Programming language: FORTRAN 6. Precision: Single and double precision 7
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Low-rank matrix decomposition and spatio-temporal sparse recovery for STAP radar

DOE PAGES

Sen, Satyabrata

2015-08-04

We develop space-time adaptive processing (STAP) methods by leveraging the advantages of sparse signal processing techniques in order to detect a slowly-moving target. We observe that the inherent sparse characteristics of a STAP problem can be formulated as the low-rankness of clutter covariance matrix when compared to the total adaptive degrees-of-freedom, and also as the sparse interference spectrum on the spatio-temporal domain. By exploiting these sparse properties, we propose two approaches for estimating the interference covariance matrix. In the first approach, we consider a constrained matrix rank minimization problem (RMP) to decompose the sample covariance matrix into a low-rank positive semidefinite and a diagonal matrix. The solution of RMP is obtained by applying the trace minimization technique and the singular value decomposition with matrix shrinkage operator. Our second approach deals with the atomic norm minimization problem to recover the clutter response-vector that has a sparse support on the spatio-temporal plane. We use convex relaxation based standard sparse-recovery techniques to find the solutions. With extensive numerical examples, we demonstrate the performances of proposed STAP approaches with respect to both the ideal and practical scenarios, involving Doppler-ambiguous clutter ridges, spatial and temporal decorrelation effects. As a result, the low-rank matrix decomposition based solution requires secondary measurements as many as twice the clutter rank to attain a near-ideal STAP performance; whereas the spatio-temporal sparsity based approach needs a considerably small number of secondary data.
Sparse Matrix for ECG Identification with Two-Lead Features.

PubMed

Tseng, Kuo-Kun; Luo, Jiao; Hegarty, Robert; Wang, Wenmin; Haiting, Dong

2015-01-01

Electrocardiograph (ECG) human identification has the potential to improve biometric security. However, improvements in ECG identification and feature extraction are required. Previous work has focused on single lead ECG signals. Our work proposes a new algorithm for human identification by mapping two-lead ECG signals onto a two-dimensional matrix and then employing a sparse matrix method to process the matrix. This is the first application of sparse matrix techniques to ECG identification. Moreover, the results of our experiments demonstrate the benefits of our approach over existing methods.
High-Order Automatic Differentiation of Unmodified Linear Algebra Routines via Nilpotent Matrices

NASA Astrophysics Data System (ADS)

Dunham, Benjamin Z.

This work presents a new automatic differentiation method, Nilpotent Matrix Differentiation (NMD), capable of propagating any order of mixed or univariate derivative through common linear algebra functions--most notably third-party sparse solvers and decomposition routines, in addition to basic matrix arithmetic operations and power series--without changing data-type or modifying code line by line; this allows differentiation across sequences of arbitrarily many such functions with minimal implementation effort. NMD works by enlarging the matrices and vectors passed to the routines, replacing each original scalar with a matrix block augmented by derivative data; these blocks are constructed with special sparsity structures, termed "stencils," each designed to be isomorphic to a particular multidimensional hypercomplex algebra. The algebras are in turn designed such that Taylor expansions of hypercomplex function evaluations are finite in length and thus exactly track derivatives without approximation error. Although this use of the method in the "forward mode" is unique in its own right, it is also possible to apply it to existing implementations of the (first-order) discrete adjoint method to find high-order derivatives with lowered cost complexity; for example, for a problem with N inputs and an adjoint solver whose cost is independent of N--i.e., O(1)--the N x N Hessian can be found in O(N) time, which is comparable to existing second-order adjoint methods that require far more problem-specific implementation effort. Higher derivatives are likewise less expensive--e.g., a N x N x N rank-three tensor can be found in O(N^2). Alternatively, a Hessian-vector product can be found in O(1) time, which may open up many matrix-based simulations to a range of existing optimization or surrogate modeling approaches. 
As a final corollary in parallel to the NMD-adjoint hybrid method, the existing complex-step differentiation (CD) technique is also shown to be capable of finding the Hessian-vector product. All variants are implemented on a stochastic diffusion problem and compared in-depth with various cost and accuracy metrics.
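The core mechanism, replacing a scalar by a small block so that unmodified matrix routines carry derivatives along, can be shown in its simplest univariate form. The sketch below assumes the block x*I + N, where N has ones on the superdiagonal and is nilpotent; the helper names are hypothetical and the stencils in the work above are more general. Because N^(order+1) = 0, evaluating a polynomial f on the block gives f(x)*I + f'(x)*N + (f''(x)/2)*N^2 + ... with no truncation error.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(s, a):
    return [[s * v for v in row] for row in a]


def cubic(m):
    """f(M) = M^3 + 2 M, written only with ordinary matrix arithmetic."""
    return mat_add(mat_mul(mat_mul(m, m), m), mat_scale(2.0, m))


def nilpotent_block(x, order):
    """(order+1) x (order+1) block with x on the diagonal and 1 on the superdiagonal."""
    size = order + 1
    return [[x if i == j else (1.0 if j == i + 1 else 0.0) for j in range(size)]
            for i in range(size)]


# f(x) = x^3 + 2x at x = 3: f = 33, f' = 29, f'' = 18.
first = cubic(nilpotent_block(3.0, 1))    # [[f, f'], [0, f]]
second = cubic(nilpotent_block(3.0, 2))   # entry [0][2] holds f''/2
```

The routine `cubic` is never told about derivatives; they appear in the off-diagonal entries of its output purely as a consequence of the block structure of its input.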
Sparse matrix multiplications for linear scaling electronic structure calculations in an atom-centered basis set using multiatom blocks.

PubMed

Saravanan, Chandra; Shao, Yihan; Baer, Roi; Ross, Philip N; Head-Gordon, Martin

2003-04-15

A sparse matrix multiplication scheme with multiatom blocks is reported, a tool that can be very useful for developing linear-scaling methods with atom-centered basis functions. Compared to conventional element-by-element sparse matrix multiplication schemes, efficiency is gained by the use of the highly optimized basic linear algebra subroutines (BLAS). However, some sparsity is lost in the multiatom blocking scheme because these matrix blocks will in general contain negligible elements. As a result, an optimal block size that minimizes the CPU time by balancing these two effects is recovered. In calculations on linear alkanes, polyglycines, estane polymers, and water clusters the optimal block size is found to be between 40 and 100 basis functions, where about 55-75% of the machine peak performance was achieved on an IBM RS6000 workstation. In these calculations, the blocked sparse matrix multiplications can be 10 times faster than a standard element-by-element sparse matrix package. Copyright 2003 Wiley Periodicals, Inc. J Comput Chem 24: 618-622, 2003
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
Disentangling giant component and finite cluster contributions in sparse random matrix spectra.

PubMed

Kühn, Reimer

2016-04-01

We describe a method for disentangling giant component and finite cluster contributions to sparse random matrix spectra, using sparse symmetric random matrices defined on Erdős-Rényi graphs as an example and test bed. Our methods apply to sparse matrices defined in terms of arbitrary graphs in the configuration model class, as long as they have finite mean degree.
Crystallization of bFGF-DNA Aptamer Complexes Using a Sparse Matrix Designed for Protein-Nucleic Acid Complexes

NASA Technical Reports Server (NTRS)

Cannone, Jaime J.; Barnes, Cindy L.; Achari, Aniruddha; Kundrot, Craig E.; Whitaker, Ann F. (Technical Monitor)

2001-01-01

The Sparse Matrix approach for obtaining lead crystallization conditions has proven to be very fruitful for the crystallization of proteins and nucleic acids. Here we report a Sparse Matrix developed specifically for the crystallization of protein-DNA complexes. This method is rapid and economical, typically requiring 2.5 mg of complex to test 48 conditions. The method was originally developed to crystallize basic fibroblast growth factor (bFGF) complexed with DNA sequences identified through in vitro selection, or SELEX, methods. Two DNA aptamers that bind with approximately nanomolar affinity and inhibit the angiogenic properties of bFGF were selected for co-crystallization. The Sparse Matrix produced lead crystallization conditions for both bFGF-DNA complexes.
High-SNR spectrum measurement based on Hadamard encoding and sparse reconstruction

NASA Astrophysics Data System (ADS)

Wang, Zhaoxin; Yue, Jiang; Han, Jing; Li, Long; Jin, Yong; Gao, Yuan; Li, Baoming

2017-12-01

The denoising capabilities of the H-matrix and cyclic S-matrix based on the sparse reconstruction, employed in the Pixel of Focal Plane Coded Visible Spectrometer for spectrum measurement are investigated, where the spectrum is sparse in a known basis. In the measurement process, the digital micromirror device plays an important role, which implements the Hadamard coding. In contrast with Hadamard transform spectrometry, based on the shift invariability, this spectrometer may have the advantage of a high efficiency. Simulations and experiments show that the nonlinear solution with a sparse reconstruction has a better signal-to-noise ratio than the linear solution and the H-matrix outperforms the cyclic S-matrix whether the reconstruction method is nonlinear or linear.

Compressive Sensing with Cross-Validation and Stop-Sampling for Sparse Polynomial Chaos Expansions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huan, Xun; Safta, Cosmin; Sargsyan, Khachik

Compressive sensing is a powerful technique for recovering sparse solutions of underdetermined linear systems, which are often encountered in uncertainty quantification analysis of expensive and high-dimensional physical models. We perform numerical investigations employing several compressive sensing solvers that target the unconstrained LASSO formulation, with a focus on linear systems that arise in the construction of polynomial chaos expansions. With core solvers of l1_ls, SpaRSA, CGIST, FPC_AS, and ADMM, we develop techniques to mitigate overfitting through an automated selection of regularization constant based on cross-validation, and a heuristic strategy to guide the stop-sampling decision. Practical recommendations on parameter settings for these techniques are provided and discussed. The overall method is applied to a series of numerical examples of increasing complexity, including large eddy simulations of supersonic turbulent jet-in-cross flow involving a 24-dimensional input. Through empirical phase-transition diagrams and convergence plots, we illustrate sparse recovery performance under structures induced by polynomial chaos, accuracy and computational tradeoffs between polynomial bases of different degrees, and practicability of conducting compressive sensing for a realistic, high-dimensional physical application. Across test cases studied in this paper, we find ADMM to have demonstrated empirical advantages through consistent lower errors and faster computational times.
Computing row and column counts for sparse QR and LU factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilbert, John R.; Li, Xiaoye S.; Ng, Esmond G.

2001-01-01

We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.
Efficient convolutional sparse coding

DOEpatents

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M^3 N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
Sparse nonnegative matrix factorization with ℓ0-constraints

PubMed Central

Peharz, Robert; Pernkopf, Franz

2012-01-01

Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the ℓ1-norm of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the ℓ0-pseudo-norm. In this paper, we propose a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which compare to, or outperform existing approaches. PMID:22505792
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
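The study above used MATLAB's spmd construct. As a loose analogue in Python, the sketch below computes a forward-difference Jacobian one column at a time and hands the columns to a thread pool, since each column is independent of the others. The function names, the step size, and the use of `ThreadPoolExecutor` are assumptions of this example and do not reproduce the authors' code.

```python
from concurrent.futures import ThreadPoolExecutor

def jacobian_column(f, x, f0, j, step):
    """Forward-difference estimate of column j of the Jacobian of f at x."""
    xp = list(x)
    xp[j] += step
    fp = f(xp)
    return [(a - b) / step for a, b in zip(fp, f0)]

def numerical_jacobian(f, x, step=1e-6, workers=4):
    """Approximate the Jacobian of f at x; each column is an independent task."""
    f0 = f(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(lambda j: jacobian_column(f, x, f0, j, step),
                             range(len(x))))
    # The pool returns columns; transpose them into rows.
    return [[cols[j][i] for j in range(len(x))] for i in range(len(f0))]
```

In CPython, threads only pay off when `f` releases the interpreter lock (for example inside compiled numerical code); for pure-Python residual functions a process pool is the usual substitute, which loosely mirrors the abstract's observation that speedup depends on what the platform does underneath.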
A matrix-free implicit unstructured multigrid finite volume method for simulating structural dynamics and fluid structure interaction

NASA Astrophysics Data System (ADS)

Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.

2007-07-01

A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1989-01-01

The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.
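For orientation, a serial version of block Gaussian elimination on a block tridiagonal system can be sketched as follows. This is a minimal sketch of the standard block Thomas recurrence, not the PASCAL code from the report's appendix, and it does not include the partitioning across processors discussed there. The argument layout is an assumption of this example.

```python
import numpy as np

def solve_block_tridiagonal(lower, diag, upper, rhs):
    """Solve a block tridiagonal system by block elimination and back substitution.

    lower[i] couples block row i to row i-1 (lower[0] is unused),
    diag[i]  is the diagonal block of row i,
    upper[i] couples block row i to row i+1 (upper[-1] is unused),
    rhs[i]   is the right-hand-side vector of block row i.
    No pivoting across blocks is done, so the reduced diagonal blocks
    are assumed to stay nonsingular.
    """
    n = len(diag)
    d = [np.asarray(diag[0], dtype=float)]
    r = [np.asarray(rhs[0], dtype=float)]
    # Forward elimination: remove the sub-diagonal block of each row.
    for i in range(1, n):
        w = np.linalg.solve(d[i - 1], np.column_stack([upper[i - 1], r[i - 1]]))
        d.append(diag[i] - lower[i] @ w[:, :-1])
        r.append(rhs[i] - lower[i] @ w[:, -1])
    # Back substitution from the last block row upward.
    x = [None] * n
    x[-1] = np.linalg.solve(d[-1], r[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(d[i], r[i] - upper[i] @ x[i + 1])
    return np.concatenate(x)
```

The forward sweep is inherently sequential in the block index, which is why parallel versions have to partition the work within or across block rows.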
The solution of linear systems of equations with a structural analysis code on the NAS CRAY-2

NASA Technical Reports Server (NTRS)

Poole, Eugene L.; Overman, Andrea L.

1988-01-01

Two methods for solving linear systems of equations on the NAS Cray-2 are described. One is a direct method; the other is an iterative method. Both methods exploit the architecture of the Cray-2, particularly the vectorization, and are aimed at structural analysis applications. To demonstrate and evaluate the methods, they were installed in a finite element structural analysis code denoted the Computational Structural Mechanics (CSM) Testbed. A description of the techniques used to integrate the two solvers into the Testbed is given. Storage schemes, memory requirements, operation counts, and reformatting procedures are discussed. Finally, results from the new methods are compared with results from the initial Testbed sparse Choleski equation solver for three structural analysis problems. The new direct solvers described achieve the highest computational rates of the methods compared. The new iterative methods are not able to achieve as high computation rates as the vectorized direct solvers but are best for well conditioned problems which require fewer iterations to converge to the solution.
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

PubMed

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a large Jacobian matrix; this could cause difficulties in computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By converting the Jacobian matrix into a sparse format, the zero-valued elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. Sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction in reconstruction results.
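The two ingredients named in this abstract, thresholding a Jacobian into sparse storage and solving the least-squares problem with a conjugate gradient method, can be sketched in a few lines. The relative threshold, the row-dictionary storage, and the function names are assumptions of this example, and the solver is plain CGLS rather than the paper's block-wise parallel variant.

```python
def sparsify(dense_rows, rel_threshold):
    """Keep entries with |value| >= rel_threshold * max|value|; rows become {col: value}."""
    largest = max(abs(v) for row in dense_rows for v in row)
    cutoff = rel_threshold * largest
    return [{j: v for j, v in enumerate(row) if v != 0.0 and abs(v) >= cutoff}
            for row in dense_rows]

def matvec(rows, x):
    """Compute J @ x using only the stored entries."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def rmatvec(rows, y, n_cols):
    """Compute J.T @ y using only the stored entries."""
    out = [0.0] * n_cols
    for yi, row in zip(y, rows):
        for j, v in row.items():
            out[j] += v * yi
    return out

def cgls(rows, b, n_cols, iterations=50, tol=1e-20):
    """Minimize ||J x - b||_2 with CGLS, touching J only through matvec/rmatvec."""
    x = [0.0] * n_cols
    r = list(b)
    s = rmatvec(rows, r, n_cols)
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(iterations):
        if gamma <= tol:
            break
        q = matvec(rows, p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(rows, r, n_cols)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Because CGLS never forms the normal-equations matrix, memory use is set by the number of entries that survive thresholding.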
Systematic sparse matrix error control for linear scaling electronic structure calculations.

PubMed

Rubensson, Emanuel H; Sałek, Paweł

2005-11-30

Efficient truncation criteria used in multiatom blocked sparse matrix operations for ab initio calculations are proposed. As system size increases, so does the need to stay on top of errors and still achieve high performance. A variant of a blocked sparse matrix algebra to achieve strict error control with good performance is proposed. The presented idea is that the condition to drop a certain submatrix should depend not only on the magnitude of that particular submatrix, but also on which other submatrices are dropped. The decision to remove a certain submatrix is based on the contribution the removal would make to the error in the chosen norm. We study the effect of an accumulated truncation error in iterative algorithms like trace correcting density matrix purification. One way to reduce the initial exponential growth of this error is presented. The presented error control for a sparse blocked matrix toolbox allows for achieving optimal performance by performing only necessary operations needed to maintain the requested level of accuracy. Copyright 2005 Wiley Periodicals, Inc.
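One way to picture a truncation rule that depends on the whole set of dropped submatrices is to work in the Frobenius norm, where the error of dropping several blocks is the square root of the sum of their squared norms. The sketch below assumes that norm and a dictionary of small dense blocks; both are choices made for illustration and are not necessarily the criteria used in the paper.

```python
import math

def frobenius_norm(block):
    """Frobenius norm of a small dense block given as a list of rows."""
    return math.sqrt(sum(v * v for row in block for v in row))

def truncate_blocks(blocks, tolerance):
    """Drop the smallest blocks while the total Frobenius error stays <= tolerance.

    blocks maps (block_row, block_col) to a dense block. Returns the kept
    blocks and the error actually introduced.
    """
    norms = sorted((frobenius_norm(b), key) for key, b in blocks.items())
    dropped = set()
    err_sq = 0.0
    for norm, key in norms:
        # Stop as soon as dropping one more block would exceed the budget.
        if err_sq + norm * norm > tolerance * tolerance:
            break
        err_sq += norm * norm
        dropped.add(key)
    kept = {k: b for k, b in blocks.items() if k not in dropped}
    return kept, math.sqrt(err_sq)
```

A per-block threshold of 0.15 would drop three blocks of norm 0.1 each and incur a total error of about 0.17; the cumulative rule keeps one of them so that the requested bound holds.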
Method and apparatus for optimized processing of sparse matrices

DOEpatents

Taylor, Valerie E.

1993-01-01

A computer architecture for processing a sparse matrix is disclosed. The apparatus stores a value-row vector corresponding to nonzero values of a sparse matrix. Each of the nonzero values is located at a defined row and column position in the matrix. The value-row vector includes a first vector including nonzero values and delimiting characters indicating a transition from one column to another. The value-row vector also includes a second vector which defines row position values in the matrix corresponding to the nonzero values in the first vector and column position values in the matrix corresponding to the column position of the nonzero values in the first vector. The architecture also includes a circuit for detecting a special character within the value-row vector. Matrix-vector multiplication is executed on the value-row vector. This multiplication is performed by multiplying an index value of the first vector value by a column value from a second matrix to form a matrix-vector product which is added to a previous matrix-vector product.
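The storage scheme in this patent abstract can be read as a column-ordered list of nonzeros with a delimiter at each column transition. The sketch below follows that reading in software: a `values` list holds nonzeros and delimiters, and a parallel `index` list holds the row position of each nonzero or, at a delimiter, the column being entered. The delimiter symbol, the names, and the exact pairing are assumptions of this example and may differ from the patented hardware format.

```python
DELIM = None  # hypothetical marker for "a new column starts here"

def encode_value_row(dense):
    """Encode a dense matrix column by column into (values, index) lists."""
    values, index = [], []
    n_rows, n_cols = len(dense), len(dense[0])
    for j in range(n_cols):
        values.append(DELIM)   # delimiter in the value stream
        index.append(j)        # paired with the column position
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                index.append(i)  # nonzero paired with its row position
    return values, index

def value_row_matvec(values, index, x, n_rows):
    """Compute y = A @ x by streaming once through the encoded vectors."""
    y = [0.0] * n_rows
    col = None
    for v, idx in zip(values, index):
        if v is DELIM:
            col = idx               # switch to the next column of A
        else:
            y[idx] += v * x[col]    # accumulate into the running product
    return y
```

Streaming by column means each entry of `x` is fetched once per column, while partial sums are accumulated into `y`, matching the abstract's description of adding each product to a previous one.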
Time-domain finite elements in optimal control with application to launch-vehicle guidance. PhD. Thesis

NASA Technical Reports Server (NTRS)

Bless, Robert R.

1991-01-01

A time-domain finite element method is developed for optimal control problems. The theory derived is general enough to handle a large class of problems including optimal control problems that are continuous in the states and controls, problems with discontinuities in the states and/or system equations, problems with control inequality constraints, problems with state inequality constraints, or problems involving any combination of the above. The theory is developed in such a way that no numerical quadrature is necessary regardless of the degree of nonlinearity in the equations. Also, the same shape functions may be employed for every problem because all strong boundary conditions are transformed into natural or weak boundary conditions. In addition, the resulting nonlinear algebraic equations are very sparse. Use of sparse matrix solvers allows for the rapid and accurate solution of very difficult optimization problems. The formulation is applied to launch-vehicle trajectory optimization problems, and results show that real-time optimal guidance is realizable with this method. Finally, a general problem solving environment is created for solving a large class of optimal control problems. The algorithm uses both FORTRAN and a symbolic computation program to solve problems with a minimum of user interaction. The use of symbolic computation eliminates the need for user-written subroutines which greatly reduces the setup time for solving problems.
Simulations of the Microcirculation in the Human Conjunctiva

NASA Astrophysics Data System (ADS)

Dow, William; Jacobitz, Frank; Chen, Peter

2012-11-01

The microcirculation in the conjunctiva of a healthy human subject is analyzed using a simulation approach. A comparison between healthy and diseased states may lead to early diagnosis for a variety of vascular related disorders. Previous work suggests that hypertension, arteriosclerosis, and diabetes mellitus have noticeable very early changes in the microvasculature (Davis and Landau, 1957; Ditzel, 1968; Kunitomo, 1974) and the vessels of the conjunctiva are specifically useful for this research because they can be studied non-invasively. The microcirculation in the conjunctiva has been documented over the course of disease treatments, providing both still images and video footage for information on vessel length, diameter, and connectivity as well as the direction of blood flow. The numerical method is based on a Hagen-Poiseuille balance in the microvessels and a sparse matrix solver is used to obtain the solution. The simulations use realistic vessel topology for the microvasculature, reconstructed from microscope images of tissue samples, and consider blood rheology as well as passive and active vessel properties.
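A Hagen-Poiseuille balance on a vessel network leads to a sparse linear system for the node pressures: each vessel has conductance G = pi d^4 / (128 mu L), and the flows into every interior node must sum to zero. The sketch below assembles that system in sparse row form and solves it with a basic conjugate gradient loop. It assumes a constant viscosity and rigid vessels, so the blood rheology and vessel properties mentioned in the abstract are not modelled; all names and default values are illustrative.

```python
import math

def conductance(diameter, length, viscosity):
    """Hagen-Poiseuille conductance of a cylindrical vessel."""
    return math.pi * diameter ** 4 / (128.0 * viscosity * length)

def solve_network(n_nodes, vessels, fixed_pressure, viscosity=3.5e-3, tol=1e-12):
    """Node pressures for a network of vessels (node_a, node_b, diameter, length)."""
    free = [n for n in range(n_nodes) if n not in fixed_pressure]
    pos = {n: k for k, n in enumerate(free)}
    rows = [dict() for _ in free]          # sparse rows: {column: value}
    rhs = [0.0] * len(free)
    for a, b, d, length in vessels:
        g = conductance(d, length, viscosity)
        for u, v in ((a, b), (b, a)):
            if u in pos:
                row = rows[pos[u]]
                row[pos[u]] = row.get(pos[u], 0.0) + g
                if v in pos:
                    row[pos[v]] = row.get(pos[v], 0.0) - g
                else:
                    rhs[pos[u]] += g * fixed_pressure[v]
    # Conjugate gradient on the symmetric positive definite reduced system.
    x = [0.0] * len(free)
    r = list(rhs)
    p = list(r)
    rr = sum(v * v for v in r)
    bb = rr
    for _ in range(10 * len(free) + 10):
        if rr <= tol * tol * bb:
            break
        ap = [sum(val * p[j] for j, val in row.items()) for row in rows]
        alpha = rr / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    pressures = dict(fixed_pressure)
    for n, k in pos.items():
        pressures[n] = x[k]
    return pressures
```

Each row has one entry per neighbouring vessel, so the matrix stays sparse no matter how large the reconstructed network becomes.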
Learning Low-Rank Class-Specific Dictionary and Sparse Intra-Class Variant Dictionary for Face Recognition.

PubMed

Tang, Xin; Feng, Guo-Can; Li, Xiao-Xin; Cai, Jia-Xin

2015-01-01

Face recognition is challenging especially when the images from different persons are similar to each other due to variations in illumination, expression, and occlusion. If we have sufficient training images of each person which can span the facial variations of that person under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications, face recognition often encounters the small sample size problem arising from the small number of available training images for each person. In this paper, we present a novel face recognition framework by utilizing low-rank and sparse error matrix decomposition, and sparse coding techniques (LRSE+SC). Firstly, the low-rank matrix recovery technique is applied to decompose the face images per class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual is a class-specific dictionary and it captures the discriminative feature of this individual. The sparse error matrix represents the intra-class variations, such as illumination, expression changes. Secondly, we combine the low-rank part (representative basis) of each person into a supervised dictionary and integrate all the sparse error matrix of each individual into a within-individual variant dictionary which can be applied to represent the possible variations between the testing and training images. Then these two dictionaries are used to code the query image. The within-individual variant dictionary can be shared by all the subjects and only contribute to explain the lighting conditions, expressions, and occlusions of the query image rather than discrimination. At last, a reconstruction-based scheme is adopted for face recognition. Since the within-individual dictionary is introduced, LRSE+SC can handle the problem of the corrupted training data and the situation that not all subjects have enough samples for training. 
Experimental results show that our method achieves the state-of-the-art results on AR, FERET, FRGC and LFW databases.
Learning Low-Rank Class-Specific Dictionary and Sparse Intra-Class Variant Dictionary for Face Recognition

PubMed Central

Tang, Xin; Feng, Guo-can; Li, Xiao-xin; Cai, Jia-xin

2015-01-01

Face recognition is challenging especially when the images from different persons are similar to each other due to variations in illumination, expression, and occlusion. If we have sufficient training images of each person which can span the facial variations of that person under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications, face recognition often encounters the small sample size problem arising from the small number of available training images for each person. In this paper, we present a novel face recognition framework by utilizing low-rank and sparse error matrix decomposition, and sparse coding techniques (LRSE+SC). Firstly, the low-rank matrix recovery technique is applied to decompose the face images per class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual is a class-specific dictionary and it captures the discriminative feature of this individual. The sparse error matrix represents the intra-class variations, such as illumination, expression changes. Secondly, we combine the low-rank part (representative basis) of each person into a supervised dictionary and integrate all the sparse error matrix of each individual into a within-individual variant dictionary which can be applied to represent the possible variations between the testing and training images. Then these two dictionaries are used to code the query image. The within-individual variant dictionary can be shared by all the subjects and only contribute to explain the lighting conditions, expressions, and occlusions of the query image rather than discrimination. At last, a reconstruction-based scheme is adopted for face recognition. Since the within-individual dictionary is introduced, LRSE+SC can handle the problem of the corrupted training data and the situation that not all subjects have enough samples for training. 
Experimental results show that our method achieves the state-of-the-art results on AR, FERET, FRGC and LFW databases. PMID:26571112
Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

2013-01-01

Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count, into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
A sparse matrix-vector multiplication based algorithm for accurate density matrix computations on systems of millions of atoms

NASA Astrophysics Data System (ADS)

Ghale, Purnima; Johnson, Harley T.

2018-06-01

We present an efficient sparse matrix-vector (SpMV) based method to compute the density matrix P from a given Hamiltonian in electronic structure computations. Our method is a hybrid approach based on Chebyshev-Jackson approximation theory and matrix purification methods like the second order spectral projection purification (SP2). Recent methods to compute the density matrix scale as O(N) in the number of floating point operations but are accompanied by large memory and communication overhead, and they are based on iterative use of the sparse matrix-matrix multiplication kernel (SpGEMM), which is known to be computationally irregular. In addition to irregularity in the sparse Hamiltonian H, the nonzero structure of intermediate estimates of P depends on products of H and evolves over the course of computation. On the other hand, an expansion of the density matrix P in terms of Chebyshev polynomials is straightforward and SpMV based; however, the resulting density matrix may not satisfy the required constraints exactly. In this paper, we analyze the strengths and weaknesses of the Chebyshev-Jackson polynomials and the second order spectral projection purification (SP2) method, and propose to combine them so that the accurate density matrix can be computed using the SpMV computational kernel only, and without having to store the density matrix P. Our method accomplishes these objectives by using the Chebyshev polynomial estimate as the initial guess for SP2, which is followed by using sparse matrix-vector multiplications (SpMVs) to replicate the behavior of the SP2 algorithm for purification. We demonstrate the method on a tight-binding model system of an oxide material containing more than 3 million atoms. In addition, we also present the predicted behavior of our method when applied to near-metallic Hamiltonians with a wide energy spectrum.
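For readers unfamiliar with SP2, the basic purification recurrence can be written compactly with dense matrices. The sketch below is the textbook second-order spectral projection iteration, using Gershgorin bounds for the initial rescaling; it is not the SpMV-only hybrid proposed in the paper, and it stores the density matrix explicitly, which the paper's method avoids. The function name and stopping rule are assumptions of this example.

```python
import numpy as np

def sp2_density_matrix(h, n_occupied, max_iter=100, tol=1e-10):
    """Density matrix (projector onto the lowest n_occupied states) via SP2."""
    n = h.shape[0]
    # Gershgorin estimates of the extreme eigenvalues of h.
    radius = np.sum(np.abs(h), axis=1) - np.abs(np.diag(h))
    e_min = np.min(np.diag(h) - radius)
    e_max = np.max(np.diag(h) + radius)
    # Map the spectrum into [0, 1] with occupied states near 1.
    x = (e_max * np.eye(n) - h) / (e_max - e_min)
    for _ in range(max_iter):
        x2 = x @ x
        if np.linalg.norm(x2 - x) < tol:   # idempotent: x is a projector
            break
        if np.trace(x) > n_occupied:
            x = x2                # lowers the trace
        else:
            x = 2.0 * x - x2      # raises the trace
    return x
```

Every iteration costs one matrix-matrix product, which in a sparse setting is the SpGEMM kernel that the abstract describes as computationally irregular.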
Anisotropic resonator analysis using the Fourier-Bessel mode solver

NASA Astrophysics Data System (ADS)

Gauthier, Robert C.

2018-03-01

A numerical mode solver for optical structures that conform to cylindrical symmetry using Faraday's and Ampere's laws as starting expressions is developed when electric or magnetic anisotropy is present. The technique builds on the existing Fourier-Bessel mode solver which allows resonator states to be computed exploiting the symmetry properties of the resonator and states to reduce the matrix system. The introduction of anisotropy into the theoretical framework facilitates the inclusion of PML borders permitting the computation of open ended structures and a better estimation of the resonator state quality factor. Matrix populating expressions are provided that can accommodate any material anisotropy with arbitrary orientation in the computation domain. Several examples of electrical anisotropic computations are provided for rotationally symmetric structures such as standard optical fibers, axial Bragg-ring fibers and bottle resonators. The anisotropy present in the materials introduces off diagonal matrix elements in the permittivity tensor when expressed in cylindrical coordinates. The effects of the anisotropy on computed states are presented and discussed.
Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro

Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
A multigrid solver for the semiconductor equations

NASA Technical Reports Server (NTRS)

Bachmann, Bernhard

1993-01-01

We present a multigrid solver for the exponential fitting method. The solver is applied to the current continuity equations of semiconductor device simulation in two dimensions. The exponential fitting method is based on a mixed finite element discretization using the lowest-order Raviart-Thomas triangular element. This discretization method yields a good approximation of front layers and guarantees current conservation. The corresponding stiffness matrix is an M-matrix. 'Standard' multigrid solvers, however, cannot be applied to the resulting system, as this is dominated by an unsymmetric part, which is due to the presence of strong convection in part of the domain. To overcome this difficulty, we explore the connection between Raviart-Thomas mixed methods and the nonconforming Crouzeix-Raviart finite element discretization. In this way we can construct nonstandard prolongation and restriction operators using easily computable weighted L^2-projections based on suitable quadrature rules and the upwind effects of the discretization. The resulting multigrid algorithm shows very good results, even for real-world problems and for locally refined grids.

Sparse subspace clustering for data with missing entries and high-rank matrix completion.

PubMed

Fan, Jicong; Chow, Tommy W S

2017-09-01

Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations

DOE PAGES

Plantenga, Todd; Kolda, Tamara G.; Hansen, Samantha

2015-04-30

Tensor factorizations with nonnegativity constraints have found application in analysing data from cyber traffic, social networks, and other areas. We consider application data best described as being generated by a Poisson process (e.g. count data), which leads to sparse tensors that can be modelled by sparse factor matrices. In this paper, we investigate efficient techniques for computing an appropriate canonical polyadic tensor factorization based on the Kullback–Leibler divergence function. We propose novel subproblem solvers within the standard alternating block variable approach. Our new methods exploit structure and reformulate the optimization problem as small independent subproblems. We employ bound-constrained Newton and quasi-Newton methods. Finally, we compare our algorithms against other codes, demonstrating superior speed for high accuracy results and the ability to quickly find sparse solutions.
FPGA implementation of sparse matrix algorithm for information retrieval

NASA Astrophysics Data System (ADS)

Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio

2005-06-01

Text information retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper, a solution to this problem is proposed via hardware-supported information retrieval algorithms. Reconfigurable computing can accommodate frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of parallelism can be tuned to the data. In this work we implemented the standard BLAS (Basic Linear Algebra Subprograms) sparse matrix algorithm named Compressed Sparse Row (CSR), which is shown to be more efficient in terms of storage space requirement and query-processing time than the other sparse matrix algorithms for information retrieval applications. Although the inverted index algorithm has been treated as the de facto standard for information retrieval for years, an alternative approach that stores the index of a text collection in a sparse matrix structure is gaining more attention. This approach performs query processing using sparse matrix-vector multiplication and, due to parallelization, achieves a substantial efficiency gain over the sequential inverted index. The parallel implementations of the information retrieval kernel presented in this work target the Virtex II Field Programmable Gate Array (FPGA) board from Xilinx. A recent development in scientific applications is the use of FPGAs to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within the processing unit.
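As an illustration of the query-processing idea in the abstract above, here is a minimal software sketch of CSR storage and sparse matrix-vector multiplication. It is not the authors' FPGA design: the function names, the toy document-term matrix, and the query vector are all assumptions made for this example.

```python
def dense_to_csr(rows):
    """Convert a list of dense rows into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))  # marks the end of this row's nonzeros
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical toy index: rows are documents, columns are terms,
# entries are term counts.
doc_term = [
    [2, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]
values, col_idx, row_ptr = dense_to_csr(doc_term)
query = [1, 0, 0, 1]  # query containing terms 0 and 3
scores = csr_matvec(values, col_idx, row_ptr, query)  # one score per document
```

Each row's inner loop is independent of the others, which is the property a parallel implementation can exploit.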
Uniform Recovery Bounds for Structured Random Matrices in Corrupted Compressed Sensing

NASA Astrophysics Data System (ADS)

Zhang, Peng; Gan, Lu; Ling, Cong; Sun, Sumei

2018-04-01

We study the problem of recovering an $s$-sparse signal $\mathbf{x}^{\star}\in\mathbb{C}^n$ from corrupted measurements $\mathbf{y} = \mathbf{A}\mathbf{x}^{\star}+\mathbf{z}^{\star}+\mathbf{w}$, where $\mathbf{z}^{\star}\in\mathbb{C}^m$ is a $k$-sparse corruption vector whose nonzero entries may be arbitrarily large and $\mathbf{w}\in\mathbb{C}^m$ is a dense noise with bounded energy. The aim is to exactly and stably recover the sparse signal with tractable optimization programs. In this paper, we prove the uniform recovery guarantee of this problem for two classes of structured sensing matrices. The first class can be expressed as the product of a unit-norm tight frame (UTF), a random diagonal matrix and a bounded columnwise orthonormal matrix (e.g., partial random circulant matrix). When the UTF is bounded (i.e. $\mu(\mathbf{U})\sim1/\sqrt{m}$), we prove that with high probability, one can recover an $s$-sparse signal exactly and stably by $l_1$ minimization programs even if the measurements are corrupted by a sparse vector, provided $m = \mathcal{O}(s \log^2 s \log^2 n)$ and the sparsity level $k$ of the corruption is a constant fraction of the total number of measurements. The second class considers randomly sub-sampled orthogonal matrix (e.g., random Fourier matrix). We prove the uniform recovery guarantee provided that the corruption is sparse on certain sparsifying domain. Numerous simulation results are also presented to verify and complement the theoretical results.
Matched field localization based on CS-MUSIC algorithm

NASA Astrophysics Data System (ADS)

Guo, Shuangle; Tang, Ruichun; Peng, Linhui; Ji, Xiaopeng

2016-04-01

The problem caused by shortness or excessiveness of snapshots and by coherent sources in underwater acoustic positioning is considered. A matched field localization algorithm based on CS-MUSIC (Compressive Sensing Multiple Signal Classification) is proposed based on the sparse mathematical model of the underwater positioning. The signal matrix is calculated through the SVD (Singular Value Decomposition) of the observation matrix. The observation matrix in the sparse mathematical model is replaced by the signal matrix, and a new concise sparse mathematical model is obtained, which means not only the scale of the localization problem but also the noise level is reduced; then the new sparse mathematical model is solved by the CS-MUSIC algorithm which is a combination of CS (Compressive Sensing) method and MUSIC (Multiple Signal Classification) method. The algorithm proposed in this paper can overcome effectively the difficulties caused by correlated sources and shortness of snapshots, and it can also reduce the time complexity and noise level of the localization problem by using the SVD of the observation matrix when the number of snapshots is large, which will be proved in this paper.
MODFLOW-NWT, A Newton formulation for MODFLOW-2005

USGS Publications Warehouse

Niswonger, Richard G.; Panday, Sorab; Ibaraki, Motomu

2011-01-01

This report documents a Newton formulation of MODFLOW-2005, called MODFLOW-NWT. MODFLOW-NWT is a standalone program that is intended for solving problems involving drying and rewetting nonlinearities of the unconfined groundwater-flow equation. MODFLOW-NWT must be used with the Upstream-Weighting (UPW) Package for calculating intercell conductances in a different manner than is done in the Block-Centered Flow (BCF), Layer Property Flow (LPF), or Hydrogeologic-Unit Flow (HUF; Anderman and Hill, 2000) Packages. The UPW Package treats nonlinearities of cell drying and rewetting by use of a continuous function of groundwater head, rather than the discrete approach of drying and rewetting that is used by the BCF, LPF, and HUF Packages. This further enables application of the Newton formulation for unconfined groundwater-flow problems because conductance derivatives required by the Newton method are smooth over the full range of head for a model cell. The NWT linearization approach generates an asymmetric matrix, which is different from the standard MODFLOW formulation that generates a symmetric matrix. Because all linear solvers presently available for use with MODFLOW-2005 solve only symmetric matrices, MODFLOW-NWT includes two previously developed asymmetric matrix-solver options. The matrix-solver options include a generalized-minimum-residual (GMRES) Solver and an Orthomin / stabilized conjugate-gradient (CGSTAB) Solver. The GMRES Solver is documented in a previously published report, such that only a brief description and input instructions are provided in this report. However, the CGSTAB Solver (called XMD) is documented in this report. Flow-property input for the UPW Package is designed based on the LPF Package and material-property input is identical to that for the LPF Package except that the rewetting and vertical-conductance correction options of the LPF Package are not available with the UPW Package. 
Input files constructed for the LPF Package can be used with slight modification as input for the UPW Package. This report presents the theory and methods used by MODFLOW-NWT, including the UPW Package. Additionally, this report provides comparisons of the new methodology to analytical solutions of groundwater flow and to standard MODFLOW-2005 results by use of an unconfined aquifer MODFLOW example problem. The standard MODFLOW-2005 simulation uses the LPF Package with the wet/dry option active. A new example problem also is presented to demonstrate MODFLOW-NWT's ability to provide a solution for a difficult unconfined groundwater-flow problem.
1-norm support vector novelty detection and its sparseness.

PubMed

Zhang, Li; Zhou, WeiDa

2013-12-01

This paper proposes a 1-norm support vector novelty detection (SVND) method and discusses its sparseness. 1-norm SVND is formulated as a linear programming problem and uses two techniques for inducing sparseness, namely 1-norm regularization and the hinge loss function. We also find two upper bounds on the sparseness of 1-norm SVND, namely the exact support vector (ESV) bound and the kernel Gram matrix rank bound. The ESV bound indicates that 1-norm SVND has a sparser representation model than SVND. The kernel Gram matrix rank bound can loosely estimate the sparseness of 1-norm SVND. Experimental results show that 1-norm SVND is feasible and effective. Copyright © 2013 Elsevier Ltd. All rights reserved.
Using the Intel Math Kernel Library on Peregrine | High-Performance Computing | NREL

Science.gov Websites

Learn how to use the Intel Math Kernel Library (MKL) with Peregrine system software. ... MKL architectures. Core math functions in MKL include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier ...
Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

DTIC Science & Technology

2008-10-01

3.4 Residual history of WSO banded preconditioner for problem 2D 54019 HIGHK ... 3.5 Residual history of WSO banded preconditioner for problem Appu ... 3.6 Residual history of WSO banded preconditioner for problem ASIC 680k ... 3.7 Residual history of WSO banded preconditioner for problem BUNDLE1
Communication Optimal Parallel Multiplication of Sparse Random Matrices

DTIC Science & Technology

2013-02-21

Definition 2.1), and (2) the algorithm is sparsity-independent, where the computation is statically partitioned to processors independent of the sparsity...structure of the input matrices (see Definition 2.5). The second assumption applies to nearly all existing algorithms for general sparse matrix-matrix...where A and B are n × n ER(d) matrices: Definition 2.1 An ER(d) matrix is an adjacency matrix of an Erdős-Rényi graph with parameters n and d/n. That
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chow, Edmond

Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the desired factorization.
Noniterative MAP reconstruction using sparse matrix representations.

PubMed

Cao, Guangzhi; Bouman, Charles A; Webb, Kevin J

2009-09-01

We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations. Our approach is to precompute and store the inverse matrix required for MAP reconstruction. This approach has generally not been used in the past because the inverse matrix is typically large and fully populated (i.e., not sparse). In order to overcome this problem, we introduce two new ideas. The first idea is a novel theory for the lossy source coding of matrix transformations which we refer to as matrix source coding. This theory is based on a distortion metric that reflects the distortions produced in the final matrix-vector product, rather than the distortions in the coded matrix itself. The resulting algorithms are shown to require orthonormal transformations of both the measurement data and the matrix rows and columns before quantization and coding. The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT). The SMT is a generalization of the classical FFT in that it uses butterflies to compute an orthonormal transform; but unlike an FFT, the SMT uses the butterflies in an irregular pattern, and is numerically designed to best approximate the desired transforms. We demonstrate the potential of the noniterative MAP reconstruction with examples from optical tomography. The method requires offline computation to encode the inverse transform. However, once these offline computations are completed, the noniterative MAP algorithm is shown to reduce both storage and computation by well over two orders of magnitude, compared to linear iterative reconstruction methods.
Steady potential solver for unsteady aerodynamic analyses

NASA Technical Reports Server (NTRS)

Hoyniak, Dan

1994-01-01

Development of a steady flow solver for use with LINFLO was the objective of this report. The solver must be compatible with LINFLO, be composed of composite mesh, and have transonic capability. The approaches used were: (1) steady flow potential equations written in nonconservative form; (2) Newton's Method; (3) implicit, least-squares, interpolation method to obtain finite difference equations; and (4) matrix inversion routines from LINFLO. This report was given during the NASA LeRC Workshop on Forced Response in Turbomachinery in August of 1993.
Multi scales based sparse matrix spectral clustering image segmentation

NASA Astrophysics Data System (ADS)

Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin

2018-04-01

In image segmentation, spectral clustering algorithms have to adopt an appropriate scaling parameter to calculate the similarity matrix between pixels, which may have a great impact on the clustering result. Moreover, when the number of data instances is large, the computational complexity and memory use of the algorithm increase greatly. To solve these two problems, we propose a new spectral clustering image segmentation algorithm based on multiple scales and a sparse matrix. We first devise a new feature extraction method, then extract image features at different scales, and finally use the feature information to construct a sparse similarity matrix, which improves operating efficiency. Compared with the traditional spectral clustering algorithm, image segmentation experiments show that our algorithm has better accuracy and robustness.
Optimal sparse approximation with integrate and fire neurons.

PubMed

Shapero, Samuel; Zhu, Mengchen; Hasler, Jennifer; Rozell, Christopher

2014-08-01

Sparse approximation is a hypothesized coding strategy where a population of sensory neurons (e.g. V1) encodes a stimulus using as few active neurons as possible. We present the Spiking LCA (locally competitive algorithm), a rate encoded Spiking Neural Network (SNN) of integrate and fire neurons that calculate sparse approximations. The Spiking LCA is designed to be equivalent to the nonspiking LCA, an analog dynamical system that converges exponentially on an ℓ1-norm sparse approximation. We show that the firing rate of the Spiking LCA converges on the same solution as the analog LCA, with an error inversely proportional to the sampling time. We simulate in NEURON a network of 128 neuron pairs that encode 8 × 8 pixel image patches, demonstrating that the network converges to nearly optimal encodings within 20 ms of biological time. We also show that when using more biophysically realistic parameters in the neurons, the gain function encourages additional ℓ0-norm sparsity in the encoding, relative both to ideal neurons and digital solvers.
On the use of finite difference matrix-vector products in Newton-Krylov solvers for implicit climate dynamics with spectral elements

DOE PAGES

Woodward, Carol S.; Gardner, David J.; Evans, Katherine J.

2015-01-01

Efficient solutions of global climate models require effectively handling disparate length and time scales. Implicit solution approaches allow time integration of the physical system with a step size governed by accuracy of the processes of interest rather than by stability of the fastest time scales present. Implicit approaches, however, require the solution of nonlinear systems within each time step. Usually, a Newton's method is applied to solve these systems. Each iteration of the Newton's method, in turn, requires the solution of a linear model of the nonlinear system. This model employs the Jacobian of the problem-defining nonlinear residual, but this Jacobian can be costly to form. If a Krylov linear solver is used for the solution of the linear system, the action of the Jacobian matrix on a given vector is required. In the case of spectral element methods, the Jacobian is not calculated but only implemented through matrix-vector products. The matrix-vector multiply can also be approximated by a finite difference approximation which may introduce inaccuracy in the overall nonlinear solver. In this paper, we review the advantages and disadvantages of finite difference approximations of these matrix-vector products for climate dynamics within the spectral element shallow water dynamical core of the Community Atmosphere Model.
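The finite difference approximation of a Jacobian-vector product mentioned above can be sketched on a toy problem. The residual function, the step size eps, and the test vectors below are assumptions chosen for illustration; they are unrelated to the Community Atmosphere Model code. The analytic product is included only to show the size of the approximation error.

```python
import math


def residual(u):
    """Hypothetical nonlinear residual F(u) for a 2-unknown toy problem."""
    return [u[0] ** 2 + u[1] - 3.0, math.sin(u[0]) + 2.0 * u[1]]


def exact_jv(u, v):
    """Analytic Jacobian-vector product for the toy residual, for comparison."""
    return [2.0 * u[0] * v[0] + v[1], math.cos(u[0]) * v[0] + 2.0 * v[1]]


def fd_jv(F, u, v, eps=1e-7):
    """Approximate J(u) v by the forward difference (F(u + eps v) - F(u)) / eps."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]


u = [1.0, 0.5]
v = [0.3, -0.2]
approx = fd_jv(residual, u, v)
exact = exact_jv(u, v)
err = max(abs(a - e) for a, e in zip(approx, exact))  # small but nonzero
```

The forward difference needs one extra residual evaluation per product and never forms the Jacobian, at the cost of an error that depends on the choice of eps.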
Use of general purpose graphics processing units with MODFLOW

USGS Publications Warehouse

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
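For readers unfamiliar with the combination described above, the following is a minimal sequential sketch of conjugate gradients with a Jacobi (diagonal) preconditioner on a matrix held in compressed sparse row form. It is not the UPCG source code: the function names, tolerance, and 3 × 3 test matrix are assumptions, and the other preconditioners and the GPGPU path are not shown.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in compressed sparse row form."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the diagonal, which defines the Jacobi preconditioner."""
    diag = [0.0] * (len(row_ptr) - 1)
    for i in range(len(diag)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * len(b)
    r = list(b)                                  # residual for the guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # converged on residual norm
            return x, it
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Tridiagonal test matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [3.0, 2.0, 3.0]  # chosen so that the exact solution is [1, 1, 1]
x, iterations = jacobi_pcg(values, col_idx, row_ptr, b)
```

Every step is a sparse matrix-vector product, a dot product, or an elementwise vector update, which is why Jacobi preconditioning is among the easiest variants to parallelize.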
Theory and implementation of H-matrix based iterative and direct solvers for Helmholtz and elastodynamic oscillatory kernels

NASA Astrophysics Data System (ADS)

Chaillat, Stéphanie; Desiderio, Luca; Ciarlet, Patrick

2017-12-01

In this work, we study the accuracy and efficiency of hierarchical matrix (H-matrix) based fast methods for solving dense linear systems arising from the discretization of the 3D elastodynamic Green's tensors. It is well known in the literature that standard H-matrix based methods, although very efficient tools for asymptotically smooth kernels, are not optimal for oscillatory kernels. H2-matrix and directional approaches have been proposed to overcome this problem. However the implementation of such methods is much more involved than the standard H-matrix representation. The central questions we address are twofold. (i) What is the frequency-range in which the H-matrix format is an efficient representation for 3D elastodynamic problems? (ii) What can be expected of such an approach to model problems in mechanical engineering? We show that even though the method is not optimal (in the sense that more involved representations can lead to faster algorithms) an efficient solver can be easily developed. The capabilities of the method are illustrated on numerical examples using the Boundary Element Method.
Solving large sparse eigenvalue problems on supercomputers

NASA Technical Reports Server (NTRS)

Philippe, Bernard; Saad, Youcef

1988-01-01

An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
Sparse matrix methods based on orthogonality and conjugacy

NASA Technical Reports Server (NTRS)

Lawson, C. L.

1973-01-01

A matrix having a high percentage of zero elements is called sparse. In the solution of systems of linear equations or linear least squares problems involving large sparse matrices, significant saving of computer cost can be achieved by taking advantage of the sparsity. The conjugate gradient algorithm and a set of related algorithms are described.

Improved Success of Sparse Matrix Protein Crystallization Screening with Heterogeneous Nucleating Agents

PubMed Central

Thakur, Anil S.; Robin, Gautier; Guncar, Gregor; Saunders, Neil F. W.; Newman, Janet; Martin, Jennifer L.; Kobe, Bostjan

2007-01-01

Background: Crystallization is a major bottleneck in the process of macromolecular structure determination by X-ray crystallography. Successful crystallization requires the formation of nuclei and their subsequent growth to crystals of suitable size. Crystal growth generally occurs spontaneously in a supersaturated solution as a result of homogeneous nucleation. However, in a typical sparse matrix screening experiment, precipitant and protein concentration are not sampled extensively, and supersaturation conditions suitable for nucleation are often missed. Methodology/Principal Findings: We tested the effect of nine potential heterogeneous nucleating agents on crystallization of ten test proteins in a sparse matrix screen. Several nucleating agents induced crystal formation under conditions where no crystallization occurred in the absence of the nucleating agent. Four nucleating agents (dried seaweed, horse hair, cellulose, and hydroxyapatite) had a considerable overall positive effect on crystallization success. This effect was further enhanced when these nucleating agents were used in combination with each other. Conclusions/Significance: Our results suggest that the addition of heterogeneous nucleating agents increases the chances of crystal formation when using sparse matrix screens. PMID: 17971854
Design of a Variational Multiscale Method for Turbulent Compressible Flows

NASA Technical Reports Server (NTRS)

Diosady, Laslo Tibor; Murman, Scott M.

2013-01-01

A spectral-element framework is presented for the simulation of subsonic compressible high-Reynolds-number flows. The focus of the work is maximizing the efficiency of the computational schemes to enable unsteady simulations with a large number of spatial and temporal degrees of freedom. A collocation scheme is combined with optimized computational kernels to provide a residual evaluation with computational cost independent of order of accuracy up to 16th order. The optimized residual routines are used to develop a low-memory implicit scheme based on a matrix-free Newton-Krylov method. A preconditioner based on the finite-difference diagonalized ADI scheme is developed which maintains the low memory of the matrix-free implicit solver, while providing improved convergence properties. Emphasis on low memory usage throughout the solver development is leveraged to implement a coupled space-time DG solver which may offer further efficiency gains through adaptivity in both space and time.
A matrix-form GSM-CFD solver for incompressible fluids and its application to hemodynamics

NASA Astrophysics Data System (ADS)

Yao, Jianyao; Liu, G. R.

2014-10-01

A GSM-CFD solver for incompressible flows is developed based on the gradient smoothing method (GSM). A matrix-form algorithm and corresponding data structure for GSM are devised to efficiently approximate the spatial gradients of field variables using the gradient smoothing operation. The calculated gradient values on various test fields show that the proposed GSM is capable of exactly reproducing linear field and of second order accuracy on all kinds of meshes. It is found that the GSM is much more robust to mesh deformation and therefore more suitable for problems with complicated geometries. Integrated with the artificial compressibility approach, the GSM is extended to solve the incompressible flows. As an example, the flow simulation of carotid bifurcation is carried out to show the effectiveness of the proposed GSM-CFD solver. The blood is modeled as incompressible Newtonian fluid and the vessel is treated as rigid wall in this paper.
Effect of Thermal cycles and Dimensions of the Geometry on Residual stress of the Alumina-Kovar Joint

NASA Astrophysics Data System (ADS)

Mishra, Srishti; Pal, Snehanshu; Karak, Swapan Kumar; Shah, Sejal; Venakata Nagaraju, M.; Chakraborty, Arun Kumar

2018-03-01

The finite element method is employed to determine how residual stress varies with dimension and the stress generated under working conditions along the Kovar. Three different dimensions of an Alumina-Kovar joint with a height to diameter ratio of 3/10 are considered, using TiCuSil as the filler material. Transient structural analysis is carried out for the three dimensions (diameter × height): (i) 60 mm × 20 mm (Geometry 1), (ii) 90 mm × 20 mm (Geometry 2), and (iii) 120 mm × 20 mm (Geometry 3). A comparative study has been carried out between the residual stresses developed in the brazed joint that has undergone 5 thermal cycles subsequent to brazing and those in the brazed joint itself. The heating and cooling rates from the brazing temperature are 10°C/up to room temperature. The brazing temperature and holding time considered for the analysis are 900°C and 10 minutes. A Representative Volume Element (RVE) model is used for simulation. The Sparse Matrix Direct Solver method is used to evaluate the results, using the Augmented Lagrange formulation in the contact region. All simulations are performed in ANSYS Workbench 15.0, using solver target Mechanical APDL. From the above simulations, a high concentration of residual stress is observed along the filler region, i.e. between Alumina and Kovar, as a result of the difference in coefficient of thermal expansion between Alumina and Kovar. The residual stress decreases with increasing dimensions of the geometry and upon application of thermal cycles subsequent to brazing.
On Parallel Push-Relabel based Algorithms for Bipartite Maximum Matching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Langguth, Johannes; Azad, Md Ariful; Halappanavar, Mahantesh

2014-07-01

We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial (graph) problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing maximum transversal of a matrix. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a testset comprised of a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for the parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.
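To make the term "maximum transversal" concrete, here is a small serial sketch that matches rows to columns through the nonzero pattern of a matrix. It uses the simple depth-first augmenting-path approach, the family the abstract compares against, and not the push-relabel algorithm the paper studies. The function name and the 4 × 4 pattern are assumptions for illustration.

```python
def maximum_transversal(pattern):
    """Match rows to columns through nonzeros using DFS augmenting paths.

    pattern[i] lists the column indices of the nonzeros in row i.
    Returns (size, col_of_row), where col_of_row[i] is the matched column or -1.
    """
    n_cols = 1 + max((j for row in pattern for j in row), default=-1)
    row_of_col = [-1] * n_cols

    def try_augment(i, visited):
        for j in pattern[i]:
            if j in visited:
                continue
            visited.add(j)
            # Column j is free, or its current row can be rematched elsewhere.
            if row_of_col[j] == -1 or try_augment(row_of_col[j], visited):
                row_of_col[j] = i
                return True
        return False

    size = sum(try_augment(i, set()) for i in range(len(pattern)))
    col_of_row = [-1] * len(pattern)
    for j, i in enumerate(row_of_col):
        if i != -1:
            col_of_row[i] = j
    return size, col_of_row


# Sparsity pattern of a hypothetical 4x4 matrix (column indices per row).
pattern = [[0, 1], [0], [1, 2], [2, 3]]
size, col_of_row = maximum_transversal(pattern)
```

A transversal of full size means the rows can be permuted so that every diagonal entry is nonzero, which is the property sparse direct solvers look for before factorization.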
Full Wave Parallel Code for Modeling RF Fields in Hot Plasmas

NASA Astrophysics Data System (ADS)

Spencer, Joseph; Svidzinski, Vladimir; Evstatiev, Evstati; Galkin, Sergei; Kim, Jin-Soo

2015-11-01

FAR-TECH, Inc. is developing a suite of full wave RF codes in hot plasmas. It is based on a formulation in configuration space with grid adaptation capability. The conductivity kernel (which includes a nonlocal dielectric response) is calculated by integrating the linearized Vlasov equation along unperturbed test particle orbits. For Tokamak applications a 2-D version of the code is being developed. Progress of this work will be reported. This suite of codes has the following advantages over existing spectral codes: 1) It utilizes the localized nature of plasma dielectric response to the RF field and calculates this response numerically without approximations. 2) It uses an adaptive grid to better resolve resonances in plasma and antenna structures. 3) It uses an efficient sparse matrix solver to solve the formulated linear equations. The linear wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. Work is supported by the U.S. DOE SBIR program.
Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dongarra, Jack J.; Tomov, Stanimire

2014-03-24

The goal of the MAGMA project is to create a new generation of linear algebra libraries that achieve the fastest possible time to an accurate solution on hybrid Multicore+GPU-based systems, using all the processing power that future high-end systems can make available within given energy constraints. Our efforts at the University of Tennessee achieved the goals set in all of the five areas identified in the proposal: 1. Communication optimal algorithms; 2. Autotuning for GPU and hybrid processors; 3. Scheduling and memory management techniques for heterogeneity and scale; 4. Fault tolerance and robustness for large scale systems; 5. Building energy efficiency into software foundations. The University of Tennessee’s main contributions, as proposed, were the research and software development of new algorithms for hybrid multi/many-core CPUs and GPUs, as related to two-sided factorizations and complete eigenproblem solvers, hybrid BLAS, and energy efficiency for dense, as well as sparse, operations. Furthermore, as proposed, we investigated and experimented with various techniques targeting the five main areas outlined.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory

NASA Astrophysics Data System (ADS)

Challacombe, Matt

2000-06-01

A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
Implementation of hierarchical clustering using k-mer sparse matrix to analyze MERS-CoV genetic relationship

NASA Astrophysics Data System (ADS)

Bustamam, A.; Ulul, E. D.; Hura, H. F. A.; Siswantining, T.

2017-07-01

Hierarchical clustering is one of the effective methods for creating a phylogenetic tree based on the distance matrix between DNA (deoxyribonucleic acid) sequences. One of the well-known methods to calculate the distance matrix is the k-mer method. Generally, k-mer is more efficient than some other distance matrix calculation techniques. The k-mer method starts by creating a k-mer sparse matrix, followed by creating k-mer singular value vectors. The last step is computing the distances among the vectors. In this paper, we analyze the sequences of MERS-CoV (Middle East Respiratory Syndrome - Coronavirus) DNA by implementing hierarchical clustering using a k-mer sparse matrix in order to perform the phylogenetic analysis. Our results show that the ancestor of our MERS-CoV comes from Egypt. Moreover, we found that the MERS-CoV infection that occurs in one country may not necessarily come from the same country of origin. This suggests that the process of MERS-CoV mutation might not only be influenced by geographical factors.
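The first and last steps described in this abstract (sparse k-mer counts, then pairwise distances) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sequences are made-up toy strings, the function names are hypothetical, the singular value step is omitted, and Euclidean distance is an assumed choice of metric.

```python
from collections import Counter
import math


def kmer_counts(seq, k):
    """Sparse k-mer profile: only k-mers that occur in seq are stored."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean(c1, c2):
    """Euclidean distance between two sparse profiles (missing k-mers count as 0)."""
    keys = set(c1) | set(c2)
    return math.sqrt(sum((c1[key] - c2[key]) ** 2 for key in keys))


# Toy sequences (hypothetical, for illustration only).
sequences = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "TTTTGGGG",
}
profiles = {name: kmer_counts(seq, 2) for name, seq in sequences.items()}

# Pairwise distance table; the closest pair is what an agglomerative
# clustering would merge first.
names = sorted(profiles)
distances = {
    (a, b): euclidean(profiles[a], profiles[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
closest_pair = min(distances, key=distances.get)
```

Storing each profile as a `Counter` keeps memory proportional to the number of distinct k-mers actually observed, rather than to the 4^k possible k-mers.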
One-shot 3D scanning by combining sparse landmarks with dense gradient information

NASA Astrophysics Data System (ADS)

Di Martino, Matías; Flores, Jorge; Ferrari, José A.

2018-06-01

Scene understanding is one of the most challenging and popular problems in the field of robotics and computer vision, and the estimation of 3D information is at the core of most of these applications. In order to retrieve the 3D structure of a test surface, we propose a single-shot approach that combines dense gradient information with sparse absolute measurements. To that end, we designed a colored pattern that codes fine horizontal and vertical fringes, with sparse corner landmarks. By measuring the deformation (bending) of horizontal and vertical fringes, we are able to estimate surface local variations (i.e. its gradient field). Then the sparse corner landmarks are detected and matched to infer sparse absolute information about the test surface height. Local gradient information is combined with the sparse absolute values, which work as anchors to guide the integration process. We show that this can be mathematically done in a very compact and intuitive way by properly defining a Poisson-like partial differential equation. Then we address in detail how the problem can be formulated in a discrete domain and how it can be practically solved by straightforward linear numerical solvers. Finally, validation experiments are presented.
Sparse matrix-vector multiplication on network-on-chip

NASA Astrophysics Data System (ADS)

Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.

2010-12-01

In this paper, we present an idea for performing matrix-vector multiplication by using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with an arbitrary structure of the data transfers, i.e. with the irregular structure of the sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture in sizes 4×4 and 5×5 in IEEE 754 single-precision floating point using an FPGA.
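To make concrete why the data transfers in SMVM follow the sparsity structure, here is a minimal software sketch of a matrix-vector product in compressed sparse row (CSR) storage. It illustrates the kernel only, not the NoC hardware design; the function names and the example matrix are assumptions made for illustration.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x. Which entries of x are read is dictated by col_idx,
    so the access pattern is as irregular as the sparsity pattern."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
    [2.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 6.0],
]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

The indirect read `x[col_idx[k]]` is the step that a fixed point-to-point interconnect handles poorly, since the source of each operand is only known from the matrix data.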
Treatment of geometric singularities in implicit solvent models

NASA Astrophysics Data System (ADS)

Yu, Sining; Geng, Weihua; Wei, G. W.

2007-06-01

Geometric singularities, such as cusps and self-intersecting surfaces, are major obstacles to the accuracy, convergence, and stability of the numerical solution of the Poisson-Boltzmann (PB) equation. In earlier work, an interface technique based PB solver was developed using the matched interface and boundary (MIB) method, which explicitly enforces the flux jump condition at the solvent-solute interfaces and leads to highly accurate biomolecular electrostatics in continuum electric environments. However, such a PB solver, denoted as MIBPB-I, cannot maintain the designed second order convergence whenever there are geometric singularities, such as cusps and self-intersecting surfaces. Moreover, the matrix of the MIBPB-I is not optimally symmetrical, resulting in convergence difficulties. The present work presents a new interface method based PB solver, denoted as MIBPB-II, to address the aforementioned problems. The present MIBPB-II solver is systematic and robust in treating geometric singularities and delivers second order convergence for arbitrarily complex molecular surfaces of proteins. A new procedure is introduced to make the MIBPB-II matrix optimally symmetrical and diagonally dominant. The MIBPB-II solver is extensively validated by the molecular surfaces of few-atom systems and a set of 24 proteins. Converged electrostatic potentials and solvation free energies are obtained at a coarse grid spacing of 0.5 Å and are considerably more accurate than those obtained by the PBEQ and the APBS at finer grid spacings.
Fast iterative image reconstruction using sparse matrix factorization with GPU acceleration

NASA Astrophysics Data System (ADS)

Zhou, Jian; Qi, Jinyi

2011-03-01

Statistically based iterative approaches for image reconstruction have gained much attention in medical imaging. An accurate system matrix that defines the mapping from the image space to the data space is the key to high-resolution image reconstruction. However, an accurate system matrix is often associated with high computational cost and huge storage requirements. Here we present a method to address this problem by using sparse matrix factorization and parallel computing on a graphics processing unit (GPU). We factor the accurate system matrix into three sparse matrices: a sinogram blurring matrix, a geometric projection matrix, and an image blurring matrix. The sinogram blurring matrix models the detector response. The geometric projection matrix is based on a simple line integral model. The image blurring matrix compensates for the line-of-response (LOR) degradation due to the simplified geometric projection matrix. The geometric projection matrix is precomputed, while the sinogram and image blurring matrices are estimated by minimizing the difference between the factored system matrix and the original system matrix. The resulting factored system matrix has far fewer nonzero elements than the original system matrix and thus substantially reduces the storage and computation cost. The smaller size also allows an efficient implementation of the forward and back projectors on GPUs, which have a limited amount of memory. Our simulation studies show that the proposed method can dramatically reduce the computation cost of high-resolution iterative image reconstruction. The proposed technique is applicable to image reconstruction for different imaging modalities, including x-ray CT, PET, and SPECT.
ELSI: A unified software interface for Kohn–Sham electronic structure solvers

DOE PAGES

Yu, Victor Wen-zhe; Corsetti, Fabiano; Garcia, Alberto; ...

2017-09-15

Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. As a result, comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
ELSI: A unified software interface for Kohn-Sham electronic structure solvers

NASA Astrophysics Data System (ADS)

Yu, Victor Wen-zhe; Corsetti, Fabiano; García, Alberto; Huhn, William P.; Jacquelin, Mathias; Jia, Weile; Lange, Björn; Lin, Lin; Lu, Jianfeng; Mi, Wenhui; Seifitokaldani, Ali; Vázquez-Mayagoitia, Álvaro; Yang, Chao; Yang, Haizhao; Blum, Volker

2018-01-01

Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
ELSI: A unified software interface for Kohn–Sham electronic structure solvers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, Victor Wen-zhe; Corsetti, Fabiano; Garcia, Alberto

Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. As a result, comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
An Efficient Scheme for Updating Sparse Cholesky Factors

NASA Technical Reports Server (NTRS)

Raghavan, Padma

2002-01-01

Raghavan had earlier developed the software package DSCPACK, which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks-of-workstations with message passing using MPI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive definite matrix, A = LL^T. The code can also compute the factorization A = LDL^T. The complexity of the software arises from several factors relating to the sparsity of the matrix A. A sparse N x N matrix A typically has fewer than cN nonzeroes, where c is a small constant. If the matrix were dense, it would have O(N^2) nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor L. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of L, (3) numeric factorization to compute L, and (4) triangular solution to solve Ly = b and L^T x = y. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor. The latter are aimed at better utilization of the cache-memory hierarchy on modern processors to prevent cache misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, DSCPACK is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the LL^T or LDL^T factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.
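The effect of ordering on fill-in can be seen on a very small example. The sketch below is not DSCPACK's multifrontal algorithm: it uses a plain dense Cholesky routine on a 5 x 5 "arrow" matrix (an assumed test case), and the fill-reducing ordering is chosen by hand rather than computed by a heuristic. All function names are hypothetical.

```python
import math


def cholesky(a):
    """Dense Cholesky factorization A = L L^T (lower-triangular L)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def count_nonzeros(L, tol=1e-12):
    """Number of nonzero entries in the lower triangle of L."""
    return sum(1 for i, row in enumerate(L) for j in range(i + 1) if abs(row[j]) > tol)


def arrow_matrix(n):
    """SPD matrix with a dense first row/column and an otherwise diagonal structure."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 10.0
        a[i][0] = a[0][i] = 1.0 if i > 0 else 10.0
    return a


def permute(a, perm):
    """Symmetric permutation: entry (i, j) of the result is a[perm[i]][perm[j]]."""
    return [[a[pi][pj] for pj in perm] for pi in perm]


A = arrow_matrix(5)
nnz_natural = count_nonzeros(cholesky(A))  # dense column eliminated first

# Hand-chosen ordering: move the dense row/column to the end.
A_reordered = permute(A, [4, 3, 2, 1, 0])
nnz_reordered = count_nonzeros(cholesky(A_reordered))
```

With the dense column eliminated first, every entry of the lower triangle of L becomes nonzero (15 entries); with it eliminated last, L keeps the 9-entry pattern of the lower triangle of the permuted matrix, so there is no fill.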
Acceleration of FDTD mode solver by high-performance computing techniques.

PubMed

Han, Lin; Xi, Yanping; Huang, Wei-Ping

2010-06-21

A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the benchmark finite-difference (FD) eigenmode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver, with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on the CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is of the same order of magnitude as that of the standard finite-difference eigenmode solver, yet it requires much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigenvalue mode solvers are no longer applicable due to memory limitations.
Using Chebyshev polynomials and approximate inverse triangular factorizations for preconditioning the conjugate gradient method

NASA Astrophysics Data System (ADS)

Kaporin, I. E.

2012-02-01

In order to precondition a sparse symmetric positive definite matrix, its approximate inverse is examined, which is represented as the product of two sparse mutually adjoint triangular matrices. In this way, the solution of the corresponding system of linear algebraic equations (SLAE) by applying the preconditioned conjugate gradient method (CGM) is reduced to performing only elementary vector operations and calculating sparse matrix-vector products. A method for constructing the above preconditioner is described and analyzed. The triangular factor has a fixed sparsity pattern and is optimal in the sense that the preconditioned matrix has a minimum K-condition number. The use of polynomial preconditioning based on Chebyshev polynomials makes it possible to considerably reduce the amount of scalar product operations (at the cost of an insignificant increase in the total number of arithmetic operations). The possibility of an efficient massively parallel implementation of the resulting method for solving SLAEs is discussed. For a sequential version of this method, the results obtained by solving 56 test problems from the Florida sparse matrix collection (which are large-scale and ill-conditioned) are presented. These results show that the method is highly reliable and has low computational costs.
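As a point of reference for the preconditioned conjugate gradient iteration mentioned in this abstract, here is a minimal sketch in which the preconditioner is passed in as a function. The simple Jacobi (diagonal) preconditioner used in the example is a stand-in chosen for brevity; it is not the approximate inverse triangular factorization or the Chebyshev polynomial preconditioner of the paper. The test matrix (a 1-D Laplacian) and all names are assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for SPD systems.
    matvec(v) returns A v; apply_prec(r) returns an approximation of A^{-1} r.
    Each iteration needs one matvec, one preconditioner call, two dot products
    for the recurrence, and one more for the convergence check."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_matvec(v):
    """A v for the tridiagonal matrix with 2 on the diagonal and -1 off it."""
    n = len(v)
    return [
        2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def jacobi_prec(r):
    """Stand-in preconditioner: divide by the diagonal of A (all entries are 2)."""
    return [ri / 2.0 for ri in r]


n = 10
b = laplacian_matvec([1.0] * n)  # right-hand side whose exact solution is all ones
x, iterations = pcg(laplacian_matvec, b, jacobi_prec)
```

Because the preconditioner is only ever applied through `apply_prec`, a product of two sparse triangular factors, or a polynomial in the matrix, could be substituted without changing the iteration itself.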
An adaptive sparse deconvolution method for distinguishing the overlapping echoes of ultrasonic guided waves for pipeline crack inspection

NASA Astrophysics Data System (ADS)

Chang, Yong; Zi, Yanyang; Zhao, Jiyuan; Yang, Zhe; He, Wangpeng; Sun, Hailiang

2017-03-01

In guided wave pipeline inspection, echoes reflected from closely spaced reflectors generally overlap, meaning useful information is lost. To solve the overlapping problem, sparse deconvolution methods have been developed in the past decade. However, conventional sparse deconvolution methods have limitations in handling guided wave signals, because the input signal is directly used as the prototype of the convolution matrix, without considering the waveform change caused by the dispersion properties of the guided wave. In this paper, an adaptive sparse deconvolution (ASD) method is proposed to overcome these limitations. First, the Gaussian echo model is employed to adaptively estimate the column prototype of the convolution matrix instead of directly using the input signal as the prototype. Second, the convolution matrix is constructed upon the estimated results. Third, the split augmented Lagrangian shrinkage (SALSA) algorithm is introduced to solve the deconvolution problem with high computational efficiency. To verify the effectiveness of the proposed method, guided wave signals obtained from pipeline inspection are investigated numerically and experimentally. Compared to conventional sparse deconvolution methods, e.g. the l1-norm deconvolution method, the proposed method shows better performance in handling the echo overlap problem in the guided wave signal.

Robust extraction of basis functions for simultaneous and proportional myoelectric control via sparse non-negative matrix factorization

NASA Astrophysics Data System (ADS)

Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario

2018-04-01

Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.
Sparse PCA with Oracle Property.

PubMed

Gu, Quanquan; Wang, Zhaoran; Liu, Han

In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-k, and attains a [Formula: see text] statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets.
Sparse PCA with Oracle Property

PubMed Central

Gu, Quanquan; Wang, Zhaoran; Liu, Han

2014-01-01

In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-k, and attains a s/n statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that, another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets. PMID:25684971
Performance Models for the Spike Banded Linear System Solver

DOE PAGES

Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...

2011-01-01

With the availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.
Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations

DTIC Science & Technology

1999-01-01

purpose of the second partitioning phase, on each SMP, is to minimize the communication within the SMP; even if a multi-threaded matrix vector product... 8.7 Comparison of model with experimental data for send phase of matrix vector product on fine grid... 8.4 Matrix vector product phase times... 9.1 Flat and
A solver for General Unilateral Polynomial Matrix Equation with Second-Order Matrices Over Prime Finite Fields

NASA Astrophysics Data System (ADS)

Burtyka, Filipp

2018-03-01

This paper is the first to consider, from a practical point of view, the problem of finding solvents for arbitrary unilateral polynomial matrix equations with second-order matrices over prime finite fields: we implement a solver for this problem. The solver's algorithm has two steps: the first finds solvents that have a Jordan Normal Form (JNF), and the second finds solvents among the remaining matrices. The first step reduces to finding roots of ordinary polynomials over finite fields; the second is essentially an exhaustive search. The algorithms of the first step rely heavily on the theory of polynomial matrices. We estimate the practical duration of computations using our software implementation and answer some questions of theoretical interest (for example, that one cannot construct a unilateral matrix polynomial over a finite field having an arbitrary predefined number of solvents).
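For very small primes, the exhaustive-search idea can be illustrated directly. The sketch below is not the paper's two-step algorithm: it simply tests all p^4 second-order matrices over GF(p) against an equation of the form A_n X^n + ... + A_1 X + A_0 = 0, with coefficients placed on the left (an assumed convention). The function names and the example equation X^2 - I = 0 over GF(3) are chosen for illustration.

```python
from itertools import product


def mat_mul(a, b, p):
    """Product of two 2x2 matrices with entries reduced modulo p."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % p,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % p),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % p,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % p),
    )


def mat_add(a, b, p):
    """Sum of two 2x2 matrices with entries reduced modulo p."""
    return tuple(
        tuple((a[i][j] + b[i][j]) % p for j in range(2)) for i in range(2)
    )


def evaluate(coeffs, x, p):
    """Horner evaluation of A_n X^n + ... + A_1 X + A_0 for coeffs = [A_n, ..., A_0]."""
    result = coeffs[0]
    for c in coeffs[1:]:
        result = mat_add(mat_mul(result, x, p), c, p)
    return result


def find_solvents(coeffs, p):
    """Exhaustive search over all p**4 candidate matrices."""
    zero = ((0, 0), (0, 0))
    solvents = []
    for a, b, c, d in product(range(p), repeat=4):
        x = ((a, b), (c, d))
        if evaluate(coeffs, x, p) == zero:
            solvents.append(x)
    return solvents


p = 3
identity = ((1, 0), (0, 1))
zero = ((0, 0), (0, 0))
minus_identity = ((p - 1, 0), (0, p - 1))

# X^2 - I = 0 over GF(3), written as I*X^2 + 0*X + (-I) = 0.
solvents = find_solvents([identity, zero, minus_identity], p)
```

The search cost grows as p^4 per equation, which is why a step that reduces part of the problem to root finding for ordinary polynomials is attractive for larger primes.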
Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

NASA Technical Reports Server (NTRS)

Venugopal, Sesh; Naik, Vijay K.

1991-01-01

A block based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed memory systems. Using experimental results, this technique is analyzed for communication and load imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show that there is a communication and load balance tradeoff. The block based method results in lower communication cost whereas the wrap mapped scheme gives better load balance.
Response of selected binomial coefficients to varying degrees of matrix sparseness and to matrices with known data interrelationships

USGS Publications Warehouse

Archer, A.W.; Maples, C.G.

1989-01-01

Numerous departures from ideal relationships are revealed by Monte Carlo simulations of widely accepted binomial coefficients. For example, simulations incorporating varying levels of matrix sparseness (presence of zeros indicating lack of data) and computation of expected values reveal that not only are all common coefficients influenced by zero data, but also that some coefficients do not discriminate between sparse or dense matrices (few zero data). Such coefficients computationally merge mutually shared and mutually absent information and do not exploit all the information incorporated within the standard 2 × 2 contingency table; therefore, the commonly used formulae for such coefficients are more complicated than the actual range of values produced. Other coefficients do differentiate between mutual presences and absences; however, a number of these coefficients do not demonstrate a linear relationship to matrix sparseness. Finally, simulations using nonrandom matrices with known degrees of row-by-row similarities signify that several coefficients either do not display a reasonable range of values or are nonlinear with respect to known relationships within the data. Analyses with nonrandom matrices yield clues as to the utility of certain coefficients for specific applications. For example, coefficients such as Jaccard, Dice, and Baroni-Urbani and Buser are useful if correction of sparseness is desired, whereas the Russell-Rao coefficient is useful when sparseness correction is not desired. © 1989 International Association for Mathematical Geology.
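The role of mutual absences can be made concrete with the standard 2 × 2 contingency counts for two presence/absence vectors: a (both present), b and c (present in only one), and d (both absent). The sketch below uses the usual textbook definitions of the Jaccard, Dice, and Russell-Rao coefficients; the data vectors are made-up, and this is not a reproduction of the paper's Monte Carlo study.

```python
def contingency(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)
    return a, b, c, d


def jaccard(a, b, c, d):
    """Ignores mutual absences d (undefined when a + b + c == 0)."""
    return a / (a + b + c)


def dice(a, b, c, d):
    """Ignores mutual absences d and double-weights mutual presences."""
    return 2 * a / (2 * a + b + c)


def russell_rao(a, b, c, d):
    """Counts mutual absences in the denominator, so it falls as sparseness grows."""
    return a / (a + b + c + d)


x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 1, 1, 0, 0, 0]
counts = contingency(x, y)

# Make the data sparser by appending eight positions where both are absent.
sparse_counts = contingency(x + [0] * 8, y + [0] * 8)

jaccard_before, jaccard_after = jaccard(*counts), jaccard(*sparse_counts)
rr_before, rr_after = russell_rao(*counts), russell_rao(*sparse_counts)
```

Adding joint absences leaves Jaccard (and Dice) unchanged but halves Russell-Rao in this example, which matches the abstract's distinction between coefficients that correct for sparseness and those that do not.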
Sparse representation of whole-brain fMRI signals for identification of functional networks.

PubMed

Lv, Jinglei; Jiang, Xi; Li, Xiang; Zhu, Dajiang; Chen, Hanbo; Zhang, Tuo; Zhang, Shu; Hu, Xintao; Han, Junwei; Huang, Heng; Zhang, Jing; Guo, Lei; Liu, Tianming

2015-02-01

There have been several recent studies that used sparse representation for fMRI signal analysis and activation detection based on the assumption that each voxel's fMRI signal is linearly composed of sparse components. Previous studies have employed sparse coding to model functional networks in various modalities and scales. These prior contributions inspired the exploration of whether/how sparse representation can be used to identify functional networks in a voxel-wise way and on the whole brain scale. This paper presents a novel, alternative methodology of identifying multiple functional networks via sparse representation of whole-brain task-based fMRI signals. Our basic idea is that all fMRI signals within the whole brain of one subject are aggregated into a big data matrix, which is then factorized into an over-complete dictionary basis matrix and a reference weight matrix via an effective online dictionary learning algorithm. Our extensive experimental results have shown that this novel methodology can uncover multiple functional networks that can be well characterized and interpreted in spatial, temporal and frequency domains based on current brain science knowledge. Importantly, these well-characterized functional network components are quite reproducible in different brains. In general, our methods offer a novel, effective and unified solution to multiple fMRI data analysis tasks including activation detection, de-activation detection, and functional network identification. Copyright © 2014 Elsevier B.V. All rights reserved.
Incompressible SPH (ISPH) with fast Poisson solver on a GPU

NASA Astrophysics Data System (ADS)

Chow, Alex D.; Rogers, Benedict D.; Lind, Steven J.; Stansby, Peter K.

2018-05-01

This paper presents a fast incompressible SPH (ISPH) solver implemented to run entirely on a graphics processing unit (GPU) capable of simulating several millions of particles in three dimensions on a single GPU. The ISPH algorithm is implemented by converting the highly optimised open-source weakly-compressible SPH (WCSPH) code DualSPHysics to run ISPH on the GPU, combining it with the open-source linear algebra library ViennaCL for fast solutions of the pressure Poisson equation (PPE). Several challenges are addressed with this research: constructing a PPE matrix every timestep on the GPU for moving particles, optimising the limited GPU memory, and exploiting fast matrix solvers. The ISPH pressure projection algorithm is implemented as 4 separate stages, each with a particle sweep, including an algorithm for the population of the PPE matrix suitable for the GPU, and mixed precision storage methods. An accurate and robust ISPH boundary condition ideal for parallel processing is also established by adapting an existing WCSPH boundary condition for ISPH. A variety of validation cases are presented: an impulsively started plate, incompressible flow around a moving square in a box, and dambreaks (2-D and 3-D) which demonstrate the accuracy, flexibility, and speed of the methodology. Fragmentation of the free surface is shown to influence the performance of matrix preconditioners and therefore the PPE matrix solution time. The Jacobi preconditioner demonstrates robustness and reliability in the presence of fragmented flows. For a dambreak simulation, GPU speed ups demonstrate up to 10-18 times and 1.1-4.5 times compared to single-threaded and 16-threaded CPU run times respectively.
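To make the role of the Jacobi preconditioner concrete, here is a minimal pure-Python sketch of a Jacobi-preconditioned conjugate gradient solve on a small 1-D Poisson model matrix. It is an assumption-laden illustration only: the cited work uses ViennaCL on a GPU and its PPE matrix comes from moving particles, whereas this example assumes a symmetric positive definite matrix stored as a list of `{column: value}` dictionaries. The names `matvec`, `dot`, and `jacobi_pcg` are hypothetical.

```python
def matvec(rows, x):
    """Multiply a sparse matrix (list of {column: value} dicts) by a vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(rows, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with M^-1 = diag(A)^-1; assumes A is SPD."""
    n = len(b)
    inv_diag = [1.0 / rows[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        Ap = matvec(rows, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Model problem: 1-D Poisson matrix tridiag(-1, 2, -1) with unit right-hand side.
n = 20
rows = []
for i in range(n):
    row = {i: 2.0}
    if i > 0:
        row[i - 1] = -1.0
    if i < n - 1:
        row[i + 1] = -1.0
    rows.append(row)
b = [1.0] * n
x, iterations = jacobi_pcg(rows, b)
```

The Jacobi preconditioner needs only the matrix diagonal, so applying it is an independent per-row operation; that property is one plausible reason it maps well onto GPU threads.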
Dynamic data integration and stochastic inversion of a confined aquifer

NASA Astrophysics Data System (ADS)

Wang, D.; Zhang, Y.; Irsa, J.; Huang, H.; Wang, L.

2013-12-01

Much work has been done in developing and applying inverse methods to aquifer modeling. The scope of this paper is to investigate the applicability of a new direct method for large inversion problems and to incorporate uncertainty measures in the inversion outcomes (Wang et al., 2013). The problem considered is a two-dimensional inverse model (50×50 grid) of steady-state flow for a heterogeneous ground truth model (500×500 grid) with two hydrofacies. From the ground truth model, decreasing number of wells (12, 6, 3) were sampled for facies types, based on which experimental indicator histograms and directional variograms were computed. These parameters and models were used by Sequential Indicator Simulation to generate 100 realizations of hydrofacies patterns in a 100×100 (geostatistical) grid, which were conditioned to the facies measurements at wells. These realizations were smoothed with Simulated Annealing, coarsened to the 50×50 inverse grid, before they were conditioned with the direct method to the dynamic data, i.e., observed heads and groundwater fluxes at the same sampled wells. A set of realizations of estimated hydraulic conductivities (Ks), flow fields, and boundary conditions were created, which centered on the 'true' solutions from solving the ground truth model. Both hydrofacies conductivities were computed with an estimation accuracy of ±10% (12 wells), ±20% (6 wells), ±35% (3 wells) of the true values. For boundary condition estimation, the accuracy was within ±15% (12 wells), 30% (6 wells), and 50% (3 wells) of the true values. The inversion system of equations was solved with LSQR (Paige et al, 1982), for which coordinate transform and matrix scaling preprocessor were used to improve the condition number (CN) of the coefficient matrix. However, when the inverse grid was refined to 100×100, Gaussian Noise Perturbation was used to limit the growth of the CN before the matrix solve.
To scale the inverse problem up (i.e., without smoothing and coarsening and therefore reducing the associated estimation uncertainty), a parallel LSQR solver was written and verified. For the 50×50 grid, the parallel solver sped up the serial solution time by 14X using 4 CPUs (research on parallel performance and scaling is ongoing). A sensitivity analysis was conducted to examine the relation between the observed data and the inversion outcomes, where measurement errors of increasing magnitudes (i.e., ±1, 2, 5, 10% of the total head variation and up to ±2% of the total flux variation) were imposed on the observed data. Inversion results were stable but the accuracy of Ks and boundary estimation degraded with increasing errors, as expected. In particular, quality of the observed heads is critical to hydraulic head recovery, while quality of the observed fluxes plays a dominant role in K estimation. References: Wang, D., Y. Zhang, J. Irsa, H. Huang, and L. Wang (2013), Data integration and stochastic inversion of a confined aquifer with high performance computing, Advances in Water Resources, in preparation. Paige, C. C., and M. A. Saunders (1982), LSQR: an algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8(1), 43-71.
Salient Object Detection via Structured Matrix Decomposition.

PubMed

Peng, Houwen; Li, Bing; Ling, Haibin; Hu, Weiming; Xiong, Weihua; Maybank, Stephen J

2016-05-04

Low-rank recovery models have shown potential for salient object detection, where a matrix is decomposed into a low-rank matrix representing image background and a sparse matrix identifying salient objects. Two deficiencies, however, still exist. First, previous work typically assumes the elements in the sparse matrix are mutually independent, ignoring the spatial and pattern relations of image regions. Second, when the low-rank and sparse matrices are relatively coherent, e.g., when there are similarities between the salient objects and background or when the background is complicated, it is difficult for previous models to disentangle them. To address these problems, we propose a novel structured matrix decomposition model with two structural regularizations: (1) a tree-structured sparsity-inducing regularization that captures the image structure and enforces patches from the same object to have similar saliency values, and (2) a Laplacian regularization that enlarges the gaps between salient objects and the background in feature space. Furthermore, high-level priors are integrated to guide the matrix decomposition and boost the detection. We evaluate our model for salient object detection on five challenging datasets including single object, multiple objects and complex scene images, and show competitive results as compared with 24 state-of-the-art methods in terms of seven performance metrics.
Final Report for "Implimentation and Evaluation of Multigrid Linear Solvers into Extended Magnetohydrodynamic Codes for Petascale Computing"

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srinath Vadlamani; Scott Kruger; Travis Austin

Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of the International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems. For Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD, which is one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians at Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.
Visual Tracking Based on Extreme Learning Machine and Sparse Representation

PubMed Central

Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

2015-01-01

The existing sparse representation-based visual trackers mostly suffer from being time consuming and having poor robustness. To address these issues, a novel tracking method is presented via combining sparse representation and an emerging learning technique, namely extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separating hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to remove most of the candidate samples related to background contents efficiently, thereby reducing the total computational cost of the following sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities to be a target) of samples on the ELM classification function are used to construct a new manifold learning constraint term of the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, thereby leading to a higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359
Improved Solver Settings for 3D Exploding Wire Simulations in ALEGRA

DTIC Science & Technology

2016-08-01

expanding plasma and shock wave resulting from the wire burst can extend to tens of centimeters. The elliptic nature of the magnetic diffusion...such simulations were prohibitively slow due in part to unoptimized (matrix) solver settings. In this report, we address that by varying 6 parameters...distribution is unlimited. simulation code developed by SNL for modeling high-deformation solid dynamics, shock-hydrodynamics, magnetohydrodynamics
Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Druinsky, Alex; Ghysels, Pieter; Li, Xiaoye S.

In this paper, we study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread count. We also developed a bounds-and-bottlenecks performance model of the solver which we used to guide us through the optimization effort, and also carried out performance tuning in the solver’s large parameter space. Finally, as a result, significant speedups were obtained on both machines.
Automatic segmentation of right ventricle on ultrasound images using sparse matrix transform and level set

NASA Astrophysics Data System (ADS)

Qin, Xulei; Cong, Zhibin; Halig, Luma V.; Fei, Baowei

2013-03-01

An automatic framework is proposed to segment right ventricle on ultrasound images. This method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform (SMT), a training model, and a localized region based level set. First, the sparse matrix transform extracts main motion regions of myocardium as eigenimages by analyzing statistical information of these images. Second, a training model of right ventricle is registered to the extracted eigenimages in order to automatically detect the main location of the right ventricle and the corresponding transform relationship between the training model and the SMT-extracted results in the series. Third, the training model is then adjusted as an adapted initialization for the segmentation of each image in the series. Finally, based on the adapted initializations, a localized region based level set algorithm is applied to segment both epicardial and endocardial boundaries of the right ventricle from the whole series. Experimental results from real subject data validated the performance of the proposed framework in segmenting right ventricle from echocardiography. The mean Dice scores for the epicardial and endocardial boundaries are 89.1% ± 2.3% and 83.6% ± 7.3%, respectively. The automatic segmentation method based on sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.
Robust Principal Component Analysis Regularized by Truncated Nuclear Norm for Identifying Differentially Expressed Genes.

PubMed

Wang, Ya-Xuan; Gao, Ying-Lian; Liu, Jin-Xing; Kong, Xiang-Zhen; Li, Hai-Jun

2017-09-01

Identifying differentially expressed genes from the thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method in the identification of differentially expressed genes. RPCA method uses nuclear norm to approximate the rank function. However, theoretical studies showed that the nuclear norm minimizes all singular values, so it may not be the best solution to approximate the rank function. The truncated nuclear norm is defined as the sum of some smaller singular values, which may achieve a better approximation of the rank function than nuclear norm. In this paper, a novel method is proposed by replacing nuclear norm of RPCA with the truncated nuclear norm, which is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Thus, the differentially expressed genes can be identified according to the sparse matrix. The experimental results on The Cancer Genome Atlas data illustrate that the TRPCA method outperforms other state-of-the-art methods in the identification of differentially expressed genes.
The U.S. Geological Survey Modular Ground-Water Model - PCGN: A Preconditioned Conjugate Gradient Solver with Improved Nonlinear Control

USGS Publications Warehouse

Naff, Richard L.; Banta, Edward R.

2008-01-01

The preconditioned conjugate gradient with improved nonlinear control (PCGN) package provides additional means by which the solution of nonlinear ground-water flow problems can be controlled as compared to existing solver packages for MODFLOW. Picard iteration is used to solve nonlinear ground-water flow equations by iteratively solving a linear approximation of the nonlinear equations. The linear solution is provided by means of the preconditioned conjugate gradient algorithm where preconditioning is provided by the modified incomplete Cholesky algorithm. The incomplete Cholesky scheme incorporates two levels of fill, 0 and 1, in which the pivots can be modified so that the row sums of the preconditioning matrix and the original matrix are approximately equal. A relaxation factor is used to implement the modified pivots, which determines the degree of modification allowed. The effects of fill level and degree of pivot modification are briefly explored by means of a synthetic, heterogeneous finite-difference matrix; results are reported in the final section of this report. The preconditioned conjugate gradient method is coupled with Picard iteration so as to efficiently solve the nonlinear equations associated with many ground-water flow problems. The description of this coupling of the linear solver with Picard iteration is a primary concern of this document.
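The coupling of Picard iteration with a linear solver can be sketched on a toy problem. The Python example below is not taken from PCGN or MODFLOW: it assumes 1-D steady unconfined flow between two fixed heads, with the conductance between neighboring nodes proportional to their mean head, and it uses a direct tridiagonal (Thomas) solve as a stand-in for the preconditioned conjugate gradient step. All function and parameter names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def picard_unconfined(h_left, h_right, n, k=1.0, tol=1e-10, max_iter=200):
    """Picard iteration for n interior nodes of a 1-D unconfined flow problem."""
    # Initial guess: linear interpolation between the boundary heads.
    h = [h_left + (h_right - h_left) * (i + 1) / (n + 1) for i in range(n)]
    for it in range(1, max_iter + 1):
        full = [h_left] + h + [h_right]
        # Linearize: freeze head-dependent conductances at the current iterate.
        cond = [k * 0.5 * (full[i] + full[i + 1]) for i in range(n + 1)]
        lower = [-cond[i] for i in range(n)]
        upper = [-cond[i + 1] for i in range(n)]
        diag = [cond[i] + cond[i + 1] for i in range(n)]
        rhs = [0.0] * n
        rhs[0] += cond[0] * h_left        # fixed-head boundary contributions
        rhs[-1] += cond[n] * h_right
        h_new = solve_tridiagonal(lower, diag, upper, rhs)  # linear solve
        change = max(abs(a - b) for a, b in zip(h_new, h))
        h = h_new
        if change < tol:                  # outer (nonlinear) convergence test
            return h, it
    return h, max_iter

heads, picard_steps = picard_unconfined(10.0, 5.0, 9)
```

For this discretization the converged heads satisfy the property that the squared head varies linearly between the boundaries, which gives a simple check on the result. In a production solver the inner solve would be iterative, and how tightly it is converged at each outer step is the kind of nonlinear control the abstract refers to.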
Sparse Regression as a Sparse Eigenvalue Problem

NASA Technical Reports Server (NTRS)

Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai

2008-01-01

We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic ×10³ speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n⁴) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization

HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

PubMed Central

Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

2015-01-01

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
Adaptive sparse grid approach for the efficient simulation of pulsed eddy current testing inspections

NASA Astrophysics Data System (ADS)

Miorelli, Roberto; Reboud, Christophe

2018-04-01

Pulsed Eddy Current Testing (PECT) is a popular NonDestructive Testing (NDT) technique for some applications like corrosion monitoring in the oil and gas industry, or rivet inspection in the aeronautic area. Its particularity is to use a transient excitation, which allows to retrieve more information from the piece than conventional harmonic ECT, in a simpler and cheaper way than multi-frequency ECT setups. Efficient modeling tools prove, as usual, very useful to optimize experimental sensors and devices or evaluate their performance, for instance. This paper proposes an efficient simulation of PECT signals based on standard time harmonic solvers and use of an Adaptive Sparse Grid (ASG) algorithm. An adaptive sampling of the ECT signal spectrum is performed with this algorithm, then the complete spectrum is interpolated from this sparse representation and PECT signals are finally synthesized by means of inverse Fourier transform. Simulation results corresponding to existing industrial configurations are presented and the performance of the strategy is discussed by comparison to reference results.
Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205

NASA Technical Reports Server (NTRS)

Bauschlicher, Charles W., Jr.; Partridge, Harry

1987-01-01

Large, randomly sparse matrix vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension of 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the IO can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes make a far smaller improvement.
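The gather step that this abstract credits for the performance gains can be seen in a minimal sparse matrix-vector product. The Python sketch below assumes compressed sparse row (CSR) storage, which the abstract does not name; `dense_to_csr` and `csr_matvec` are hypothetical helper names, and the small matrix is made up for illustration.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # where the next row starts
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Gather x[col_idx[k]] for this row's nonzeros, then reduce.
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y

dense = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 5, 0],
]
values, col_idx, row_ptr = dense_to_csr(dense)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3, 4])
print(y)  # [8, 6, 0, 17]
```

The indirect access `x[col_idx[k]]` is the gather operation; on vector hardware its cost relative to contiguous loads largely determines how well a randomly sparse product vectorizes.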
Linear-scaling density-functional simulations of charged point defects in Al2O3 using hierarchical sparse matrix algebra.

PubMed

Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C

2010-09-21

We present calculations of formation energies of defects in an ionic solid (Al2O3) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Characterizing and differentiating task-based and resting state fMRI signals via two-stage sparse representations.

PubMed

Zhang, Shu; Li, Xiang; Lv, Jinglei; Jiang, Xi; Guo, Lei; Liu, Tianming

2016-03-01

A relatively underexplored question in fMRI is whether there are intrinsic differences in terms of signal composition patterns that can effectively characterize and differentiate task-based or resting state fMRI (tfMRI or rsfMRI) signals. In this paper, we propose a novel two-stage sparse representation framework to examine the fundamental difference between tfMRI and rsfMRI signals. Specifically, in the first stage, the whole-brain tfMRI or rsfMRI signals of each subject were composed into a big data matrix, which was then factorized into a subject-specific dictionary matrix and a weight coefficient matrix for sparse representation. In the second stage, all of the dictionary matrices from both tfMRI/rsfMRI data across multiple subjects were composed into another big data-matrix, which was further sparsely represented by a cross-subjects common dictionary and a weight matrix. This framework has been applied on the recently publicly released Human Connectome Project (HCP) fMRI data and experimental results revealed that there are distinctive and descriptive atoms in the cross-subjects common dictionary that can effectively characterize and differentiate tfMRI and rsfMRI signals, achieving 100% classification accuracy. Moreover, our methods and results can be meaningfully interpreted, e.g., the well-known default mode network (DMN) activities can be recovered from the very noisy and heterogeneous aggregated big-data of tfMRI and rsfMRI signals across all subjects in HCP Q1 release.
Numerical Technology for Large-Scale Computational Electromagnetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, R; Champagne, N; White, D

The key bottleneck of implicit computational electromagnetics tools for large complex geometries is the solution of the resulting linear system of equations. The goal of this effort was to research and develop critical numerical technology that alleviates this bottleneck for large-scale computational electromagnetics (CEM). The mathematical operators and numerical formulations used in this arena of CEM yield linear equations that are complex valued, unstructured, and indefinite. Also, simultaneously applying multiple mathematical modeling formulations to different portions of a complex problem (hybrid formulations) results in a mixed structure linear system, further increasing the computational difficulty. Typically, these hybrid linear systems are solved using a direct solution method, which was acceptable for Cray-class machines but does not scale adequately for ASCI-class machines. Additionally, LLNL's previously existing linear solvers were not well suited for the linear systems that are created by hybrid implicit CEM codes. Hence, a new approach was required to make effective use of ASCI-class computing platforms and to enable the next generation design capabilities. Multiple approaches were investigated, including the latest sparse-direct methods developed by our ASCI collaborators. In addition, approaches that combine domain decomposition (or matrix partitioning) with general-purpose iterative methods and special purpose pre-conditioners were investigated. Special-purpose pre-conditioners that take advantage of the structure of the matrix were adapted and developed based on intimate knowledge of the matrix properties. Finally, new operator formulations were developed that radically improve the conditioning of the resulting linear systems thus greatly reducing solution time. The goal was to enable the solution of CEM problems that are 10 to 100 times larger than our previous capability.
Computing sparse derivatives and consecutive zeros problem

NASA Astrophysics Data System (ADS)

Chandra, B. V. Ravi; Hossain, Shahadat

2013-02-01

We describe a substitution based sparse Jacobian matrix determination method using algorithmic differentiation. Utilizing the a priori known sparsity pattern, a compression scheme is determined using graph coloring. The "compressed pattern" of the Jacobian matrix is then reordered into a form suitable for computation by substitution. We show that the column reordering of the compressed pattern matrix (so as to align the zero entries into consecutive locations in each row) can be viewed as a variant of traveling salesman problem. Preliminary computational results show that on the test problems the performance of nearest-neighbor type heuristic algorithms is highly encouraging.
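The compression step mentioned in this abstract can be illustrated with a greedy column coloring of a known sparsity pattern. The Python sketch below shows only the simpler direct-recovery idea, in which columns that share no row may share a color; it does not reproduce the substitution-based method or the column reordering studied in the paper. The function name and the tridiagonal test pattern are assumptions made for this example.

```python
def color_columns(pattern, n_cols):
    """Greedy coloring; pattern is a list of sets of nonzero column indices per row."""
    # Two columns conflict when some row has a nonzero in both.
    conflicts = [set() for _ in range(n_cols)]
    for row in pattern:
        for j in row:
            conflicts[j].update(row - {j})
    color = [None] * n_cols
    for j in range(n_cols):
        used = {color[k] for k in conflicts[j] if color[k] is not None}
        c = 0
        while c in used:                  # smallest color unused by neighbors
            c += 1
        color[j] = c
    return color

# Sparsity pattern of a 6 x 6 tridiagonal Jacobian.
n = 6
pattern = [{j for j in (i - 1, i, i + 1) if 0 <= j < n} for i in range(n)]
colors = color_columns(pattern, n)
n_groups = max(colors) + 1

# Compress a Jacobian with that pattern: sum the columns in each color group.
jac = [[float(10 * i + j) if j in pattern[i] else 0.0 for j in range(n)]
       for i in range(n)]
compressed = [[sum(jac[i][j] for j in range(n) if colors[j] == g)
               for g in range(n_groups)] for i in range(n)]
# Direct recovery: each structural nonzero appears alone in its row and group.
recovered_ok = all(compressed[i][colors[j]] == jac[i][j]
                   for i in range(n) for j in pattern[i])
```

Here six columns collapse into three groups, so three directional derivative evaluations would suffice instead of six. Substitution methods such as the one in the abstract relax the no-shared-row requirement and can use fewer groups at the cost of a triangular solve during recovery.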
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

B. Hendrickson; T.G. Kolda

1998-09-01

A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix- transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
Sparse matrix methods research using the CSM testbed software system

NASA Technical Reports Server (NTRS)

Chu, Eleanor; George, J. Alan

1989-01-01

Research is described on sparse matrix techniques for the Computational Structural Mechanics (CSM) Testbed. The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the CSM Testbed. Thus, one of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A suite of subroutines to extract from the data base the relevant structural and numerical information about the matrix equations was written, and all the demonstration problems distributed with the testbed were successfully solved. These codes were documented, and performance studies comparing the SPARSPAK technology to the methods currently in the testbed were completed. In addition, some preliminary studies were done comparing some recently developed out-of-core techniques with the performance of the testbed processor INV.
A tight and explicit representation of Q in sparse QR factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ng, E.G.; Peyton, B.W.

1992-05-01

In QR factorization of a sparse m×n matrix A (m ≥ n) the orthogonal factor Q is often stored implicitly as a lower trapezoidal matrix H known as the Householder matrix. This paper presents a simple characterization of the row structure of Q, which could be used as the basis for a sparse data structure that can store Q explicitly. The new characterization is a simple extension of a well known row-oriented characterization of the structure of H. Hare, Johnson, Olesky, and van den Driessche have recently provided a complete sparsity analysis of the QR factorization. Let U be the matrix consisting of the first n columns of Q. Using results from, we show that the data structures for H and U resulting from our characterizations are tight when A is a strong Hall matrix. We also show that H and the lower trapezoidal part of U have the same sparsity characterization when A is strong Hall. We then show that this characterization can be extended to any weak Hall matrix that has been permuted into block upper triangular form. Finally, we show that permuting to block triangular form never increases the fill incurred during the factorization.
Doubly Nonparametric Sparse Nonnegative Matrix Factorization Based on Dependent Indian Buffet Processes.

PubMed

Xuan, Junyu; Lu, Jie; Zhang, Guangquan; Xu, Richard Yi Da; Luo, Xiangfeng

2018-05-01

Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this paper jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. This paper is seen to be much more flexible than Gaussian process-based and hierarchial Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of this paper compared with the state-of-the-art models in respect of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate the efficiency of this paper in document-word co-clustering tasks.
Large Covariance Estimation by Thresholding Principal Orthogonal Complements

PubMed Central

Fan, Jianqing; Liao, Yuan; Mincheva, Martina

2012-01-01

This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. PMID:24348088
Large Covariance Estimation by Thresholding Principal Orthogonal Complements.

PubMed

Fan, Jianqing; Liao, Yuan; Mincheva, Martina

2013-09-01

This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
Fast Multilevel Solvers for a Class of Discrete Fourth Order Parabolic Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zheng, Bin; Chen, Luoping; Hu, Xiaozhe

2016-03-05

In this paper, we study fast iterative solvers for the solution of fourth order parabolic equations discretized by mixed finite element methods. We propose to use consistent mass matrix in the discretization and use lumped mass matrix to construct efficient preconditioners. We provide eigenvalue analysis for the preconditioned system and estimate the convergence rate of the preconditioned GMRes method. Furthermore, we show that these preconditioners only need to be solved inexactly by optimal multigrid algorithms. Our numerical examples indicate that the proposed preconditioners are very efficient and robust with respect to both discretization parameters and diffusion coefficients. We also investigate the performance of multigrid algorithms with either collective smoothers or distributive smoothers when solving the preconditioner systems.
NASA Tech Briefs, August 2005

NASA Technical Reports Server (NTRS)

2005-01-01

Topics include: Hidden Identification on Parts: Magnetic Machine-Readable Matrix Symbols; System for Processing Coded OFDM Under Doppler and Fading; Multipurpose Hyperspectral Imaging System; Magnetic-Flux-Compensated Voltage Divider; High-Performance Satellite/Terrestrial-Network Gateway; Internet-Based System for Voice Communication With the ISS; Stripline/Microstrip Transition in Multilayer Circuit Board; Dual-Band Feed for a Microwave Reflector Antenna; Quadratic Programming for Allocating Control Effort; Range Process Simulation Tool; Simulator of Space Communication Networks; Computing Q-D Relationships for Storage of Rocket Fuels; Contour Error Map Algorithm; Portfolio Analysis Tool; Glass Frit Filters for Collecting Metal Oxide Nanoparticles; Anhydrous Proton-Conducting Membranes for Fuel Cells; Portable Electron-Beam Free-Form Fabrication System; Miniature Laboratory for Detecting Sparse Biomolecules; Multicompartment Liquid-Cooling/Warming Protective Garments; Laser Metrology for an Optical-Path-Length Modulator; PCM Passive Cooling System Containing Active Subsystems; Automated Electrostatics Environmental Chamber; Estimating Aeroheating of a 3D Body Using a 2D Flow Solver; Artificial Immune System for Recognizing Patterns; Computing the Thermodynamic State of a Cryogenic Fluid; Safety and Mission Assurance Performance Metric; Magnetic Control of Concentration Gradient in Microgravity; Avionics for a Small Robotic Inspection Spacecraft; and Simulation of Dynamics of a Flexible Miniature Airplane.
Matlab Geochemistry: An open source geochemistry solver based on MRST

NASA Astrophysics Data System (ADS)

McNeece, C. J.; Raynaud, X.; Nilsen, H.; Hesse, M. A.

2017-12-01

The study of geological systems often requires the solution of complex geochemical relations. To address this need we present an open source geochemical solver based on the Matlab Reservoir Simulation Toolbox (MRST) developed by SINTEF. The implementation supports non-isothermal multicomponent aqueous complexation, surface complexation, ion exchange, and dissolution/precipitation reactions. The suite of tools available in MRST allows for rapid model development, in particular the incorporation of geochemical calculations into transport simulations of multiple phases, complex domain geometry and geomechanics. Different numerical schemes and additional physics can be easily incorporated into the existing tools through the object-oriented framework employed by MRST. The solver leverages the automatic differentiation tools available in MRST to solve arbitrarily complex geochemical systems with any choice of species or element concentration as input. Four mathematical approaches enable the solver to be quite robust: 1) the choice of chemical elements as the basis components makes all entries in the composition matrix positive thus preserving convexity, 2) a log variable transformation is used which transfers the nonlinearity to the convex composition matrix, 3) a priori bounds on variables are calculated from the structure of the problem, constraining Newton's path and 4) an initial guess is calculated implicitly by sequentially adding model complexity. As a benchmark we compare the model to experimental and semi-analytic solutions of the coupled salinity-acidity transport system. Together with the reservoir simulation capabilities of MRST the solver offers a promising tool for geochemical simulations in reservoir domains for applications in a diversity of fields from enhanced oil recovery to radionuclide storage.
Linear solver performance in elastoplastic problem solution on GPU cluster

NASA Astrophysics Data System (ADS)

Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.

2017-12-01

Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large scale systems need to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens and hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The method convergence highly depends on the operator spectrum of a problem stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
Preconditioned conjugate-gradient methods for low-speed flow calculations

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

1993-01-01

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over-Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss-Seidel Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
Preconditioned Conjugate Gradient methods for low speed flow calculations

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

1993-01-01

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and the convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the lower-upper (L-U)-successive symmetric over-relaxation iterative scheme is more efficient than a preconditioner based on incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional line Gauss-Seidel relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
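To make the role of the preconditioner concrete, here is a minimal sketch of a symmetric SOR (SSOR) preconditioner inside a conjugate-gradient loop. It is a simplification, not the solver from the report: it assumes a small symmetric positive definite matrix stored as dense Python lists, whereas the abstract describes a generalized CG-like method for the nonsymmetric systems arising from the Navier-Stokes equations. The function names (`ssor_apply`, `pcg`) and the 1-D Laplacian test matrix are illustrative choices.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the SSOR preconditioner
    M = omega/(2-omega) * (D/omega + L) D^{-1} (D/omega + U),
    where A = L + D + U (strict lower, diagonal, strict upper)."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):  # forward solve (D/omega + L) y = r
        s = r[i] - sum(A[i][j] * y[j] for j in range(i))
        y[i] = s * omega / A[i][i]
    scale = (2.0 - omega) / omega
    w = [scale * A[i][i] * y[i] for i in range(n)]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward solve (D/omega + U) z = w
        s = w[i] - sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s * omega / A[i][i]
    return z


def pcg(A, b, omega=1.0, tol=1e-10, max_iter=200):
    """Preconditioned CG for SPD A; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if sqrt(dot(r, r)) < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Example: 1-D Laplacian (tridiagonal -1, 2, -1) with a known solution of ones.
n = 8
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)
x, iters = pcg(A, b)
```

With `omega=1.0` the preconditioner reduces to symmetric Gauss-Seidel; other values in (0, 2) trade off the forward and backward sweeps.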
Collaborative sparse priors for multi-view ATR

NASA Astrophysics Data System (ADS)

Li, Xuelu; Monga, Vishal

2018-04-01

Recent work has seen a surge of sparse representation based classification (SRC) methods applied to automatic target recognition problems. While traditional SRC approaches used l0 or l1 norm to quantify sparsity, spike and slab priors have established themselves as the gold standard for providing general tunable sparse structures on vectors. In this work, we employ collaborative spike and slab priors that can be applied to matrices to encourage sparsity for the problem of multi-view ATR. That is, target images captured from multiple views are expanded in terms of a training dictionary multiplied with a coefficient matrix. Ideally, for a test image set comprising multiple views of a target, coefficients corresponding to its identifying class are expected to be active, while others should be zero, i.e. the coefficient matrix is naturally sparse. We develop a new approach to solve the optimization problem that estimates the sparse coefficient matrix jointly with the sparsity inducing parameters in the collaborative prior. ATR problems are investigated on the mid-wave infrared (MWIR) database made available by the US Army Night Vision and Electronic Sensors Directorate, which has a rich collection of views. Experimental results show that the proposed joint prior and coefficient estimation method (JPCEM) can: 1) enable improved accuracy when multiple views vs. a single one are invoked, and 2) outperform state of the art alternatives particularly when training imagery is limited.

Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

PubMed

Lam, Clifford; Fan, Jianqing

2009-01-01

This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate with which the tuning parameter $\lambda_n$ goes to 0 have been made explicit and compared under different penalties. As a result, for the $L_1$-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: $s_n' = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating sparse covariance or correlation matrix, sparse precision or inverse correlation matrix or sparse Cholesky factor, where $s_n'$ is the number of nonzero elements on the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
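The contrast between the L1 penalty and the SCAD penalty can be seen from the penalty functions themselves. The sketch below uses the standard SCAD definition with the conventional shape parameter a = 3.7; that default and the function names are assumptions, since the abstract gives neither.

```python
def l1_penalty(theta, lam):
    """L1 (lasso) penalty: grows linearly without bound."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic blend, then constant.

    The constant tail means large entries are not shrunk further,
    unlike the L1 penalty, which keeps penalizing in proportion to |theta|.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def hard_threshold(theta, lam):
    """Hard-thresholding rule: keep an entry only if it exceeds lam in magnitude."""
    return theta if abs(theta) > lam else 0.0


# For a large entry, L1 keeps growing while SCAD has levelled off.
large = 10.0
l1_value = l1_penalty(large, 1.0)
scad_value = scad_penalty(large, 1.0)
```

The three SCAD branches join continuously at |theta| = lam and |theta| = a*lam, which can be checked by evaluating the function on either side of those points.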
Immunohistochemical evidence of rapid extracellular matrix remodeling after iron-particle irradiation of mouse mammary gland

NASA Technical Reports Server (NTRS)

Ehrhart, E. J.; Gillette, E. L.; Barcellos-Hoff, M. H.; Chaterjee, A. (Principal Investigator)

1996-01-01

High-LET radiation has unique physical and biological properties compared to sparsely ionizing radiation. Recent studies demonstrate that sparsely ionizing radiation rapidly alters the pattern of extracellular matrix expression in several tissues, but little is known about the effect of heavy-ion radiation. This study investigates densely ionizing radiation-induced changes in extracellular matrix localization in the mammary glands of adult female BALB/c mice after whole-body irradiation with 0.8 Gy 600 MeV iron particles. The basement membrane and interstitial extracellular matrix proteins of the mammary gland stroma were mapped with respect to time postirradiation using immunofluorescence. Collagen III was induced in the adipose stroma within 1 day, continued to increase through day 9 and was resolved by day 14. Immunoreactive tenascin was induced in the epithelium by day 1, was evident at the epithelial-stromal interface by day 5-9 and persisted as a condensed layer beneath the basement membrane through day 14. These findings parallel similar changes induced by gamma irradiation but demonstrate different onset and chronicity. In contrast, the integrity of epithelial basement membrane, which was unaffected by sparsely ionizing radiation, was disrupted by iron-particle irradiation. Laminin immunoreactivity was mildly irregular at 1 h postirradiation and showed discontinuities and thickening from days 1 to 9. Continuity was restored by day 14. Thus high-LET radiation, like sparsely ionizing radiation, induces rapid-remodeling of the stromal extracellular matrix but also appears to alter the integrity of the epithelial basement membrane, which is an important regulator of epithelial cell proliferation and differentiation.
High-dimensional statistical inference: From vector to matrix

NASA Astrophysics Data System (ADS)

Zhang, Anru

Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, or $\delta_{tk}^A < \sqrt{(t-1)/t}$ for any given constant $t \ge 4/3$ guarantee the exact recovery of all k-sparse signals in the noiseless case through the constrained $\ell_1$ minimization, and similarly in affine rank minimization $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, or $\delta_{tr}^M < \sqrt{(t-1)/t}$ ensure the exact reconstruction of all matrices with rank at most r in the noiseless case via the constrained nuclear norm minimization. Moreover, for any $\epsilon > 0$, $\delta_k^A < 1/3 + \epsilon$, $\delta_k^A + \theta_{k,k}^A < 1 + \epsilon$, or $\delta_{tk}^A < \sqrt{(t-1)/t} + \epsilon$ are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. A similar result also holds for matrix recovery. In addition, the conditions $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, $\delta_{tk}^A < \sqrt{(t-1)/t}$ and $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, $\delta_{tr}^M < \sqrt{(t-1)/t}$ are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.
For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications to other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. 
The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
Optimized and parallelized implementation of the electronegativity equalization method and the atom-bond electronegativity equalization method.

PubMed

Vareková, R Svobodová; Koca, J

2006-02-01

The most common way to calculate charge distribution in a molecule is ab initio quantum mechanics (QM). Some faster alternatives to QM have also been developed, the so-called "equalization methods" EEM and ABEEM, which are based on DFT. We have implemented and optimized the EEM and ABEEM methods and created the EEM SOLVER and ABEEM SOLVER programs. It has been found that the most time-consuming part of equalization methods is the reduction of the matrix belonging to the equation system generated by the method. Therefore, for both methods this part was replaced by the parallel algorithm WIRS and implemented within the PVM environment. The parallelized versions of the programs EEM SOLVER and ABEEM SOLVER showed promising results, especially on a single computer with several processors (compact PVM). The implemented programs are available through the Web page http://ncbr.chemi.muni.cz/~n19n/eem_abeem.
Solving large tomographic linear systems: size reduction and error estimation

NASA Astrophysics Data System (ADS)

Voronin, Sergey; Mikesell, Dylan; Slezak, Inna; Nolet, Guust

2014-10-01

We present a new approach to reduce a sparse, linear system of equations associated with tomographic inverse problems. We begin by making a modification to the commonly used compressed sparse-row format, whereby our format is tailored to the sparse structure of finite-frequency (volume) sensitivity kernels in seismic tomography. Next, we cluster the sparse matrix rows to divide a large matrix into smaller subsets representing ray paths that are geographically close. Singular value decomposition of each subset allows us to project the data onto a subspace associated with the largest eigenvalues of the subset. After projection we reject those data that have a signal-to-noise ratio (SNR) below a chosen threshold. Clustering in this way assures that the sparse nature of the system is minimally affected by the projection. Moreover, our approach allows for a precise estimation of the noise affecting the data while also giving us the ability to identify outliers. We illustrate the method by reducing large matrices computed for global tomographic systems with cross-correlation body wave delays, as well as with surface wave phase velocity anomalies. For a massive matrix computed for 3.7 million Rayleigh wave phase velocity measurements, imposing a threshold of 1 for the SNR, we condensed the matrix size from 1103 to 63 Gbyte. For a global data set of multiple-frequency P wave delays from 60 well-distributed deep earthquakes we obtain a reduction to 5.9 per cent. This type of reduction allows one to avoid loss of information due to underparametrizing models. Alternatively, if data have to be rejected to fit the system into computer memory, it assures that the most important data are preserved.
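For readers unfamiliar with the storage scheme the authors start from, the following sketch shows the standard compressed sparse-row (CSR) layout and a matrix-vector product over it. It illustrates only the commonly used format, not the authors' modified variant, whose details the abstract does not give; the function names are illustrative.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    data    : nonzero values, row by row
    indices : column index of each stored value
    indptr  : indptr[i]..indptr[i+1] delimits row i inside data/indices
    """
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Compute y = A x touching only the stored nonzeros."""
    y = []
    for i in range(len(indptr) - 1):
        start, stop = indptr[i], indptr[i + 1]
        y.append(sum(data[k] * x[indices[k]] for k in range(start, stop)))
    return y


# Small example with an empty third row.
dense = [[0, 2, 0],
         [1, 0, 3],
         [0, 0, 0]]
data, indices, indptr = to_csr(dense)
y = csr_matvec(data, indices, indptr, [1, 1, 1])
```

Storage is proportional to the number of nonzeros plus the number of rows, which is what makes row clustering and per-cluster processing of very large tomographic matrices feasible.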
Discriminative Transfer Subspace Learning via Low-Rank and Sparse Representation.

PubMed

Xu, Yong; Fang, Xiaozhao; Wu, Jian; Li, Xuelong; Zhang, David

2016-02-01

In this paper, we address the problem of unsupervised domain transfer learning in which no labels are available in the target domain. We use a transformation matrix to transfer both the source and target data to a common subspace, where each target sample can be represented by a combination of source samples such that the samples from different domains can be well interlaced. In this way, the discrepancy of the source and target domains is reduced. By imposing joint low-rank and sparse constraints on the reconstruction coefficient matrix, the global and local structures of data can be preserved. To enlarge the margins between different classes as much as possible and provide more freedom to diminish the discrepancy, a flexible linear classifier (projection) is obtained by learning a non-negative label relaxation matrix that allows the strict binary label matrix to relax into a slack variable matrix. Our method can avoid a potentially negative transfer by using a sparse matrix to model the noise and, thus, is more robust to different types of noise. We formulate our problem as a constrained low-rankness and sparsity minimization problem and solve it by the inexact augmented Lagrange multiplier method. Extensive experiments on various visual domain adaptation tasks show the superiority of the proposed method over the state-of-the-art methods. The MATLAB code of our method will be publicly available at http://www.yongxu.org/lunwen.html.
Improved analysis of SP and CoSaMP under total perturbations

NASA Astrophysics Data System (ADS)

Li, Haifeng

2016-12-01

In practice, in the underdetermined model y = Ax, where x is a K-sparse vector (i.e., it has no more than K nonzero entries), both y and A could be totally perturbed. A more relaxed condition means that fewer measurements are needed to ensure sparse recovery from a theoretical standpoint. In this paper, based on the restricted isometry property (RIP), two relaxed sufficient conditions are presented for subspace pursuit (SP) and compressed sampling matching pursuit (CoSaMP) under total perturbations to guarantee that the sparse vector x is recovered. Taking a random matrix as the measurement matrix, we also discuss the advantage of our condition. Numerical experiments validate that SP and CoSaMP can provide oracle-order recovery performance.
Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

PubMed Central

Liu, Weidong; Luo, Xi

2014-01-01

This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. PMID:25750463
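A column-wise estimator solved by coordinate descent can be sketched as follows. The abstract does not state the objective, so the sketch assumes that column i of the precision matrix minimizes (1/2) b'Sb - e_i'b + lam*||b||_1, with S the sample covariance and e_i the i-th unit vector. That objective, the function names, and the stopping rule are assumptions, not details confirmed by the abstract.

```python
def soft_threshold(x, t):
    """Soft-thresholding operator used by L1-penalized coordinate updates."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def precision_column(S, i, lam, n_sweeps=200, tol=1e-10):
    """Estimate column i of the precision matrix by coordinate descent.

    Assumed objective: 0.5 * b'Sb - b[i] + lam * sum(|b_j|).
    S is a symmetric positive definite matrix given as a list of lists.
    """
    p = len(S)
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Contribution of all other coordinates to the j-th gradient entry.
            partial = sum(S[j][k] * beta[k] for k in range(p) if k != j)
            target = (1.0 if j == i else 0.0) - partial
            new_value = soft_threshold(target, lam) / S[j][j]
            max_change = max(max_change, abs(new_value - beta[j]))
            beta[j] = new_value
        if max_change < tol:
            break
    return beta


# With lam = 0 the update is a Gauss-Seidel solve of S b = e_i,
# so the result approaches the exact inverse column.
S = [[2.0, 1.0],
     [1.0, 2.0]]
exact_column = precision_column(S, 0, lam=0.0)
sparse_column = precision_column(S, 0, lam=0.5)
```

Because each column is estimated independently, the columns can be computed in parallel and each can carry its own tuning parameter, which is one way to read the "fast" and "adaptive" claims of the abstract.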
Multi-energy CT based on a prior rank, intensity and sparsity model (PRISM).

PubMed

Gao, Hao; Yu, Hengyong; Osher, Stanley; Wang, Ge

2011-11-01

We propose a compressive sensing approach for multi-energy computed tomography (CT), namely the prior rank, intensity and sparsity model (PRISM). To further compress the multi-energy image for allowing the reconstruction with fewer CT data and less radiation dose, the PRISM models a multi-energy image as the superposition of a low-rank matrix and a sparse matrix (with row dimension in space and column dimension in energy), where the low-rank matrix corresponds to the stationary background over energy that has a low matrix rank, and the sparse matrix represents the rest of distinct spectral features that are often sparse. Distinct from previous methods, the PRISM utilizes the generalized rank, e.g., the matrix rank of tight-frame transform of a multi-energy image, which offers a way to characterize the multi-level and multi-filtered image coherence across the energy spectrum. Besides, the energy-dependent intensity information can be incorporated into the PRISM in terms of the spectral curves for base materials, with which the restoration of the multi-energy image becomes the reconstruction of the energy-independent material composition matrix. In other words, the PRISM utilizes prior knowledge on the generalized rank and sparsity of a multi-energy image, and intensity/spectral characteristics of base materials. Furthermore, we develop an accurate and fast split Bregman method for the PRISM and demonstrate the superior performance of the PRISM relative to several competing methods in simulations.
A new optimization method using a compressed sensing inspired solver for real-time LDR-brachytherapy treatment planning

NASA Astrophysics Data System (ADS)

Guthier, C.; Aschenbrenner, K. P.; Buergy, D.; Ehmann, M.; Wenz, F.; Hesser, J. W.

2015-03-01

This work discusses a novel strategy for inverse planning in low dose rate brachytherapy. It applies the idea of compressed sensing to the problem of inverse treatment planning and a new solver for this formulation is developed. An inverse planning algorithm was developed incorporating brachytherapy dose calculation methods as recommended by AAPM TG-43. For optimization of the functional a new variant of a matching pursuit type solver is presented. The results are compared with current state-of-the-art inverse treatment planning algorithms by means of real prostate cancer patient data. The novel strategy outperforms the best state-of-the-art methods in speed, while achieving comparable quality. It is able to find solutions with comparable values for the objective function and it achieves these results within a few microseconds, being up to 542 times faster than competing state-of-the-art strategies, allowing real-time treatment planning. The sparse solution of inverse brachytherapy planning achieved with methods from compressed sensing is a new paradigm for optimization in medical physics. Through the sparsity of required needles and seeds identified by this method, the cost of intervention may be reduced.
A new optimization method using a compressed sensing inspired solver for real-time LDR-brachytherapy treatment planning.

PubMed

Guthier, C; Aschenbrenner, K P; Buergy, D; Ehmann, M; Wenz, F; Hesser, J W

2015-03-21

This work discusses a novel strategy for inverse planning in low dose rate brachytherapy. It applies the idea of compressed sensing to the problem of inverse treatment planning and a new solver for this formulation is developed. An inverse planning algorithm was developed incorporating brachytherapy dose calculation methods as recommended by AAPM TG-43. For optimization of the functional a new variant of a matching pursuit type solver is presented. The results are compared with current state-of-the-art inverse treatment planning algorithms by means of real prostate cancer patient data. The novel strategy outperforms the best state-of-the-art methods in speed, while achieving comparable quality. It is able to find solutions with comparable values for the objective function and it achieves these results within a few microseconds, being up to 542 times faster than competing state-of-the-art strategies, allowing real-time treatment planning. The sparse solution of inverse brachytherapy planning achieved with methods from compressed sensing is a new paradigm for optimization in medical physics. Through the sparsity of required needles and seeds identified by this method, the cost of intervention may be reduced.
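As background for the "matching pursuit type solver" mentioned above, here is a sketch of plain matching pursuit, the generic greedy algorithm on which such solvers build. It is not the authors' variant, and it ignores the dose constraints of the planning problem; the unit-norm atoms and the function name are assumptions for illustration.

```python
def matching_pursuit(atoms, signal, n_iter):
    """Greedy sparse approximation of `signal` by unit-norm `atoms`.

    atoms  : list of vectors (lists), each assumed to have unit Euclidean norm
    signal : target vector (list)
    Returns (coefficients, residual).
    """
    residual = list(signal)
    coef = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate every atom with the current residual.
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        j = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        if corr[j] == 0.0:
            break  # residual is orthogonal to every atom
        coef[j] += corr[j]
        residual = [r - corr[j] * a for r, a in zip(residual, atoms[j])]
    return coef, residual


# Orthonormal atoms: two iterations recover a two-term signal exactly.
atoms = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
coef, residual = matching_pursuit(atoms, [3.0, 0.0, -2.0], n_iter=2)
```

Each iteration adds at most one atom to the solution, so the number of iterations bounds the number of nonzero coefficients; in the planning setting that would correspond to the number of active seed positions.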
Detection of Protein Complexes Based on Penalized Matrix Decomposition in a Sparse Protein-Protein Interaction Network.

PubMed

Cao, Buwen; Deng, Shuguang; Qin, Hua; Ding, Pingjian; Chen, Shaopeng; Li, Guanghui

2018-06-15

High-throughput technology has generated large-scale protein interaction data, which is crucial in our understanding of biological organisms. Many complex identification algorithms have been developed to determine protein complexes. However, these methods are only suitable for dense protein interaction networks, because their capabilities decrease rapidly when applied to sparse protein-protein interaction (PPI) networks. In this study, based on penalized matrix decomposition (PMD), a novel method of penalized matrix decomposition for the identification of protein complexes (i.e., PMDpc) was developed to detect protein complexes in the human protein interaction network. This method mainly consists of three steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices. The PMDpc method can detect protein complexes in sparse PPI networks by imposing appropriate constraints on factor matrices. Finally, the results of our method are compared with those of other methods on the human PPI network. Experimental results show that our method can not only outperform classical algorithms, such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but can also achieve an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
Efficient Broadband Simulation of Fluid-Structure Coupling for Membrane-Type Acoustic Transducer Arrays Using the Multilevel Fast Multipole Algorithm.

PubMed

Shieh, Bernard; Sabra, Karim G; Degertekin, F Levent

2016-11-01

A boundary element model provides great flexibility for the simulation of membrane-type micromachined ultrasonic transducers (MUTs) in terms of membrane shape, actuating mechanism, and array layout. Acoustic crosstalk is accounted for through a mutual impedance matrix that captures the primary crosstalk mechanism of dispersive-guided modes generated at the fluid-solid interface. However, finding the solution to the fully populated boundary element matrix equation using standard techniques requires computation time and memory usage that scale with the cube and the square of the number of nodes, respectively, limiting simulation to a small number of membranes. We implement a solver with improved speed and efficiency through the application of a multilevel fast multipole algorithm (FMA). By approximating the fields of collections of nodes using multipole expansions of the free-space Green's function, an FMA solver can enable the simulation of hundreds of thousands of nodes while incurring an approximation error that is controllable. Convergence is drastically improved using a problem-specific block-diagonal preconditioner. We demonstrate the solver's capabilities by simulating a 32-element 7-MHz 1-D capacitive MUT (CMUT) phased array with 2880 membranes. The array is simulated using 233280 nodes for a very wide frequency band up to 50 MHz. For a simulation with 15210 nodes, the FMA solver performed ten times faster and used 32 times less memory than a standard solver based on LU decomposition. We investigate the effects of mesh density and phasing on the predicted array response and find that it is necessary to use about seven nodes over the width of the membrane to observe convergence of the solution, even below the first membrane resonance frequency, due to the influence of higher order membrane modes.
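The scaling argument in this abstract can be made concrete with a rough estimate. The sketch below assumes a dense double-precision complex matrix at 16 bytes per entry, which is an assumption and not a figure from the paper; it only illustrates why storing the fully populated boundary element matrix becomes impractical at the larger node count.

```python
def dense_matrix_bytes(n_nodes, bytes_per_entry=16):
    """Memory for a fully populated n x n matrix (16 bytes = complex128, assumed)."""
    return n_nodes * n_nodes * bytes_per_entry

for n in (15210, 233280):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"{n} nodes: about {gib:.1f} GiB for the dense matrix")

# Growing from 15210 to 233280 nodes multiplies dense storage by the square
# of the node ratio and direct-factorization time by its cube.
ratio = 233280 / 15210
memory_growth = ratio ** 2
time_growth = ratio ** 3
```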
Target detection in GPR data using joint low-rank and sparsity constraints

NASA Astrophysics Data System (ADS)

Bouzerdoum, Abdesselam; Tivive, Fok Hing Chi; Abeynayake, Canicious

2016-05-01

In ground penetrating radars, background clutter, which comprises the signals backscattered from the rough, uneven ground surface and the background noise, impairs the visualization of buried objects and subsurface inspections. In this paper, a clutter mitigation method is proposed for target detection. The removal of background clutter is formulated as a constrained optimization problem to obtain a low-rank matrix and a sparse matrix. The low-rank matrix captures the ground surface reflections and the background noise, whereas the sparse matrix contains the target reflections. An optimization method based on split-Bregman algorithm is developed to estimate these two matrices from the input GPR data. Evaluated on real radar data, the proposed method achieves promising results in removing the background clutter and enhancing the target signature.
Code Samples Used for Complexity and Control

NASA Astrophysics Data System (ADS)

Ivancevic, Vladimir G.; Reid, Darryn J.

2015-11-01

The following sections are included: * Mathematica® Code * Generic Chaotic Simulator * Vector Differential Operators * NLS Explorer * C++ Code * C++ Lambda Functions for Real Calculus * Accelerometer Data Processor * Simple Predictor-Corrector Integrator * Solving the BVP with the Shooting Method * Linear Hyperbolic PDE Solver * Linear Elliptic PDE Solver * Method of Lines for a Set of the NLS Equations * C# Code * Iterative Equation Solver * Simulated Annealing: A Function Minimum * Simple Nonlinear Dynamics * Nonlinear Pendulum Simulator * Lagrangian Dynamics Simulator * Complex-Valued Crowd Attractor Dynamics * Freeform Fortran Code * Lorenz Attractor Simulator * Complex Lorenz Attractor * Simple SGE Soliton * Complex Signal Presentation * Gaussian Wave Packet * Hermitian Matrices * Euclidean L2-Norm * Vector/Matrix Operations * Plain C-Code: Levenberg-Marquardt Optimizer * Free Basic Code: 2D Crowd Dynamics with 3000 Agents
Implementing a Matrix-free Analytical Jacobian to Handle Nonlinearities in Models of 3D Lithospheric Deformation

NASA Astrophysics Data System (ADS)

Kaus, B.; Popov, A.

2015-12-01

The analytical expression for the Jacobian is a key component to achieve fast and robust convergence of the nonlinear Newton-Raphson iterative solver. Accomplishing this task in practice often requires a significant algebraic effort. Therefore it is quite common to use a cheap alternative instead, for example by approximating the Jacobian with a finite difference estimation. Despite its simplicity it is a relatively fragile and unreliable technique that is sensitive to the scaling of the residual and unknowns, as well as to the perturbation parameter selection. Unfortunately no universal rule can be applied to provide both a robust scaling and a perturbation. The approach we use here is to derive the analytical Jacobian for the coupled set of momentum, mass, and energy conservation equations together with the elasto-visco-plastic rheology and a marker in cell/staggered finite difference method. The software project LaMEM (Lithosphere and Mantle Evolution Model) is primarily developed for the thermo-mechanically coupled modeling of 3D lithospheric deformation. The code is based on a staggered grid finite difference discretization in space, and uses customized scalable solvers from the PETSc library to run efficiently on massively parallel machines (such as IBM Blue Gene/Q). Currently LaMEM relies on the Jacobian-Free Newton-Krylov (JFNK) nonlinear solver, which approximates the Jacobian-vector product using a simple finite difference formula. This approach never requires an assembled Jacobian matrix and uses only the residual computation routine. We use an approximate Jacobian (Picard) matrix to precondition the Krylov solver with the Galerkin geometric multigrid. Because of the inherent problems of the finite difference Jacobian estimation, this approach doesn't always result in stable convergence.
In this work we present and discuss a matrix-free technique in which the Jacobian-vector product is replaced by analytically-derived expressions and compare results with those obtained with a finite difference approximation of the Jacobian. This project is funded by ERC Starting Grant 258830 and computer facilities were provided by Jülich supercomputer center (Germany).
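To illustrate the two ways of forming a Jacobian-vector product compared in this abstract, here is a minimal sketch on a toy two-unknown residual. The residual function, the perturbation size `eps`, and all names are hypothetical and are not taken from LaMEM or PETSc.

```python
import math

def residual(u):
    """Toy nonlinear residual F(u) with two unknowns (hypothetical example)."""
    x, y = u
    return [x * x + y - 3.0, x + math.sin(y)]

def jv_analytic(u, v):
    """Jacobian-vector product J(u) v from hand-derived derivatives of F."""
    x, y = u
    return [2.0 * x * v[0] + v[1], v[0] + math.cos(y) * v[1]]

def jv_finite_difference(F, u, v, eps=1e-7):
    """Matrix-free approximation J(u) v ~ (F(u + h v) - F(u)) / h.

    The step h is scaled by the norm of v; the result is sensitive to this
    choice, which is the fragility the abstract refers to.
    """
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    h = eps / norm_v
    f0 = F(u)
    f1 = F([ui + h * vi for ui, vi in zip(u, v)])
    return [(a - b) / h for a, b in zip(f1, f0)]

u = [1.0, 0.5]
v = [0.3, -0.2]
exact = jv_analytic(u, v)
approx = jv_finite_difference(residual, u, v)
```

Neither routine assembles the Jacobian matrix; both return only its action on `v`, which is all a Krylov solver needs.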
Sparse image reconstruction for molecular imaging.

PubMed

Ting, Michael; Raich, Raviv; Hero, Alfred O

2009-06-01

The application that motivates this paper is molecular imaging at the atomic level. When discretized at subatomic distances, the volume is inherently sparse. Noiseless measurements from an imaging technology can be modeled by convolution of the image with the system point spread function (psf). Such is the case with magnetic resonance force microscopy (MRFM), an emerging technology where imaging of an individual tobacco mosaic virus was recently demonstrated with nanometer resolution. We also consider additive white Gaussian noise (AWGN) in the measurements. Many prior works of sparse estimators have focused on the case when H has low coherence; however, the system matrix H in our application is the convolution matrix for the system psf. A typical convolution matrix has high coherence. This paper, therefore, does not assume a low coherence H. A discrete-continuous form of the Laplacian and atom at zero (LAZE) p.d.f. used by Johnstone and Silverman is formulated, and two sparse estimators derived by maximizing the joint p.d.f. of the observation and image conditioned on the hyperparameters. A thresholding rule that generalizes the hard and soft thresholding rule appears in the course of the derivation. This so-called hybrid thresholding rule, when used in the iterative thresholding framework, gives rise to the hybrid estimator, a generalization of the lasso. Estimates of the hyperparameters for the lasso and hybrid estimator are obtained via Stein's unbiased risk estimate (SURE). A numerical study with a Gaussian psf and two sparse images shows that the hybrid estimator outperforms the lasso.
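For reference, the classical soft and hard thresholding rules that the hybrid rule is said to generalize can be written as below. This is a sketch of the two standard rules only; the hybrid rule itself is not reproduced here, and the function names are illustrative.

```python
def soft_threshold(x, t):
    """Soft thresholding: shrink x toward zero by t, zeroing values with |x| <= t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def hard_threshold(x, t):
    """Hard thresholding: keep x unchanged when |x| > t, otherwise set it to zero."""
    return x if abs(x) > t else 0.0

samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
soft = [soft_threshold(x, 1.0) for x in samples]
hard = [hard_threshold(x, 1.0) for x in samples]
```

Soft thresholding is the update used in iterative thresholding schemes for the lasso; hard thresholding keeps surviving entries unshrunk.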
Total variation-based method for radar coincidence imaging with model mismatch for extended target

NASA Astrophysics Data System (ADS)

Cao, Kaicheng; Zhou, Xiaoli; Cheng, Yongqiang; Fan, Bo; Qin, Yuliang

2017-11-01

Originating from traditional optical coincidence imaging, radar coincidence imaging (RCI) is a staring/forward-looking imaging technique. In RCI, the reference matrix must be computed precisely to reconstruct the image as preferred; unfortunately, such precision is almost impossible due to the existence of model mismatch in practical applications. Although some conventional sparse recovery algorithms are proposed to solve the model-mismatch problem, they are inapplicable to nonsparse targets. We therefore sought to derive the signal model of RCI with model mismatch by replacing the sparsity constraint term with total variation (TV) regularization in the sparse total least squares optimization problem; in this manner, we obtain the objective function of RCI with model mismatch for an extended target. A more robust and efficient algorithm called TV-TLS is proposed, in which the objective function is divided into two parts and the perturbation matrix and scattering coefficients are updated alternately. Moreover, due to the ability of TV regularization to recover sparse signals or images with sparse gradients, the TV-TLS method is also applicable to sparse recovery. Results of numerical experiments demonstrate that, for uniform extended targets, sparse targets, and real extended targets, the algorithm can achieve preferred imaging performance both in suppressing noise and in adapting to model mismatch.
Equivalent charge source model based iterative maximum neighbor weight for sparse EEG source localization.

PubMed

Xu, Peng; Tian, Yin; Lei, Xu; Hu, Xiao; Yao, Dezhong

2008-12-01

Localizing neural electric activity within the brain effectively and precisely from scalp electroencephalogram (EEG) recordings is a critical issue in clinical neurology and cognitive neuroscience. In this paper, based on the charge source model and the iterative re-weighted strategy, we propose a new maximum neighbor weight based iterative sparse source imaging method, termed CMOSS (Charge source model based Maximum neighbOr weight Sparse Solution). Unlike the weight used in the focal underdetermined system solver (FOCUSS), where the weight for each point in the discrete solution space is updated independently in each iteration, the newly designed weight for each point in each iteration is determined by the source solution of the previous iteration at both the point and its neighbors. With this new weight, the next iteration has a better chance of rectifying the local source location bias present in the previous iteration's solution. Simulation studies comparing CMOSS with FOCUSS and LORETA for various source configurations were conducted on a realistic 3-shell head model, and the results confirmed the validity of CMOSS for sparse EEG source localization. Finally, CMOSS was applied to localize sources elicited in a visual stimuli experiment, and the result was consistent with the source areas involved in visual processing reported in previous studies.
Randomized subspace-based robust principal component analysis for hyperspectral anomaly detection

NASA Astrophysics Data System (ADS)

Sun, Weiwei; Yang, Gang; Li, Jialin; Zhang, Dianfa

2018-01-01

A randomized subspace-based robust principal component analysis (RSRPCA) method for anomaly detection in hyperspectral imagery (HSI) is proposed. The RSRPCA combines advantages of randomized column subspace and robust principal component analysis (RPCA). It assumes that the background has low-rank properties, and the anomalies are sparse and do not lie in the column subspace of the background. First, RSRPCA implements random sampling to sketch the original HSI dataset from columns and to construct a randomized column subspace of the background. Structured random projections are also adopted to sketch the HSI dataset from rows. Sketching from columns and rows could greatly reduce the computational requirements of RSRPCA. Second, the RSRPCA adopts the columnwise RPCA (CWRPCA) to eliminate negative effects of sampled anomaly pixels and that purifies the previous randomized column subspace by removing sampled anomaly columns. The CWRPCA decomposes the submatrix of the HSI data into a low-rank matrix (i.e., background component), a noisy matrix (i.e., noise component), and a sparse anomaly matrix (i.e., anomaly component) with only a small proportion of nonzero columns. The algorithm of inexact augmented Lagrange multiplier is utilized to optimize the CWRPCA problem and estimate the sparse matrix. Nonzero columns of the sparse anomaly matrix point to sampled anomaly columns in the submatrix. Third, all the pixels are projected onto the complemental subspace of the purified randomized column subspace of the background and the anomaly pixels in the original HSI data are finally exactly located. Several experiments on three real hyperspectral images are carefully designed to investigate the detection performance of RSRPCA, and the results are compared with four state-of-the-art methods. Experimental results show that the proposed RSRPCA outperforms four comparison methods both in detection performance and in computational time.

Highly parallel sparse Cholesky factorization

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Schreiber, Robert

1990-01-01

Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Preconditioned augmented Lagrangian formulation for nearly incompressible cardiac mechanics.

PubMed

Campos, Joventino Oliveira; Dos Santos, Rodrigo Weber; Sundnes, Joakim; Rocha, Bernardo Martins

2018-04-01

Computational modeling of the heart is a subject of substantial medical and scientific interest, which may contribute to increasing the understanding of several phenomena associated with cardiac physiological and pathological states. Modeling the mechanics of the heart has led to considerable insights, but it still represents a complex and demanding computational problem, especially in a strongly coupled electromechanical setting. Passive cardiac tissue is commonly modeled as hyperelastic and is characterized by quasi-incompressible, orthotropic, and nonlinear material behavior. These factors are known to be very challenging for the numerical solution of the model. The near-incompressibility is known to cause numerical issues such as the well-known locking phenomenon and ill-conditioning of the stiffness matrix. In this work, the augmented Lagrangian method is used to handle the nearly incompressible condition. This approach can potentially improve computational performance by reducing the condition number of the stiffness matrix and thereby improving the convergence of iterative solvers. We also improve the performance of iterative solvers by the use of an algebraic multigrid preconditioner. Numerical results of the augmented Lagrangian method combined with a preconditioned iterative solver for a cardiac mechanics benchmark suite are presented to show its improved performance. Copyright © 2017 John Wiley & Sons, Ltd.
An implicit numerical scheme for the simulation of internal viscous flows on unstructured grids

NASA Technical Reports Server (NTRS)

Jorgenson, Philip C. E.; Pletcher, Richard H.

1994-01-01

The Navier-Stokes equations are solved numerically for two-dimensional steady viscous laminar flows. The grids are generated based on the method of Delaunay triangulation. A finite-volume approach is used to discretize the conservation law form of the compressible flow equations written in terms of primitive variables. A preconditioning matrix is added to the equations so that low Mach number flows can be solved economically. The equations are time marched using either an implicit Gauss-Seidel iterative procedure or a solver based on a conjugate gradient like method. A four color scheme is employed to vectorize the block Gauss-Seidel relaxation procedure. This increases the memory requirements minimally and decreases the computer time spent solving the resulting system of equations substantially. A factor of 7.6 speed up in the matrix solver is typical for the viscous equations. Numerical results are obtained for inviscid flow over a bump in a channel at subsonic and transonic conditions for validation with structured solvers. Viscous results are computed for developing flow in a channel, a symmetric sudden expansion, periodic tandem cylinders in a cross-flow, and a four-port valve. Comparisons are made with available results obtained by other investigators.
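The idea behind coloring a Gauss-Seidel sweep is that unknowns of the same color do not depend on one another, so each color can be updated as one vector operation. The sketch below uses the simpler two-color (red-black) ordering on a 1-D Laplace problem as a stand-in; it is not the four-color block scheme on unstructured grids described in the abstract, and all names are hypothetical.

```python
def red_black_gauss_seidel(u, sweeps):
    """Two-color Gauss-Seidel for u'' = 0 on a uniform 1-D grid.

    u[0] and u[-1] hold fixed boundary values; interior values are updated
    in place. Points of one color only read points of the other color, so
    each inner loop could run as a single vector operation.
    """
    n = len(u)
    for _ in range(sweeps):
        for start in (1, 2):  # odd-indexed points first, then even-indexed
            for i in range(start, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

grid = [0.0] * 11
grid[-1] = 1.0  # boundary conditions u(0) = 0 and u(1) = 1
solution = red_black_gauss_seidel(grid, sweeps=500)
```

The converged values lie on the straight line between the two boundary values, which is the exact solution of this toy problem.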
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE PAGES

Li, Ruipeng; Saad, Yousef

2017-08-01

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman-Morrison-Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
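As a small illustration of how a low-rank correction can be applied once the decoupled part is easy to invert, the sketch below uses the rank-1 special case of the Sherman-Morrison-Woodbury identity (the Sherman-Morrison formula), with a diagonal matrix standing in for the decoupled block. It is not the preconditioner of the paper; the names and the test data are assumptions.

```python
def solve_diag_plus_rank1(d, u, v, b):
    """Solve (D + u v^T) x = b with D = diag(d), via Sherman-Morrison.

    x = D^{-1} b - D^{-1} u (v^T D^{-1} b) / (1 + v^T D^{-1} u)
    """
    dinv_b = [bi / di for bi, di in zip(b, d)]
    dinv_u = [ui / di for ui, di in zip(u, d)]
    denom = 1.0 + sum(vi * wi for vi, wi in zip(v, dinv_u))
    if denom == 0.0:
        raise ZeroDivisionError("D + u v^T is singular")
    scale = sum(vi * yi for vi, yi in zip(v, dinv_b)) / denom
    return [yi - scale * wi for yi, wi in zip(dinv_b, dinv_u)]

# Hypothetical test data.
d = [4.0, 5.0, 6.0]
u = [1.0, 2.0, 0.5]
v = [0.5, -1.0, 2.0]
b = [1.0, 2.0, 3.0]
x = solve_diag_plus_rank1(d, u, v, b)

# Check the result by applying (D + u v^T) to x.
vx = sum(vi * xi for vi, xi in zip(v, x))
check = [di * xi + ui * vx for di, xi, ui in zip(d, x, u)]
```

Only solves with the easy matrix `D` and a scalar division are needed, which is why the identity is attractive when the correction has low rank.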
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Ruipeng; Saad, Yousef

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman-Morrison-Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.

PubMed

Fan, Jianqing; Liao, Yuan; Mincheva, Martina

2011-01-01

The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.
Sparse electrocardiogram signals recovery based on solving a row echelon-like form of system.

PubMed

Cai, Pingmei; Wang, Guinan; Yu, Shiwei; Zhang, Hongjuan; Ding, Shuxue; Wu, Zikai

2016-02-01

The study of biology and medicine in a noisy environment is an evolving direction in biological data analysis. Among these studies, analysis of electrocardiogram (ECG) signals in a noisy environment is a challenging direction in personalized medicine. Due to its periodic characteristic, the ECG signal can be roughly regarded as a sparse biomedical signal. This study proposes a two-stage recovery algorithm for sparse biomedical signals in the time domain. In the first stage, the concentration subspaces are found in advance. Then by exploiting these subspaces, the mixing matrix is estimated accurately. In the second stage, based on the number of active sources at each time point, the time points are divided into different layers. Next, by constructing some transformation matrices, these time points form a row echelon-like system. After that, the sources at each layer can be solved explicitly by corresponding matrix operations. It is worth noting that all these operations are conducted under a weak sparse condition that the number of active sources is less than the number of observations. Experimental results show that the proposed method performs better on the sparse ECG signal recovery problem.
Selection of polynomial chaos bases via Bayesian model uncertainty methods with applications to sparse approximation of PDEs with stochastic inputs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karagiannis, Georgios, E-mail: georgios.karagiannis@pnnl.gov; Lin, Guang, E-mail: guang.lin@pnnl.gov

2014-02-15

Generalized polynomial chaos (gPC) expansions allow us to represent the solution of a stochastic system using a series of polynomial chaos basis functions. The number of gPC terms increases dramatically as the dimension of the random input variables increases. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs when the corresponding deterministic solver is computationally expensive, evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solutions, in both spatial and random domains, by coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points, via (1) the Bayesian model average (BMA) or (2) the median probability model, and their construction as spatial functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions; while the latter provides a sparse representation of the stochastic solutions by evaluating the expansion on a subset of dominating gPC bases. Moreover, the proposed methods quantify the importance of the gPC bases in the probabilistic sense through inclusion probabilities. We design a Markov chain Monte Carlo (MCMC) sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed methods are suitable for, but not restricted to, problems whose stochastic solutions are sparse in the stochastic space with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the accuracy and performance of the proposed methods and make comparisons with other approaches on solving elliptic SPDEs with 1-, 14- and 40-random dimensions.
Accelerating wave propagation modeling in the frequency domain using Python

NASA Astrophysics Data System (ADS)

Jo, Sang Hoon; Park, Min Jun; Ha, Wan Soo

2017-04-01

Python is a dynamic programming language adopted in many science and engineering areas. We used Python to simulate wave propagation in the frequency domain. We used the Pardiso matrix solver to solve the impedance matrix of the wave equation. Numerical examples show that Python with NumPy takes longer than Fortran to construct the impedance matrix using the finite element method; however, we could reduce the time significantly, to a level comparable to that of Fortran, using a simple Numba decorator.
Application of a fast Newton-Krylov solver for equilibrium simulations of phosphorus and oxygen

NASA Astrophysics Data System (ADS)

Fu, Weiwei; Primeau, François

2017-11-01

Model drift due to inadequate spinup is a serious problem that complicates the interpretation of climate change simulations. Even after a 300 year spinup we show that solutions are not only still drifting but often drifting away from their eventual equilibrium over large parts of the ocean. Here we present a Newton-Krylov solver for computing cyclostationary equilibrium solutions of a biogeochemical model for the cycling of phosphorus and oxygen. In addition to using previously developed preconditioning strategies - time-averaging and coarse-graining the Jacobian matrix - we also introduce a new strategy: the adiabatic elimination of a fast variable (particulate organic phosphorus) by slaving it to a slow variable (dissolved inorganic phosphorus). We use transport matrices derived from the Community Earth System Model (CESM) with a nominal horizontal resolution of 1° × 1° and 60 vertical levels to implement and test the solver. We find that the new solver obtains seasonally-varying equilibrium solutions with no visible drift using no more than 80 simulation years.
A physiologically motivated sparse, compact, and smooth (SCS) approach to EEG source localization.

PubMed

Cao, Cheng; Akalin Acar, Zeynep; Kreutz-Delgado, Kenneth; Makeig, Scott

2012-01-01

Here, we introduce a novel approach to the EEG inverse problem based on the assumption that principal cortical sources of multi-channel EEG recordings may be assumed to be spatially sparse, compact, and smooth (SCS). To enforce these characteristics of solutions to the EEG inverse problem, we propose a correlation-variance model which factors a cortical source space covariance matrix into the multiplication of a pre-given correlation coefficient matrix and the square root of the diagonal variance matrix learned from the data under a Bayesian learning framework. We tested the SCS method using simulated EEG data with various SNR and applied it to a real ECOG data set. We compare the results of SCS to those of an established SBL algorithm.
Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

2005-01-01

The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems arising from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective interprocessor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving a series of structural and acoustic (symmetrical and unsymmetrical) problems on different computing platforms. Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
Optimization of the sources in local hyperthermia using a combined finite element-genetic algorithm method.

PubMed

Siauve, N; Nicolas, L; Vollaire, C; Marchal, C

2004-12-01

This article describes an optimization process specially designed for local and regional hyperthermia in order to achieve the desired specific absorption rate in the patient. It is based on a genetic algorithm coupled to a finite element formulation. The optimization method is applied to real human organ meshes assembled from computerized tomography scans. A 3D finite element formulation is used to calculate the electromagnetic field in the patient, generated by radiofrequency or microwave sources. Space discretization is performed using incomplete first order edge elements. The sparse complex symmetric matrix equation is solved using a conjugate gradient solver with potential projection preconditioning. The formulation is validated by comparison of calculated specific absorption rate distributions in a phantom to temperature measurements. A genetic algorithm is used to optimize the specific absorption rate distribution to predict the phases and amplitudes of the sources leading to the best focalization. The objective function is defined as the specific absorption rate ratio in the tumour and healthy tissues. Several constraints, regarding the specific absorption rate in tumour and the total power in the patient, may be prescribed. Results obtained with two types of applicators (waveguides and annular phased array) are presented and demonstrate the capabilities of the developed optimization process.
An incremental strategy for calculating consistent discrete CFD sensitivity derivatives

NASA Technical Reports Server (NTRS)

Korivi, Vamshi Mohan; Taylor, Arthur C., III; Newman, Perry A.; Hou, Gene W.; Jones, Henry E.

1992-01-01

In this preliminary study involving advanced computational fluid dynamic (CFD) codes, an incremental formulation, also known as the 'delta' or 'correction' form, is presented for solving the very large sparse systems of linear equations which are associated with aerodynamic sensitivity analysis. For typical problems in 2D, a direct solution method can be applied to these linear equations in either the standard or the incremental form, in which case the two are equivalent. Iterative methods appear to be needed for future 3D applications, however, because direct solver methods require much more computer memory than is currently available. Iterative methods for solving these equations in the standard form result in certain difficulties, such as ill-conditioning of the coefficient matrix, which can be overcome when these equations are cast in the incremental form; these and other benefits are discussed. The methodology is successfully implemented and tested in 2D using an upwind, cell-centered, finite volume formulation applied to the thin-layer Navier-Stokes equations. Results are presented for two laminar sample problems: (1) transonic flow through a double-throat nozzle; and (2) flow over an isolated airfoil.
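The incremental ('delta') form described in the abstract above can be illustrated with a small sketch: instead of solving A x = b in one step, one repeatedly solves M dx = b - A x with a cheaper approximate operator M and updates x. This is not the authors' CFD code. The 3 x 3 test matrix, the choice of M as the diagonal of A, and the function names are assumptions made only for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def solve_incremental(A, b, tol=1e-12, max_iter=500):
    """Solve A x = b in incremental form: M dx = b - A x, x <- x + dx.

    M is taken to be the diagonal of A (an assumption for this sketch);
    any approximate operator that is cheap to invert could be used.
    """
    n = len(b)
    x = [0.0] * n
    for it in range(max_iter):
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if max(abs(ri) for ri in residual) < tol:
            return x, it
        dx = [residual[i] / A[i][i] for i in range(n)]  # solve M dx = residual
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter


# Hypothetical diagonally dominant system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x, iters = solve_incremental(A, b)
```

Because the right-hand side of each step is the true residual, the approximate operator M affects only the convergence rate and not the converged answer, which is one reason the incremental form tolerates approximations on the left-hand side.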
RELATIVISTIC MAGNETOHYDRODYNAMICS: RENORMALIZED EIGENVECTORS AND FULL WAVE DECOMPOSITION RIEMANN SOLVER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anton, Luis; Martí, Jose M; Ibanez, Jose M

2010-05-01

We obtain renormalized sets of right and left eigenvectors of the flux vector Jacobians of the relativistic MHD equations, which are regular and span a complete basis in any physical state including degenerate ones. The renormalization procedure relies on the characterization of the degeneracy types in terms of the normal and tangential components of the magnetic field to the wave front in the fluid rest frame. Proper expressions of the renormalized eigenvectors in conserved variables are obtained through the corresponding matrix transformations. Our work completes previous analyses that present different sets of right eigenvectors for non-degenerate and degenerate states, and can be seen as a relativistic generalization of earlier work performed in classical MHD. Based on the full wave decomposition (FWD) provided by the renormalized set of eigenvectors in conserved variables, we have also developed a linearized (Roe-type) Riemann solver. Extensive testing against one- and two-dimensional standard numerical problems allows us to conclude that our solver is very robust. When compared with a family of simpler solvers that avoid the knowledge of the full characteristic structure of the equations in the computation of the numerical fluxes, our solver turns out to be less diffusive than HLL and HLLC, and comparable in accuracy to the HLLD solver. The number of operations needed by the FWD solver makes it less efficient computationally than those of the HLL family in one-dimensional problems. However, its relative efficiency increases in multidimensional simulations.
A performance study of sparse Cholesky factorization on INTEL iPSC/860

NASA Technical Reports Server (NTRS)

Zubair, M.; Ghose, M.

1992-01-01

The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Alaghband, Gita; Jordan, Harry F.

1989-01-01

It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
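The idea of selecting several diagonal pivots at once can be sketched as follows. The Markowitz number of a diagonal pivot is (r - 1)(c - 1), where r and c are the nonzero counts of its row and column, and two diagonal pivots i and j are treated here as compatible when both a_ij and a_ji are zero. The greedy selection below is a simplified illustration under those assumptions, not the heuristic of the paper (which uses an order-compatible set with a limited binary tree search); the function names and the tridiagonal test pattern are hypothetical.

```python
def markowitz_numbers(pattern, n):
    """Markowitz number (r_i - 1) * (c_i - 1) for each diagonal pivot."""
    row_count = [0] * n
    col_count = [0] * n
    for i, j in pattern:
        row_count[i] += 1
        col_count[j] += 1
    return [(row_count[i] - 1) * (col_count[i] - 1) for i in range(n)]


def greedy_compatible_pivots(pattern, n):
    """Pick diagonal pivots in order of increasing Markowitz number,
    keeping a pivot only if it has no coupling with those already chosen."""
    cost = markowitz_numbers(pattern, n)
    chosen = []
    for i in sorted(range(n), key=lambda k: (cost[k], k)):
        if (i, i) not in pattern:
            continue  # structurally zero diagonal: not a pivot candidate
        if all((i, j) not in pattern and (j, i) not in pattern for j in chosen):
            chosen.append(i)
    return chosen


# Hypothetical 5 x 5 tridiagonal nonzero pattern, stored as a set of (row, col).
n = 5
pattern = {(i, i) for i in range(n)}
pattern |= {(i, i + 1) for i in range(n - 1)}
pattern |= {(i + 1, i) for i in range(n - 1)}

pivots = greedy_compatible_pivots(pattern, n)  # [0, 4, 2]
```

Pivots 0, 4 and 2 share no off-diagonal coupling, so their elimination steps do not modify each other's rows or columns and could be carried out simultaneously.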
Non-convex Statistical Optimization for Sparse Tensor Graphical Model

PubMed Central

Sun, Wei; Wang, Zhaoran; Liu, Han; Cheng, Guang

2016-01-01

We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. The penalized maximum likelihood estimation of this model involves minimizing a non-convex objective function. In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence as well as consistent graph recovery. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. Our theoretical results are backed by thorough numerical studies. PMID:28316459
Securing image information using double random phase encoding and parallel compressive sensing with updated sampling processes

NASA Astrophysics Data System (ADS)

Hu, Guiqiang; Xiao, Di; Wang, Yong; Xiang, Tao; Zhou, Qing

2017-11-01

Recently, a new kind of image encryption approach using compressive sensing (CS) and double random phase encoding has received much attention due to advantages such as compressibility and robustness. However, this approach is found to be vulnerable to chosen plaintext attack (CPA) if the CS measurement matrix is re-used. Therefore, designing an efficient measurement matrix updating mechanism that ensures resistance to CPA is of practical significance. In this paper, we provide a novel solution to update the CS measurement matrix by altering the secret sparse basis with the help of counter mode operation. In particular, the secret sparse basis is implemented by a reality-preserving fractional cosine transform matrix. Compared with the conventional CS-based cryptosystem, which generates all the random entries of the measurement matrix anew, our scheme is more efficient while guaranteeing resistance to CPA. Experimental and analysis results show that the proposed scheme has good security performance and is robust against noise and occlusion.
HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS

PubMed Central

Fan, Jianqing; Liao, Yuan; Mincheva, Martina

2012-01-01

The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied. PMID:22661790

Low-memory iterative density fitting.

PubMed

Grajciar, Lukáš

2015-07-30

A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. The iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with only linear-scaling memory requirements. Compared with the standard density fitting implementation, up to a 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner at a cost of only a 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for a zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
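A preconditioned conjugate gradient iteration with a preconditioner built from diagonal blocks of the matrix, as described in the abstract above, can be sketched as follows. This is a generic textbook illustration and not the density fitting code: the 4 x 4 matrix, the fixed 2 x 2 block size, and all function names are assumptions, and the matrix-vector product is a plain dense loop rather than CFMM.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def block_diagonal_preconditioner(A):
    """Return a function applying the inverse of the 2 x 2 diagonal blocks of A."""
    n = len(A)
    assert n % 2 == 0, "this sketch assumes an even dimension"
    inverses = []
    for k in range(0, n, 2):
        a, b, c, d = A[k][k], A[k][k + 1], A[k + 1][k], A[k + 1][k + 1]
        det = a * d - b * c
        inverses.append((d / det, -b / det, -c / det, a / det))

    def apply(r):
        z = [0.0] * n
        for idx, (p, q, s, t) in enumerate(inverses):
            k = 2 * idx
            z[k] = p * r[k] + q * r[k + 1]
            z[k + 1] = s * r[k] + t * r[k + 1]
        return z

    return apply


def pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical symmetric, diagonally dominant test system with solution [1, 2, 3, 4].
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
b = [6.0, 10.0, 18.0, 15.0]
x, iters = pcg(A, b, block_diagonal_preconditioner(A))
```

Only products with A and applications of the block inverses are needed, so A never has to be factored as a whole; the abstract exploits this by evaluating the products with a low-memory method.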
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh

We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform $N-x$ contingency analysis, i.e., determine the state of the system when up to $x$ links from $N$ fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing $N-x$ contingency analysis on a $778K$ bus grid, updating a solution with $x=20$ elements in $1.6 \times 10^{-2}$ seconds on an Intel Xeon processor.
Discovering mutated driver genes through a robust and sparse co-regularized matrix factorization framework with prior information from mRNA expression patterns and interaction network.

PubMed

Xi, Jianing; Wang, Minghui; Li, Ao

2018-06-05

Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover driver genes that are mutated at relatively low frequency from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, these existing network-based methods do not exploit prior information from mRNA expression patterns, which has also been shown to be highly informative of cancer progression. To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework also applies Frobenius norm regularization to overcome overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, from which the top-scored genes are selected as driver candidates. Evaluation experiments with known benchmarking genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
Large-region acoustic source mapping using a movable array and sparse covariance fitting.

PubMed

Zhao, Shengkui; Tuna, Cagdas; Nguyen, Thi Ngoc Tho; Jones, Douglas L

2017-01-01

Large-region acoustic source mapping is important for city-scale noise monitoring. Approaches using a single-position measurement scheme to scan large regions using small arrays cannot provide clean acoustic source maps, while deploying large arrays spanning the entire region of interest is prohibitively expensive. A multiple-position measurement scheme is applied to scan large regions at multiple spatial positions using a movable array of small size. Based on the multiple-position measurement scheme, a sparse-constrained multiple-position vectorized covariance matrix fitting approach is presented. In the proposed approach, the overall sample covariance matrix of the incoherent virtual array is first estimated using the multiple-position array data and then vectorized using the Khatri-Rao (KR) product. A linear model is then constructed for fitting the vectorized covariance matrix and a sparse-constrained reconstruction algorithm is proposed for recovering source powers from the model. The user parameter settings are discussed. The proposed approach is tested on a 30 m × 40 m region and a 60 m × 40 m region using simulated and measured data. Much cleaner acoustic source maps and lower sound pressure level errors are obtained compared to the beamforming approaches and the previous sparse approach [Zhao, Tuna, Nguyen, and Jones, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (2016)].
Dynamic Textures Modeling via Joint Video Dictionary Learning.

PubMed

Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng

2017-04-06

Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which could be modeled in a dynamic textures (DT) framework. First, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on such transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. Moreover, such learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.
A Sparse Matrix Approach for Simultaneous Quantification of Nystagmus and Saccade

NASA Technical Reports Server (NTRS)

Kukreja, Sunil L.; Stone, Lee; Boyle, Richard D.

2012-01-01

The vestibulo-ocular reflex (VOR) consists of two intermingled non-linear subsystems; namely, nystagmus and saccade. Typically, nystagmus is analysed using a single sufficiently long signal or a concatenation of them. Saccade information is not analysed and discarded due to insufficient data length to provide consistent and minimum variance estimates. This paper presents a novel sparse matrix approach to system identification of the VOR. It allows for the simultaneous estimation of both nystagmus and saccade signals. We show via simulation of the VOR that our technique provides consistent and unbiased estimates in the presence of output additive noise.
An efficient implementation of a high-order filter for a cubed-sphere spectral element model

NASA Astrophysics Data System (ADS)

Kang, Hyun-Gyu; Cheong, Hyeong-Bin

2017-03-01

A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and only the central cell is then used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
On-Chip Neural Data Compression Based On Compressed Sensing With Sparse Sensing Matrices.

PubMed

Zhao, Wenfeng; Sun, Biao; Wu, Tong; Yang, Zhi

2018-02-01

On-chip neural data compression is an enabling technique for wireless neural interfaces that suffer from insufficient bandwidth and power budgets to transmit the raw data. The data compression algorithm and its implementation should be power and area efficient and functionally reliable over different datasets. Compressed sensing is an emerging technique that has been applied to compress various neurophysiological data. However, the state-of-the-art compressed sensing (CS) encoders leverage random but dense binary measurement matrices, which incur substantial implementation costs on both power and area that could offset the benefits from the reduced wireless data rate. In this paper, we propose two CS encoder designs based on sparse measurement matrices that could lead to efficient hardware implementation. Specifically, two different approaches for the construction of sparse measurement matrices, i.e., the deterministic quasi-cyclic array code (QCAC) matrix and -sparse random binary matrix [-SRBM] are exploited. We demonstrate that the proposed CS encoders lead to comparable recovery performance. And efficient VLSI architecture designs are proposed for QCAC-CS and -SRBM encoders with reduced area and total power consumption.
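The hardware appeal of a sparse binary measurement matrix can be seen in a short sketch: when every column holds only a few ones, computing the measurements y = Phi x takes a handful of additions per input sample and no multiplications. The construction below is a generic random matrix with a fixed number of ones per column, chosen for illustration; it is not the QCAC or SRBM construction of the paper, and the sizes, seed, and function names are assumptions.

```python
import random


def sparse_binary_matrix(m, n, ones_per_column, seed=0):
    """Represent an m x n binary matrix by the row indices of the ones in each column."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(m), ones_per_column)) for _ in range(n)]


def compress(columns, m, x):
    """Compute y = Phi x using additions only, streaming over the samples of x."""
    y = [0] * m
    for j, rows in enumerate(columns):
        for i in rows:
            y[i] += x[j]  # each input sample is accumulated into a few measurements
    return y


# Hypothetical sizes: 12 input samples compressed to 4 measurements.
m, n, d = 4, 12, 2
columns = sparse_binary_matrix(m, n, d)
x = list(range(1, n + 1))
y = compress(columns, m, x)
```

Each sample contributes to exactly d accumulators, so the work per sample is fixed at d additions regardless of the number of measurements m, whereas a dense binary matrix requires on the order of m operations per sample.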
Exact recovery of sparse multiple measurement vectors by [Formula: see text]-minimization.

PubMed

Wang, Changlong; Peng, Jigen

2018-01-01

The joint sparse recovery problem is a generalization of the single measurement vector problem widely studied in compressed sensing. It aims to recover a set of jointly sparse vectors, i.e., those that have nonzero entries concentrated at a common location. Meanwhile [Formula: see text]-minimization subject to matrices is widely used in a large number of algorithms designed for this problem, i.e., [Formula: see text]-minimization [Formula: see text] Therefore the main contribution in this paper is two theoretical results about this technique. The first one is proving that in every multiple system of linear equations there exists a constant [Formula: see text] such that the original unique sparse solution also can be recovered from a minimization in [Formula: see text] quasi-norm subject to matrices whenever [Formula: see text]. The other one is showing an analytic expression of such [Formula: see text]. Finally, we display the results of one example to confirm the validity of our conclusions, and we use some numerical experiments to show that we increase the efficiency of these algorithms designed for [Formula: see text]-minimization by using our results.
Application of PDSLin to the magnetic reconnection problem

NASA Astrophysics Data System (ADS)

Yuan, Xuefei; Li, Xiaoye S.; Yamazaki, Ichitaro; Jardin, Stephen C.; Koniges, Alice E.; Keyes, David E.

2013-01-01

Magnetic reconnection is a fundamental process in a magnetized plasma at both low and high magnetic Lundquist numbers (the ratio of the resistive diffusion time to the Alfvén wave transit time), which occurs in a wide variety of laboratory and space plasmas, e.g. magnetic fusion experiments, the solar corona and the Earth's magnetotail. An implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large so that the Whistler waves, which cause the fast reconnection, dominate the physics (Yuan et al 2012 J. Comput. Phys. 231 5822-53). For small system sizes, a direct solver such as SuperLU can be employed to obtain an accurate solution as long as the condition number is bounded by the reciprocal of the floating-point machine precision. However, SuperLU scales effectively only to hundreds of processors or less. For larger system sizes, it has been shown that physics-based (Chacón and Knoll 2003 J. Comput. Phys. 188 573-92) or other preconditioners can be applied to provide adequate solver performance. In recent years, we have been developing a new algebraic hybrid linear solver, PDSLin (Parallel Domain decomposition Schur complement-based Linear solver) (Yamazaki and Li 2010 Proc. VECPAR pp 421-34 and Yamazaki et al 2011 Technical Report). In this work, we compare numerical results from a direct solver and the proposed hybrid solver for the magnetic reconnection problem and demonstrate that the new hybrid solver is scalable to thousands of processors while maintaining the same robustness as a direct solver.
BinTree Seeking: A Novel Approach to Mine Both Bi-Sparse and Cohesive Modules in Protein Interaction Networks

PubMed Central

Shen, Hong-Bin

2011-01-01

Modern science of networks has brought significant advances to our understanding of complex systems biology. As a representative model of systems biology, Protein Interaction Networks (PINs) are characterized by remarkable modular structures, reflecting functional associations between their components. Many methods have been proposed to capture cohesive modules, in which the density of edges within modules is higher than that across them. Recent studies reveal that cohesively interacting modules of proteins are not a universal organizing principle in PINs, which has opened up new avenues for revisiting functional modules in PINs. In this paper, functional clusters in PINs are found to be able to form unorthodox structures defined as bi-sparse modules. In contrast to the traditional cohesive module, the nodes in a bi-sparse module are sparsely connected internally and densely connected with other bi-sparse or cohesive modules. We present a novel protocol called BinTree Seeking (BTS) for mining both bi-sparse and cohesive modules in PINs based on Edge Density of Module (EDM) and matrix theory. BTS detects modules by depicting links and nodes rather than nodes alone, and its derivation procedure is performed entirely on the adjacency matrix of the network. The number of modules in a PIN can be automatically determined in the proposed BTS approach. BTS is tested on three real PINs and the results demonstrate that functional modules in PINs are not dominantly cohesive but can be sparse. BTS software and the supporting information are available at: www.csbio.sjtu.edu.cn/bioinf/BTS/. PMID:22140454
A Spectral Algorithm for Envelope Reduction of Sparse Matrices

NASA Technical Reports Server (NTRS)

Barnard, Stephen T.; Pothen, Alex; Simon, Horst D.

1993-01-01

The problem of reordering a sparse symmetric matrix to reduce its envelope size is considered. A new spectral algorithm for computing an envelope-reducing reordering is obtained by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. This Laplacian eigenvector solves a continuous relaxation of a discrete problem related to envelope minimization called the minimum 2-sum problem. The permutation vector computed by the spectral algorithm is a closest permutation vector to the specified Laplacian eigenvector. Numerical results show that the new reordering algorithm usually computes smaller envelope sizes than those obtained from the current standard algorithms such as Gibbs-Poole-Stockmeyer (GPS) or SPARSPAK reverse Cuthill-McKee (RCM), in some cases reducing the envelope by more than a factor of two.
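The spectral reordering described above (associate a Laplacian with the matrix, compute a specified eigenvector, sort its components) can be sketched on a small example. The sketch assumes the eigenvector in question is the one belonging to the second-smallest Laplacian eigenvalue (the Fiedler vector) and obtains it by a shifted power iteration, which is a choice made here for self-containment; a production code would use a sparse eigensolver. The six-node graph, iteration count, and function names are hypothetical.

```python
def fiedler_vector(edges, n, iters=2000):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue
    by power iteration on (c*I - L), deflating the constant vector."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    c = 2.0 * max(deg)               # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [vi - mean for vi in v]  # remove the constant (null-space) component
        w = [(c - deg[i]) * v[i] for i in range(n)]
        for i, j in edges:           # add the adjacency contribution
            w[i] += v[j]
            w[j] += v[i]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v


def envelope_size(edges, n, position):
    """Sum over rows of (row index - column index of first nonzero in that row)."""
    first = list(range(n))
    for i, j in edges:
        lo, hi = sorted((position[i], position[j]))
        first[hi] = min(first[hi], lo)
    return sum(r - first[r] for r in range(n))


# Hypothetical matrix whose graph is the path 3-0-4-1-5-2 with scrambled labels.
n = 6
edges = [(3, 0), (0, 4), (4, 1), (1, 5), (5, 2)]

v = fiedler_vector(edges, n)
order = sorted(range(n), key=lambda i: v[i])   # sort vertices by eigenvector entry
position = [0] * n
for new_index, vertex in enumerate(order):
    position[vertex] = new_index

before = envelope_size(edges, n, list(range(n)))
after = envelope_size(edges, n, position)
```

For this path graph the sorted eigenvector recovers the path order (or its reverse), so the reordered matrix is tridiagonal and the envelope shrinks from 11 to 5 entries.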
High Performance Radiation Transport Simulations on TITAN

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Christopher G; Davidson, Gregory G; Evans, Thomas M

2012-01-01

In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for inverting the transport operator, a new multilevel energy decomposition method scaling to hundreds of thousands of processing cores, and a modern, novel code architecture that supports straightforward integration of new features. In this paper we discuss the performance of Denovo on the 10--20 petaflop ORNL GPU-based system, Titan. We describe algorithms and techniques used to exploit the capabilities of Titan's heterogeneous compute node architecture and the challenges of obtaining good parallel performance for this sparse hyperbolic PDE solver containing inherently sequential computations. Numerical results demonstrating Denovo performance on early Titan hardware are presented.
AztecOO user guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heroux, Michael Allen

2004-07-01

The Trilinos™ Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. AztecOO™ is a package within Trilinos that enables the use of the Aztec solver library [19] with Epetra™ [13] objects. AztecOO provides access to Aztec preconditioners and solvers by implementing the Aztec 'matrix-free' interface using Epetra. While Aztec is written in C and procedure-oriented, AztecOO is written in C++ and is object-oriented. In addition to providing access to Aztec capabilities, AztecOO also provides some significant new functionality. In particular it provides an extensible status testing capability that allows expression of sophisticated stopping criteria as is needed in production use of iterative solvers. AztecOO also provides mechanisms for using Ifpack [2], ML [20] and AztecOO itself as preconditioners.
On improving linear solver performance: a block variant of GMRES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, A H; Dennis, J M; Jessup, E R

2004-05-10

The increasing gap between processor performance and memory access time warrants the re-examination of data movement in iterative linear solver algorithms. For this reason, we explore and establish the feasibility of modifying a standard iterative linear solver algorithm in a manner that reduces the movement of data through memory. In particular, we present an alternative to the restarted GMRES algorithm for solving a single right-hand side linear system Ax = b based on solving the block linear system AX = B. Algorithm performance, i.e. time to solution, is improved by using the matrix A in operations on groups of vectors. Experimental results demonstrate the importance of implementation choices on data movement as well as the effectiveness of the new method on a variety of problems from different application areas.
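The data-movement argument above, that A should be used in operations on groups of vectors, can be illustrated with a sparse matrix product applied to several vectors in a single pass. The sketch shows only this kernel and not the block GMRES variant itself; the compressed sparse row (CSR) arrays, the small test matrix, and the function name are assumptions for illustration.

```python
def csr_matvec_block(indptr, indices, data, X):
    """Compute Y = A X for a CSR matrix A and a block X of s vectors.

    X is stored row by row (n rows, s entries each), so every stored entry
    of A is read once and reused for all s vectors.
    """
    n_rows = len(indptr) - 1
    s = len(X[0])
    Y = [[0.0] * s for _ in range(n_rows)]
    for i in range(n_rows):
        y_row = Y[i]
        for k in range(indptr[i], indptr[i + 1]):
            a = data[k]              # one load of the matrix entry ...
            x_row = X[indices[k]]
            for c in range(s):       # ... reused across the whole block
                y_row[c] += a * x_row[c]
    return Y


# Hypothetical matrix A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]

# Two vectors, (1, 0, 1) and (0, 1, 1), stored as the columns of X.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
Y = csr_matvec_block(indptr, indices, data, X)  # [[3.0, 1.0], [0.0, 3.0], [9.0, 5.0]]
```

Applying A to s vectors one at a time would stream the matrix through memory s times; the blocked loop streams it once, which is the kind of saving the abstract attributes to operating on groups of vectors.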
The application of nonlinear programming and collocation to optimal aeroassisted orbital transfers

NASA Astrophysics Data System (ADS)

Shi, Y. Y.; Nelson, R. L.; Young, D. H.; Gill, P. E.; Murray, W.; Saunders, M. A.

1992-01-01

Sequential quadratic programming (SQP) and collocation of the differential equations of motion were applied to optimal aeroassisted orbital transfers. The Optimal Trajectory by Implicit Simulation (OTIS) computer program codes with updated nonlinear programming code (NZSOL) were used as a testbed for the SQP nonlinear programming (NLP) algorithms. The state-of-the-art sparse SQP method is considered to be effective for solving large problems with a sparse matrix. Sparse optimizers are characterized in terms of memory requirements and computational efficiency. For the OTIS problems, less than 10 percent of the Jacobian matrix elements are nonzero. The SQP method encompasses two phases: finding an initial feasible point by minimizing the sum of infeasibilities and minimizing the quadratic objective function within the feasible region. The orbital transfer problem under consideration involves the transfer from a high energy orbit to a low energy orbit.
Removing flicker based on sparse color correspondences in old film restoration

NASA Astrophysics Data System (ADS)

Huang, Xi; Ding, Youdong; Yu, Bing; Xia, Tianran

2018-04-01

Archived film is an indispensable part of the long history of human civilization, and repairing damaged film with digital methods is now a mainstream trend. In this paper, we propose a technique based on sparse color correspondences to remove fading flicker from old films. Our approach, which combines multiple frames to establish a simple correction model, includes three key steps. First, we recover sparse color correspondences in the input frames to build a matrix with many missing entries. Second, we present a low-rank matrix factorization approach to estimate the unknown parameters of this model. Finally, we adopt a two-step strategy that divides the estimated parameters into reference-frame parameters for color recovery correction and other-frame parameters for color consistency correction to remove flicker. Because it combines multiple frames, our method takes the continuity of the input sequence into account, and the experimental results show that it removes fading flicker efficiently.
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

NASA Astrophysics Data System (ADS)

Gillis, Nicolas; Luce, Robert

2018-01-01

A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Sparse Covariance Matrix Estimation by DCA-Based Algorithms.

PubMed

Phan, Duy Nhat; Le Thi, Hoai An; Dinh, Tao Pham

2017-11-01

This letter proposes a novel approach using the [Formula: see text]-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of the SCME problem is composed of a nonconvex part and the [Formula: see text] term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of the [Formula: see text]-norm are used, which result in approximate SCME problems that are still nonconvex. DC programming and DCA (DC algorithm), powerful tools in the nonconvex programming framework, are investigated. Two DC formulations are proposed and the corresponding DCA schemes developed. Two applications of the SCME problem are considered: classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical experiment is performed on simulated and real data sets to study the performance of the proposed algorithms. Numerical results show their efficiency and their superiority compared with seven state-of-the-art methods.
Automatic segmentation of right ventricular ultrasound images using sparse matrix transform and a level set

NASA Astrophysics Data System (ADS)

Qin, Xulei; Cong, Zhibin; Fei, Baowei

2013-11-01

An automatic segmentation framework is proposed to segment the right ventricle (RV) in echocardiographic images. The method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform, a training model, and a localized region-based level set. First, the sparse matrix transform extracts main motion regions of the myocardium as eigen-images by analyzing the statistical information of the images. Second, an RV training model is registered to the eigen-images in order to locate the position of the RV. Third, the training model is adjusted and then serves as an optimized initialization for the segmentation of each image. Finally, based on the initializations, a localized, region-based level set algorithm is applied to segment both epicardial and endocardial boundaries in each echocardiograph. Three evaluation methods were used to validate the performance of the segmentation framework. The Dice coefficient measures the overall agreement between the manual and automatic segmentation. The absolute distance and the Hausdorff distance between the boundaries from manual and automatic segmentation were used to measure the accuracy of the segmentation. Ultrasound images of human subjects were used for validation. For the epicardial and endocardial boundaries, the Dice coefficients were 90.8 ± 1.7% and 87.3 ± 1.9%, the absolute distances were 2.0 ± 0.42 mm and 1.79 ± 0.45 mm, and the Hausdorff distances were 6.86 ± 1.71 mm and 7.02 ± 1.17 mm, respectively. The automatic segmentation method based on a sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.
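
The two overlap and boundary metrics named above can be sketched as follows. This is a minimal illustration on toy pixel coordinates, not the authors' evaluation code; the function names and the sample point sets are assumptions made for the example.

```python
import math

def dice_coefficient(mask_a, mask_b):
    """Dice = 2 |A intersect B| / (|A| + |B|) for two collections of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(boundary_a, boundary_b):
    """Symmetric Hausdorff distance between two non-empty lists of boundary points."""
    def directed(p_set, q_set):
        # Largest distance from a point of p_set to its nearest point in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(boundary_a, boundary_b), directed(boundary_b, boundary_a))

# Toy data: a "manual" and an "automatic" segmentation that share two pixels.
manual = [(0, 0), (0, 1), (1, 0), (1, 1)]
auto = [(0, 1), (1, 1), (0, 2), (1, 2)]
dice = dice_coefficient(manual, auto)
hd = hausdorff_distance(manual, auto)
```

The Dice coefficient rewards overall overlap, while the Hausdorff distance reports the single worst boundary mismatch, which is why the two are usually quoted together.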

Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient.

PubMed

Masuda, Y; Misztal, I; Legarra, A; Tsuruta, S; Lourenco, D A L; Fragomeni, B O; Aguilar, I

2017-01-01

This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals () by a vector (). The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix () including genotyped animals and their ancestors. The elements of were rapidly calculated with the Henderson's rule and stored as sparse matrices in memory. Implementation of was by a series of sparse matrix-vector multiplications. Diagonal elements of , which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation of was compared with explicit inversion of with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 sec, 3 min, and 5 min, respectively, for setting up. Only <1 sec was required for the multiplication in each PCG iteration for any data sets. When the equations in ssGBLUP are solved with the PCG algorithm, is no longer a limiting factor in the computations.
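
The preconditioned conjugate gradient (PCG) iteration needs only a routine that multiplies the coefficient matrix by a vector, which is what makes the implicit, product-only formulation above usable. The following is a minimal sketch under assumptions: the 2 x 2 operator apply_A and the exact diagonal are stand-ins for illustration, not the ssGBLUP mixed model equations or the Monte Carlo diagonal estimate described in the abstract.

```python
def pcg(apply_A, b, diag, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A, given only v -> A v.
    diag holds (an approximation of) the diagonal of A and acts as the preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [r[i] / diag[i] for i in range(n)]        # apply the diagonal preconditioner
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(v * v for v in r) ** 0.5 < tol:
            break
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Stand-in operator: A = [[4, 1], [1, 3]], applied without ever forming an inverse.
def apply_A(v):
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]

x = pcg(apply_A, [1.0, 2.0], diag=[4.0, 3.0])
```

In a setting like the one described, apply_A would be replaced by the series of sparse matrix-vector multiplications, and diag by the approximated diagonal elements.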
Compressive sensing using optimized sensing matrix for face verification

NASA Astrophysics Data System (ADS)

Oey, Endra; Jeffry; Wongso, Kelvin; Tommy

2017-12-01

Biometrics is one solution to the problems that arise when passwords are used to control data access, such as forgetting a password or struggling to recall many different passwords. With biometrics, the physical characteristics of a person can be captured and used in the identification process. In this research, facial biometrics is used in the verification process to determine whether the user has the authority to access the data. Facial biometrics was chosen for its low implementation cost and its reasonably accurate results for user identification. The face verification system adopted in this research uses the Compressive Sensing (CS) technique, which aims to reduce the dimension as well as encrypt the data of the facial test image, where the image is represented as sparse signals. The encrypted data can be reconstructed using a Sparse Coding algorithm. Two types of Sparse Coding, namely Orthogonal Matching Pursuit (OMP) and Iteratively Reweighted Least Squares-ℓp (IRLS-ℓp), are compared in the face verification system. The reconstructed sparse signals are then compared, using the Euclidean norm, with the sparse signal of the user previously saved in the system to determine the validity of the facial test image. With a non-optimized sensing matrix, the system accuracy obtained in this research is 99% for IRLS with a face verification response time of 4.917 seconds and 96.33% for OMP with a response time of 0.4046 seconds; with an optimized sensing matrix, it is 99% for IRLS with a response time of 13.4791 seconds and 98.33% for OMP with a response time of 3.1571 seconds.
Modeling of frequency-domain scalar wave equation with the average-derivative optimal scheme based on a multigrid-preconditioned iterative solver

NASA Astrophysics Data System (ADS)

Cao, Jian; Chen, Jing-Bo; Dai, Meng-Xue

2018-01-01

Efficient finite-difference frequency-domain modeling of seismic wave propagation relies on the discrete schemes and appropriate solving methods. The average-derivative optimal scheme for scalar wave modeling is advantageous in terms of storage savings for the system of linear equations and flexibility for arbitrary directional sampling intervals. However, using an LU-decomposition-based direct solver to solve the resulting system of linear equations is very costly in both memory and computation. To address this issue, we establish a multigrid-preconditioned BI-CGSTAB iterative solver suited to the average-derivative optimal scheme. The choice of preconditioning matrix and its corresponding multigrid components is made with the help of Fourier spectral analysis and local mode analysis, respectively, which is important for convergence. Furthermore, we find that for computations with unequal directional sampling intervals, the anisotropic smoothing in the multigrid preconditioner may affect the convergence rate of this iterative solver. Successful numerical applications of this iterative solver to homogeneous and heterogeneous models in 2D and 3D are presented, where a significant reduction in computer memory and an improvement in computational efficiency are demonstrated by comparison with the direct solver. In the numerical experiments, we also show that unequal directional sampling intervals weaken the advantage of this multigrid-preconditioned iterative solver in computing speed or, even worse, can reduce its accuracy in some cases, which implies the need for reasonable control of the directional sampling intervals in the discretization.
Signal processing using sparse derivatives with applications to chromatograms and ECG

NASA Astrophysics Data System (ADS)

Ning, Xiaoran

In this thesis, we investigate the sparsity that exists in the derivative domain. In particular, we focus on signals that possess sparse derivatives up to order M (M > 0). Effort is put into formulating proper penalty functions and optimization problems that capture properties related to sparse derivatives, and into searching for fast, computationally efficient solvers. The effectiveness of these algorithms is also demonstrated in two real-world applications. In the first application, we provide an algorithm that jointly addresses the problems of chromatogram baseline correction and noise reduction. The series of chromatogram peaks is modeled as sparse with sparse derivatives, and the baseline is modeled as a low-pass signal. A convex optimization problem is formulated so as to encapsulate these non-parametric models. To account for the positivity of chromatogram peaks, an asymmetric penalty function is utilized together with symmetric penalty functions. A robust, computationally efficient, iterative algorithm is developed that is guaranteed to converge to the unique optimal solution. The approach, termed Baseline Estimation And Denoising with Sparsity (BEADS), is evaluated and compared with two state-of-the-art methods using both simulated and real chromatogram data. Promising results are obtained. In the second application, a novel electrocardiography (ECG) enhancement algorithm is designed, also based on sparse derivatives. In a real medical environment, ECG signals are often contaminated by various kinds of noise or artifacts, for example, morphological changes due to motion artifact, non-stationary noise due to muscular contraction (EMG), etc. Some of these contaminations severely affect the usefulness of ECG signals, especially when computer-aided algorithms are utilized. By solving the proposed convex l1 optimization problem, artifacts are reduced by modeling the clean ECG signal as a sum of two signals whose second-order and third-order derivatives (differences) are sparse, respectively. Finally, the algorithm is applied to a QRS detection system and validated using the MIT-BIH Arrhythmia database (109452 annotations), resulting in a sensitivity of Se = 99.87% and a positive prediction of +P = 99.88%.
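
What it means for a signal to have a sparse derivative can be seen on a toy example. The sketch below is only an illustration with an assumed piecewise-linear signal and a hypothetical helper named diff; it is not the BEADS algorithm or the ECG enhancement method.

```python
def diff(signal, order=1):
    """Repeated first-order differences: the discrete analogue of the derivative."""
    out = list(signal)
    for _ in range(order):
        out = [out[i + 1] - out[i] for i in range(len(out) - 1)]
    return out

# Assumed test signal: slope +1 for three steps, then slope -2.
x = [0, 1, 2, 3, 1, -1, -3]
d1 = diff(x, 1)                              # piecewise constant, not sparse
d2 = diff(x, 2)                              # nonzero only where the slope changes
nonzeros = sum(1 for v in d2 if v != 0)
l1_norm = sum(abs(v) for v in d2)            # the kind of quantity an l1 penalty measures
```

The signal itself is dense and so is its first difference, but the second difference has a single nonzero entry, so an l1 penalty on the second difference favors exactly this kind of piecewise-linear shape.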
3D airborne EM modeling based on the spectral-element time-domain (SETD) method

NASA Astrophysics Data System (ADS)

Cao, X.; Yin, C.; Huang, X.; Liu, Y.; Zhang, B., Sr.; Cai, J.; Liu, L.

2017-12-01

In the field of 3D airborne electromagnetic (AEM) modeling, both the finite-difference time-domain (FDTD) method and the finite-element time-domain (FETD) method have limitations: the FDTD method depends too much on the grids and time steps, while the FETD method requires a large number of grids for complex structures. We propose a time-domain spectral-element (SETD) method based on GLL interpolation basis functions for spatial discretization and the Backward Euler (BE) technique for time discretization. The spectral-element method is based on a weighted residual technique with polynomials as vector basis functions. It can produce accurate results by increasing the order of the polynomials and suppressing spurious solutions. The BE method is a stable time discretization technique that has no limitation on time steps and can guarantee a higher accuracy during the iteration process. To minimize the number of nonzeros in the sparse matrix and obtain a diagonal mass matrix, we apply the reduced order integral technique. A direct solver whose speed is independent of the condition number is adopted to quickly solve the large-scale sparse system of linear equations. To check the accuracy of our SETD algorithm, we compare our results with semi-analytical solutions for a three-layered earth model within the time range 10^-6 to 10^-2 s for different physical meshes and SE orders. The results show that the relative errors for magnetic field B and magnetic induction are both around 3-5%. Further, we calculate AEM responses for an AEM system over a 3D earth model in Figure 1. 
From numerical experiments for both 1D and 3D models, we draw the following conclusions: 1) SETD can deliver accurate results for both dB/dt and B; 2) increasing the SE order improves the modeling accuracy for early to middle time channels, when the EM field diffuses fast, so that the high-order SE can model the detailed variation; 3) at very late time channels, increasing the SE order brings little improvement in modeling accuracy, but the time interval plays an important role. This research is supported by Key Program of National Natural Science Foundation of China (41530320), China Natural Science Foundation for Young Scientists (41404093), and Key National Research Project of China (2016YFC0303100, 2017YFC0601900). Figure 1: (a) AEM system over a 3D earth model; (b) magnetic field Bz; (c) magnetic induction dBz/dt.
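
The stability property attributed to Backward Euler can be illustrated on a scalar decay problem y' = -lam * y. This test problem, the step size, and the function names are assumptions chosen for the sketch; it is not the SETD discretization of the electromagnetic equations, and it demonstrates stability only, not accuracy.

```python
def forward_euler(y0, lam, dt, steps):
    """Explicit update: y_new = y + dt * (-lam * y)."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y

def backward_euler(y0, lam, dt, steps):
    """Implicit update: solve y_new = y + dt * (-lam * y_new) for y_new."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y

# Stiff decay with a time step far above the explicit stability limit 2 / lam.
fe = forward_euler(1.0, 1000.0, 0.01, 20)    # amplification factor -9 per step
be = backward_euler(1.0, 1000.0, 0.01, 20)   # amplification factor 1/11 per step
```

The explicit scheme grows without bound for this step size, whereas the implicit scheme decays for any positive step, at the price of solving a linear system per step in the matrix case.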
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem results in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE PAGES

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...

2017-06-29

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem results in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
Direction of Arrival Estimation for MIMO Radar via Unitary Nuclear Norm Minimization

PubMed Central

Wang, Xianpeng; Huang, Mengxing; Wu, Xiaoqin; Bi, Guoan

2017-01-01

In this paper, we consider the direction of arrival (DOA) estimation issue of noncircular (NC) source in multiple-input multiple-output (MIMO) radar and propose a novel unitary nuclear norm minimization (UNNM) algorithm. In the proposed method, the noncircular properties of signals are used to double the virtual array aperture, and the real-valued data are obtained by utilizing unitary transformation. Then a real-valued block sparse model is established based on a novel over-complete dictionary, and a UNNM algorithm is formulated for recovering the block-sparse matrix. In addition, the real-valued NC-MUSIC spectrum is used to design a weight matrix for reweighting the nuclear norm minimization to achieve the enhanced sparsity of solutions. Finally, the DOA is estimated by searching the non-zero blocks of the recovered matrix. Because of using the noncircular properties of signals to extend the virtual array aperture and an additional real structure to suppress the noise, the proposed method provides better performance compared with the conventional sparse recovery based algorithms. Furthermore, the proposed method can handle the case of underdetermined DOA estimation. Simulation results show the effectiveness and advantages of the proposed method. PMID:28441770
Convex Banding of the Covariance Matrix

PubMed Central

Bien, Jacob; Bunea, Florentina; Xiao, Luo

2016-01-01

We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. PMID:28042189
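
The phrase "tapers the sample covariance matrix by a Toeplitz, sparsely-banded matrix" refers to an elementwise product. A minimal sketch of that operation follows, assuming fixed, hand-picked taper weights; the convex banding estimator chooses its weights adaptively from the data, which this example does not attempt to reproduce.

```python
def taper_covariance(S, weights):
    """Elementwise product of S with the Toeplitz matrix T[i][j] = weights[|i - j|].
    Lags beyond len(weights) - 1 receive weight zero, which bands the result."""
    p = len(S)
    return [
        [S[i][j] * (weights[abs(i - j)] if abs(i - j) < len(weights) else 0.0)
         for j in range(p)]
        for i in range(p)
    ]

# Assumed sample covariance for three ordered variables.
S = [[2.0, 0.8, 0.1],
     [0.8, 1.5, 0.6],
     [0.1, 0.6, 1.0]]
# Hypothetical weights: keep the diagonal, halve lag 1, drop lag 2.
banded = taper_covariance(S, weights=[1.0, 0.5])
```

Because the weights depend only on the lag |i - j|, the taper matrix is Toeplitz, and a known ordering of the variables is what makes the lag meaningful.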
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
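
The deterministic starting point named above, a preconditioned Richardson (stationary) iteration built from a convergent splitting, can be sketched as follows. The Jacobi splitting and the 2 x 2 test matrix are assumptions made for the example, and the Monte Carlo acceleration analyzed in the paper is not included.

```python
def jacobi_richardson(A, b, iterations=50):
    """Stationary iteration x_{k+1} = x_k + D^{-1} (b - A x_k) with D = diag(A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + residual[i] / A[i][i] for i in range(n)]
    return x

# Diagonally dominant test matrix, for which the Jacobi splitting converges.
A = [[4.0, 1.0], [1.0, 3.0]]
x = jacobi_richardson(A, [1.0, 2.0])
```

Convergence requires the spectral radius of I - D^{-1} A to be below one; for this matrix it is about 0.29, so fifty sweeps are far more than needed.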
Sparse Matrix Motivated Reconstruction of Far-Field Radiation Patterns

DTIC Science & Technology

2015-03-01

method for base-station antenna radiation patterns. IEEE Antennas Propagation Magazine. 2001;43(2):132. 4. Vasiliadis TG, Dimitriou D, Sergiadis JD...algorithm based on sparse representations of radiation patterns using the inverse Discrete Fourier Transform (DFT) and the inverse Discrete Cosine...patterns using a Model-Based Parameter Estimation (MBPE) technique that reduces the computational time required to model radiation patterns. Another
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.
An efficient sparse matrix multiplication scheme for the CYBER 205 computer

NASA Technical Reports Server (NTRS)

Lambiotte, Jules J., Jr.

1988-01-01

This paper describes the development of an efficient algorithm for computing the product of a matrix and vector on a CYBER 205 vector computer. The desire to provide software which allows the user to choose between the often conflicting goals of minimizing central processing unit (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of four types of storage is selected for each diagonal. The candidate storage types employed were chosen to be efficient on the CYBER 205 for diagonals which have nonzero structure which is dense, moderately sparse, very sparse and short, or very sparse and long; however, for many densities, no diagonal type is most efficient with respect to both resource requirements, and a trade-off must be made. For each diagonal, an initialization subroutine estimates the CPU time and storage required for each storage type based on results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the two resources. The adjusted resource requirements are then compared to select the most efficient storage and computational scheme.
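
A diagonal-based matrix-vector product can be sketched as below. The layout convention (one full-length array per stored diagonal) and the function name dia_matvec are assumptions for illustration; the paper selects among four storage types per diagonal, and only the simplest dense-diagonal case is shown here.

```python
def dia_matvec(offsets, diagonals, x):
    """y = A x with A stored by diagonals.
    diagonals[d][i] holds A[i][i + offsets[d]]; slots that fall outside A are ignored."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diagonals):
        lo = max(0, -off)          # first row for which column i + off is valid
        hi = min(n, n - off)       # one past the last such row
        for i in range(lo, hi):    # one long, regular loop per diagonal
            y[i] += diag[i] * x[i + off]
    return y

# Tridiagonal A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]].
offsets = [-1, 0, 1]
diagonals = [
    [0.0, -1.0, -1.0],   # subdiagonal: A[1][0], A[2][1] (slot 0 unused)
    [2.0, 2.0, 2.0],     # main diagonal
    [-1.0, -1.0, 0.0],   # superdiagonal: A[0][1], A[1][2] (slot 2 unused)
]
y = dia_matvec(offsets, diagonals, [1.0, 2.0, 3.0])
```

Each diagonal contributes one contiguous, stride-one loop, which is the access pattern a vector machine handles well; the storage cost is the padding in sparse diagonals, which is the trade-off the abstract describes.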
Discriminative Dictionary Learning With Two-Level Low Rank and Group Sparse Decomposition for Image Classification.

PubMed

Wen, Zaidao; Hou, Zaidao; Jiao, Licheng

2017-11-01

The discriminative dictionary learning (DDL) framework, which aims to learn class-specific feature vectors as well as a representative dictionary from a set of labeled training samples, has been widely used in image classification. However, interclass similarities and intraclass variances among input samples and learned features generally weaken the representability of the dictionary and the discrimination of the feature vectors, and thus degrade the classification performance. Therefore, how to represent them explicitly becomes an important issue. In this paper, we present a novel DDL framework with a two-level low rank and group sparse decomposition model. In the first level, we learn a class-shared dictionary and several class-specific dictionaries, where a low rank and a group sparse regularization are, respectively, imposed on the corresponding feature matrices. In the second level, the class-specific feature matrix is further decomposed into a low rank matrix and a sparse matrix so that intraclass variances can be separated to concentrate the corresponding feature vectors. Extensive experimental results demonstrate the effectiveness of our model. Compared with other state-of-the-art methods on several popular image databases, our model achieves competitive or better performance in terms of classification accuracy.
Nonlocal low-rank and sparse matrix decomposition for spectral CT reconstruction

NASA Astrophysics Data System (ADS)

Niu, Shanzhou; Yu, Gaohang; Ma, Jianhua; Wang, Jing

2018-02-01

Spectral computed tomography (CT) has been a promising technique in research and clinics because of its ability to produce improved energy resolution images with narrow energy bins. However, the narrow energy bin image is often affected by serious quantum noise because of the limited number of photons used in the corresponding energy bin. To address this problem, we present an iterative reconstruction method for spectral CT using nonlocal low-rank and sparse matrix decomposition (NLSMD), which exploits the self-similarity of patches that are collected in multi-energy images. Specifically, each set of patches can be decomposed into a low-rank component and a sparse component, and the low-rank component represents the stationary background over different energy bins, while the sparse component represents the rest of the different spectral features in individual energy bins. Subsequently, an effective alternating optimization algorithm was developed to minimize the associated objective function. To validate and evaluate the NLSMD method, qualitative and quantitative studies were conducted by using simulated and real spectral CT data. Experimental results show that the NLSMD method improves spectral CT images in terms of noise reduction, artifact suppression and resolution preservation.
3D frequency-domain finite-difference modeling of acoustic wave propagation

NASA Astrophysics Data System (ADS)

Operto, S.; Virieux, J.

2006-12-01

We present a 3D frequency-domain finite-difference method for acoustic wave propagation modeling. This method is developed as a tool to perform 3D frequency-domain full-waveform inversion of wide-angle seismic data. For wide-angle data, frequency-domain full-waveform inversion can be applied only to few discrete frequencies to develop reliable velocity model. Frequency-domain finite-difference (FD) modeling of wave propagation requires resolution of a huge sparse system of linear equations. If this system can be solved with a direct method, solutions for multiple sources can be computed efficiently once the underlying matrix has been factorized. The drawback of the direct method is the memory requirement resulting from the fill-in of the matrix during factorization. We assess in this study whether representative problems can be addressed in 3D geometry with such approach. We start from the velocity-stress formulation of the 3D acoustic wave equation. The spatial derivatives are discretized with second-order accurate staggered-grid stencil on different coordinate systems such that the axis span over as many directions as possible. Once the discrete equations were developed on each coordinate system, the particle velocity fields are eliminated from the first-order hyperbolic system (following the so-called parsimonious staggered-grid method) leading to second-order elliptic wave equations in pressure. The second-order wave equations discretized on each coordinate system are combined linearly to mitigate the numerical anisotropy. Secondly, grid dispersion is minimized by replacing the mass term at the collocation point by its weighted averaging over all the grid points of the stencil. Use of second-order accurate staggered- grid stencil allows to reduce the bandwidth of the matrix to be factorized. The final stencil incorporates 27 points. Absorbing conditions are PML. 
The system is solved using the parallel direct solver MUMPS developed for distributed-memory computers. The MUMPS solver is based on a multifrontal method for LU factorization. We used the METIS algorithm to re-order the matrix coefficients before factorization. Four grid points per minimum wavelength are used for discretization. We applied our algorithm to the 3D SEG/EAGE synthetic onshore OVERTHRUST model of dimensions 20 x 20 x 4.65 km. The velocities range between 2 and 6 km/s. We performed the simulations using 192 processors with 2 Gbytes of RAM per processor. We performed simulations for the 5 Hz, 7 Hz and 10 Hz frequencies in fractions of the OVERTHRUST model. The grid interval was 100 m, 75 m and 50 m respectively. The grid dimensions were 207x207x53, 275x218x71 and 409x109x102, corresponding to 100, 80 and 25 percent of the model respectively. The factorization time was 20 min, 108 min and 163 min respectively. The solution time was 3.8, 9.3 and 10.3 s per source. The total memory used during factorization was 143, 384 and 449 Gbytes respectively. One can note the huge memory requirement for factorization and the efficiency of the direct method in computing solutions for a large number of sources. This highlights the respective drawback and merit of the frequency-domain approach with respect to its time-domain counterpart. These results show that 3D acoustic frequency-domain wave propagation modeling can be performed at low frequencies using a direct solver on large clusters of PCs. This forward modeling algorithm may be used in the future as a tool to image the first kilometers of the crust by frequency-domain full-waveform inversion. For larger problems, we will use the out-of-core memory option during factorization that has been implemented by the authors of MUMPS.
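The factor-once, solve-per-source pattern described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses SciPy's serial SuperLU interface (`splu`) instead of MUMPS, and the small complex 2D operator, the value `k2`, and the random sources are all assumptions chosen only to make the example run.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Hypothetical stand-in for the frequency-domain wave operator: a 2D 5-point
# Laplacian plus a complex (damped) mass term. Not the 27-point 3D stencil.
n = 20                                    # grid points per side (illustrative)
lap1d = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
eye = sp.identity(n)
laplacian = sp.kron(lap1d, eye) + sp.kron(eye, lap1d)
k2 = 0.5 + 0.05j                          # assumed damped wavenumber term
A = (laplacian + k2 * sp.identity(n * n)).tocsc()

# Expensive step, performed once: sparse LU factorization (fill-in happens here).
lu = splu(A)

# Cheap step, repeated for every source: forward and backward substitution.
rng = np.random.default_rng(0)
sources = rng.standard_normal((n * n, 8)).astype(complex)   # 8 made-up sources
fields = lu.solve(sources)

# Relative residual over all sources, as a sanity check.
residual = np.linalg.norm(A @ fields - sources) / np.linalg.norm(sources)
print(f"relative residual: {residual:.2e}")
```

The design point is the same as in the abstract: the cost of `splu` is paid once, while each additional column of `sources` only costs a pair of triangular solves.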
Architecting the Finite Element Method Pipeline for the GPU.

PubMed

Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T

2014-02-01

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as many-core streaming processors like the graphics processing unit (GPU). In this paper, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. First, we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multigrid (AMG) preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of the GPU. Our on-GPU assembly achieves up to an 87× speedup over a traditional serial implementation on the CPU. Focusing on the linear system solver alone, we achieve a speedup of up to 51× versus a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based sparse linear solvers.
Comprehensive Thematic T-Matrix Reference Database: A 2014-2015 Update

NASA Technical Reports Server (NTRS)

Mishchenko, Michael I.; Zakharova, Nadezhda; Khlebtsov, Nikolai G.; Videen, Gorden; Wriedt, Thomas

2015-01-01

The T-matrix method is one of the most versatile and efficient direct computer solvers of the macroscopic Maxwell equations and is widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, and particles in the vicinity of an interface separating two half-spaces with different refractive indices. This paper is the seventh update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated by us in 2004 and includes relevant publications that have appeared since 2013. It also lists a number of earlier publications overlooked previously.
NAS Experiences of Porting CM Fortran Codes to HPF on IBM SP2 and SGI Power Challenge

NASA Technical Reports Server (NTRS)

Saini, Subhash

1995-01-01

Current Connection Machine (CM) Fortran codes developed for the CM-2 and the CM-5 represent an important class of parallel applications. Several users have employed CM Fortran codes in production mode on the CM-2 and the CM-5 for the last five to six years, constituting a heavy investment in terms of cost and time. With Thinking Machines Corporation's decision to withdraw from the hardware business and with the decommissioning of many CM-2 and CM-5 machines, the best way to protect the substantial investment in CM Fortran codes is to port the codes to High Performance Fortran (HPF) on highly parallel systems. HPF is very similar to CM Fortran and thus represents a natural transition. Conversion issues involved in porting CM Fortran codes on the CM-5 to HPF are presented. In particular, the differences between data distribution directives and the CM Fortran Utility Routines Library, as well as the equivalent functionality in the HPF Library, are discussed. Several CM Fortran codes (Cannon algorithm for matrix-matrix multiplication, linear solver Ax=b, 1-D convolution for 2-D datasets, Laplace's equation solver, and Direct Simulation Monte Carlo (DSMC) codes) have been ported to Subset HPF on the IBM SP2 and the SGI Power Challenge. Speedup ratios versus number of processors for the linear solver and DSMC code are presented.

MILAMIN 2 - Fast MATLAB FEM solver

NASA Astrophysics Data System (ADS)

Dabrowski, Marcin; Krotkiewski, Marcin; Schmid, Daniel W.

2013-04-01

MILAMIN is a free and efficient MATLAB-based two-dimensional FEM solver utilizing unstructured meshes [Dabrowski et al., G-cubed (2008)]. The code consists of steady-state thermal diffusion and incompressible Stokes flow solvers implemented in approximately 200 lines of native MATLAB code. The brevity makes the code easily customizable. An important quality of MILAMIN is speed - it can handle millions of nodes within minutes on one CPU core of a standard desktop computer, and is faster than many commercial solutions. The new MILAMIN 2 allows three-dimensional modeling. It is designed as a set of functional modules that can be used as building blocks for efficient FEM simulations using MATLAB. The utilities are largely implemented as native MATLAB functions. For performance-critical parts we use MUTILS - a suite of compiled MEX functions optimized for shared-memory multi-core computers. The most important features of MILAMIN 2 are:
1. Modular approach to defining, tracking, and discretizing the geometry of the model
2. Interfaces to external mesh generators (e.g., Triangle, Fade2d, T3D) and mesh utilities (e.g., element type conversion, fast point location, boundary extraction)
3. Efficient computation of the stiffness matrix for a wide range of element types, anisotropic materials and three-dimensional problems
4. Fast global matrix assembly using a dedicated MEX function
5. Automatic integration rules
6. Flexible prescription (spatial, temporal, and field functions) and efficient application of Dirichlet, Neumann, and periodic boundary conditions
7. Treatment of transient and non-linear problems
8. Various iterative and multi-level solution strategies
9. Post-processing tools (e.g., numerical integration)
10. Visualization primitives using MATLAB, and VTK export functions
We provide a large number of examples that show how to implement a custom FEM solver using the MILAMIN 2 framework.
The examples are MATLAB scripts of increasing complexity that address a given technical topic (e.g., creating meshes, reordering nodes, applying boundary conditions), a given numerical topic (e.g., using various solution strategies, non-linear iterations), or that present a fully developed solver designed to address a scientific topic (e.g., performing Stokes flow simulations in a synthetic porous medium). References: Dabrowski, M., M. Krotkiewski, and D. W. Schmid, MILAMIN: MATLAB-based finite element method solver for large problems, Geochem. Geophys. Geosyst., 9, Q04030, 2008.
Comparison of an algebraic multigrid algorithm to two iterative solvers used for modeling ground water flow and transport

USGS Publications Warehouse

Detwiler, R.L.; Mehl, S.; Rajaram, H.; Cheung, W.W.

2002-01-01

Numerical solution of large-scale ground water flow and transport problems is often constrained by the convergence behavior of the iterative solvers used to solve the resulting systems of equations. We demonstrate the ability of an algebraic multigrid algorithm (AMG) to efficiently solve the large, sparse systems of equations that result from computational models of ground water flow and transport in large and complex domains. Unlike geometric multigrid methods, this algorithm is applicable to problems in complex flow geometries, such as those encountered in pore-scale modeling of two-phase flow and transport. We integrated AMG into MODFLOW 2000 to compare two- and three-dimensional flow simulations using AMG to simulations using PCG2, a preconditioned conjugate gradient solver that uses the modified incomplete Cholesky preconditioner and is included with MODFLOW 2000. Convergence with AMG was up to 140 times faster in CPU time than with PCG2. The cost of this increased speed was up to a nine-fold increase in required random access memory (RAM) for the three-dimensional problems and up to a four-fold increase in required RAM for the two-dimensional problems. We also compared two-dimensional numerical simulations of steady-state transport using AMG and the generalized minimum residual method with an incomplete LU-decomposition preconditioner. For these transport simulations, AMG was up to 17 times faster with only a 20% increase in required RAM. The ability of AMG to solve flow and transport problems in large, complex flow systems and its ready availability make it an ideal solver for use in both field-scale and pore-scale modeling.
Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanford, M.

1997-12-31

Most commercially available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach become prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today's multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemble a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and that the evaluation process may serve as an example of how to evaluate these packages. The information contained here includes feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to very large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve, ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well-designed user interfaces, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++ is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interfaces may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes.
Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors

NASA Technical Reports Server (NTRS)

David, R. E.

1984-01-01

Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues of a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
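To make "a few eigenvalues of a sparse Hermitian matrix" concrete, here is a small serial sketch using SciPy's `eigsh`, which wraps ARPACK (the serial package that PARPACK extends). It is not the program described above: the tridiagonal test matrix, its size, and the choice of the four largest eigenvalues are assumptions for illustration only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Hypothetical sparse Hermitian test matrix: real diagonal 1..n with a small
# purely imaginary coupling between neighbours (H equals its conjugate transpose).
n = 300
main = np.arange(1, n + 1, dtype=float)
off = 0.1j * np.ones(n - 1)
H = sp.diags([np.conj(off), main, off], [-1, 0, 1], format="csr")

# Ask the Krylov solver for only the 4 algebraically largest eigenvalues.
vals = eigsh(H, k=4, which="LA", return_eigenvectors=False)

# Dense reference, feasible only because this example is tiny.
dense_vals = np.linalg.eigvalsh(H.toarray())
print(np.sort(vals))
print(dense_vals[-4:])
```

Only matrix-vector products with `H` are needed by the iterative solver, which is what makes this class of method usable when the matrix is far too large to store or factor densely.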
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems

NASA Astrophysics Data System (ADS)

Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel

Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time-consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. A slight decrease in the time required for this task may be achieved on multi-core, multithreaded computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are handled by concurrent processes. The numerical complexity can be further reduced via circuit decomposition and parallel solution of blocks, taking the BBD matrix structure as a starting point. This block-parallel approach may give a considerable gain, though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents an easily parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
Sparse distributed memory and related models

NASA Technical Reports Server (NTRS)

Kanerva, Pentti

1992-01-01

Described here is sparse distributed memory (SDM) as a neural-net associative memory. It is characterized by two weight matrices and by a large internal dimension - the number of hidden units is much larger than the number of input or output units. The first matrix, A, is fixed and possibly random, and the second matrix, C, is modifiable. The SDM is compared and contrasted to (1) computer memory, (2) correlation-matrix memory, (3) feed-forward artificial neural network, (4) cortex of the cerebellum, (5) Marr and Albus models of the cerebellum, and (6) Albus' cerebellar model arithmetic computer (CMAC). Several variations of the basic SDM design are discussed: the selected-coordinate and hyperplane designs of Jaeckel, the pseudorandom associative neural memory of Hassoun, and SDM with real-valued input variables by Prager and Fallside. SDM research conducted mainly at the Research Institute for Advanced Computer Science (RIACS) in 1986-1991 is highlighted.
Newmark-Beta-FDTD method for super-resolution analysis of time reversal waves

NASA Astrophysics Data System (ADS)

Shi, Sheng-Bing; Shao, Wei; Ma, Jing; Jin, Congjun; Wang, Xiao-Hua

2017-09-01

In this work, a new unconditionally stable finite-difference time-domain (FDTD) method with the split-field perfectly matched layer (PML) is proposed for the analysis of time reversal (TR) waves. The proposed method is very suitable for multiscale problems involving microstructures. The spatial and temporal derivatives in this method are discretized by the central difference technique and the Newmark-Beta algorithm, respectively, and the derivation results in a banded sparse matrix equation. Since the coefficient matrix remains unchanged during the whole simulation process, the lower-upper (LU) decomposition of the matrix needs to be performed only once at the beginning of the calculation. Moreover, the reverse Cuthill-McKee (RCM) technique, an effective preprocessing technique for reducing the bandwidth of sparse matrices, is used to improve computational efficiency. The super-resolution focusing of TR wave propagation in two- and three-dimensional spaces is included to validate the accuracy and efficiency of the proposed method.
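The combination of RCM reordering with a single LU decomposition can be sketched as below. This is an illustrative setup and not the Newmark-Beta-FDTD system itself: the 2D 5-point test matrix, the random shuffle that destroys its band structure, and the helper name `bandwidth` are assumptions. SciPy's `reverse_cuthill_mckee` and `splu` are used for the reordering and factorization.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import splu

def bandwidth(M):
    """Largest distance of a stored nonzero from the main diagonal."""
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Hypothetical symmetric test matrix with its rows/columns randomly shuffled,
# so that it starts out with a poor (large) bandwidth.
n = 15
lap1d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
eye = sp.identity(n)
A = (sp.kron(lap1d, eye) + sp.kron(eye, lap1d) + sp.identity(n * n)).tocsr()
rng = np.random.default_rng(1)
shuffle = rng.permutation(n * n)
A = A[shuffle][:, shuffle]

# RCM returns a permutation; apply it symmetrically to rows and columns.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm][:, perm].tocsc()
print("bandwidth before:", bandwidth(A), "after RCM:", bandwidth(A_rcm))

# Factor once, keeping the RCM ordering (no extra column permutation).
lu = splu(A_rcm, permc_spec="NATURAL")

# Each later solve reuses the factors: permute the right-hand side, solve,
# then undo the permutation on the solution.
b = rng.standard_normal(n * n)
y = lu.solve(b[perm])
x = np.empty_like(y)
x[perm] = y
```

In a time-stepping code the last four lines would sit inside the time loop, while everything above them runs once at start-up.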
Blind compressed sensing image reconstruction based on alternating direction method

NASA Astrophysics Data System (ADS)

Liu, Qinan; Guo, Shuxu

2018-04-01

To reconstruct an image when the sparse basis is unknown, this paper proposes an image reconstruction method based on a blind compressed sensing model. In this model, the image signal is regarded as the product of a sparse coefficient matrix and a dictionary matrix. Building on existing blind compressed sensing theory, the optimal solution is found by an alternating minimization method. The proposed method addresses the difficulty of representing the sparse basis in compressed sensing, suppresses noise, and improves the quality of the reconstructed image. It ensures that the blind compressed sensing problem has a unique solution and can adaptively recover the original image signal from a complex environment. The experimental results show that the proposed blind compressed sensing reconstruction algorithm can recover high-quality image signals under under-sampling conditions.
Generalised Assignment Matrix Methodology in Linear Programming

ERIC Educational Resources Information Center

Jerome, Lawrence

2012-01-01

Discrete Mathematics instructors and students have long been struggling with various labelling and scanning algorithms for solving many important problems. This paper shows how to solve a wide variety of Discrete Mathematics and OR problems using assignment matrices and linear programming, specifically using Excel Solvers although the same…
A Flexible CUDA LU-based Solver for Small, Batched Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tumeo, Antonino; Gawande, Nitin A.; Villa, Oreste

This chapter presents the implementation of a batched CUDA solver based on LU factorization for small linear systems. This solver may be used in applications such as reactive flow transport models, which apply the Newton-Raphson technique to linearize and iteratively solve the sets of nonlinear equations that represent the reactions for tens of thousands to millions of physical locations. The implementation exploits somewhat counterintuitive GPGPU programming techniques: it assigns the solution of a matrix (representing a system) to a single CUDA thread, does not exploit shared memory and employs dynamic memory allocation on the GPUs. These techniques enable our implementation to simultaneously solve sets of systems with over 100 equations and to employ LU decomposition with complete pivoting, providing the higher numerical accuracy required by certain applications. Other currently available solutions for batched linear solvers are limited by size and only support partial pivoting, although they may be faster under certain conditions. We discuss the code of our implementation and present a comparison with the other implementations, discussing the various tradeoffs in terms of performance and flexibility. This work will enable developers that need batched linear solvers to choose whichever implementation is more appropriate to the features and the requirements of their applications, and even to implement dynamic switching approaches that can choose the best implementation depending on the input data.
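What "batched" means here can be shown with a small CPU-side sketch in NumPy. It is not the CUDA implementation described above: `numpy.linalg.solve` applies LAPACK's LU with partial pivoting (not complete pivoting) to every matrix in a stack, and the batch size, system size, and random data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, m = 1000, 8          # 1000 independent systems of 8 equations (made up)

# One small dense matrix and one right-hand side per "physical location".
# Adding m*I keeps the random matrices comfortably non-singular.
A = rng.standard_normal((batch, m, m)) + m * np.eye(m)
b = rng.standard_normal((batch, m))

# A single call solves the whole stack; the trailing axis added to b makes
# each right-hand side an explicit (m, 1) column.
x = np.linalg.solve(A, b[..., None])[..., 0]

# Worst residual over the batch, as a sanity check.
max_residual = np.max(np.abs(np.einsum("bij,bj->bi", A, x) - b))
print(f"max residual over batch: {max_residual:.2e}")
```

The systems are completely independent of one another, which is why assigning one system per thread, as the abstract describes, is a workable mapping.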
Communication requirements of sparse Cholesky factorization with nested dissection ordering

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n^α processors, α ≤ 1, is shown to be O(n^(1+α/2)). It is O(n) when n^α processors, α ≥ 1, are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
Enhanced Ozone Production at Low Temperatures due to Ethanol (E85)

NASA Astrophysics Data System (ADS)

Ginnebaugh, D. L.; Livingstone, P. L.; Jacobson, M. Z.

2009-12-01

The increased use of ethanol in transportation fuels warrants an investigation of its consequences. An important component of such an investigation is the temperature-dependence of ethanol and gasoline exhaust chemistry. We use the near-explicit Master Chemical Mechanism (MCM, version 3.1, LEEDS University) with the SMVGEAR II chemical ordinary differential solver to provide the speed necessary to simulate explicit chemistry to examine such effects. The MCM has over 13,500 organic reactions and 4,600 species. SMVGEAR II is a sparse-matrix Gear solver that reduces the computation time significantly while maintaining any specified accuracy. Although for this study we use a box model, we determined that the speed of the MCM with the SMVGEAR solver will allow the MCM to be modeled in 3-dimensions. We also verified the accuracy of the model with comparisons to smog chamber data. We use species-resolved tailpipe emissions data for E85 (15% gasoline, 85% ethanol fuel blend) and gasoline vehicles to compare the impact of each on ozone and carcinogenic organic gases as a function of ambient temperature and background concentrations, using Los Angeles in 2020 as a base case. We use two different emissions sets - one is a compilation of data taken at near 24 C and the other from data taken at -7 C - to determine how atmospheric chemistry and emissions are affected by temperature. We include diurnal effects by examining 2 day and 5 day scenarios. We find that for both emission data sets, the average ozone concentrations through the range of temperatures tested are higher with E85 than with gasoline by 8 parts per billion volume (ppbv) at higher temperatures to 55 ppbv at low temperatures and low sunlight (winter conditions) for an area with a high nitrogen oxides (NOx) to non-methane organic gases (NMOG) ratio. 
The results suggest that E85's effect on health through ozone formation becomes increasingly significant relative to gasoline as temperatures decrease, due to the change in emission composition at lower temperatures. This could have implications for the wintertime use of E85. Some carcinogenic species increase while others decrease when using E85 instead of gasoline, implying that the cancer risk is approximately the same at warmer temperatures but may be slightly higher for E85 at cold temperatures. Peroxy acetyl nitrate (PAN), another air pollutant of concern, increases with E85 by 0.4 to 20 ppbv. The sensitivity of the results to background emissions, NOx emissions, and water vapor was also examined.
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
An overview of NSPCG: A nonsymmetric preconditioned conjugate gradient package

NASA Astrophysics Data System (ADS)

Oppe, Thomas C.; Joubert, Wayne D.; Kincaid, David R.

1989-05-01

The most recent research-oriented software package developed as part of the ITPACK Project is called "NSPCG" since it contains many nonsymmetric preconditioned conjugate gradient procedures. It is designed to solve large sparse systems of linear algebraic equations by a variety of different iterative methods. One of the main purposes for the development of the package is to provide a common modular structure for research on iterative methods for nonsymmetric matrices. Another purpose for the development of the package is to investigate the suitability of several iterative methods for vector computers. Since the vectorizability of an iterative method depends greatly on the matrix structure, NSPCG allows great flexibility in the operator representation. The coefficient matrix can be passed in one of several different matrix data storage schemes. These sparse data formats allow matrices with a wide range of structures from highly structured ones such as those with all nonzeros along a relatively small number of diagonals to completely unstructured sparse matrices. Alternatively, the package allows the user to call the accelerators directly with user-supplied routines for performing certain matrix operations. In this case, one can use the data format from an application program and not be required to copy the matrix into one of the package formats. This is particularly advantageous when memory space is limited. Some of the basic preconditioners that are available are point methods such as Jacobi, Incomplete LU Decomposition and Symmetric Successive Overrelaxation as well as block and multicolor preconditioners. The user can select from a large collection of accelerators such as Conjugate Gradient (CG), Chebyshev (SI, for semi-iterative), Generalized Minimal Residual (GMRES), Biconjugate Gradient Squared (BCGS) and many others. The package is modular so that almost any accelerator can be used with almost any preconditioner.
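The idea of calling an accelerator with user-supplied matrix routines, so that the matrix never has to be copied into a package format, can be sketched in Python with SciPy rather than with NSPCG itself. In this sketch the nonsymmetric tridiagonal operator, its coefficients, and the function name `matvec` are assumptions; SciPy's `LinearOperator` and `gmres` play the roles of the user-supplied routine and the GMRES accelerator, and a Jacobi (diagonal) preconditioner is passed through the `M` argument.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# The application keeps its matrix in its own format: here, three diagonals
# of a hypothetical nonsymmetric, diagonally dominant operator.
n = 100
lower = -1.3 * np.ones(n - 1)    # sub-diagonal
main = 3.0 * np.ones(n)          # main diagonal
upper = -0.7 * np.ones(n - 1)    # super-diagonal

def matvec(v):
    """User-supplied product A @ v computed directly from the diagonals."""
    v = np.ravel(v)
    out = main * v
    out[1:] += lower * v[:-1]
    out[:-1] += upper * v[1:]
    return out

A = LinearOperator((n, n), matvec=matvec, dtype=float)

# Jacobi preconditioner: applies the inverse of the main diagonal.
M = LinearOperator((n, n), matvec=lambda v: np.ravel(v) / main, dtype=float)

b = np.ones(n)
x, info = gmres(A, b, M=M)       # info == 0 signals convergence

rel_res = np.linalg.norm(matvec(x) - b) / np.linalg.norm(b)
print("info:", info, "relative residual:", rel_res)
```

Swapping the preconditioner or the accelerator only changes the `M` argument or the solver function, which mirrors the modular accelerator/preconditioner pairing described in the abstract.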
Color normalization of histology slides using graph regularized sparse NMF

NASA Astrophysics Data System (ADS)

Sha, Lingdao; Schonfeld, Dan; Sethi, Amit

2017-03-01

Computer-based automatic medical image processing and quantification are becoming popular in digital pathology. However, preparation of histology slides can vary widely due to differences in staining equipment, procedures and reagents, which can reduce the accuracy of algorithms that analyze their color and texture information. To reduce the unwanted color variations, various supervised and unsupervised color normalization methods have been proposed. Compared with supervised color normalization methods, unsupervised color normalization methods have the advantages of time and cost efficiency and universal applicability. Most of the unsupervised color normalization methods for histology are based on stain separation. Based on the fact that stain concentration cannot be negative and different parts of the tissue absorb different stains, nonnegative matrix factorization (NMF), and in particular its sparse version (SNMF), are good candidates for stain separation. However, most of the existing unsupervised color normalization methods, such as PCA, ICA, NMF and SNMF, fail to consider important information about the sparse manifolds that the image pixels occupy, which could potentially result in loss of texture information during color normalization. Manifold learning methods like Graph Laplacian have proven to be very effective in interpreting high-dimensional data. In this paper, we propose a novel unsupervised stain separation method called graph regularized sparse nonnegative matrix factorization (GSNMF). By considering the sparse prior of stain concentration together with manifold information from high-dimensional image data, our method shows better performance in stain color deconvolution than existing unsupervised color deconvolution methods, especially in keeping connected texture information. To utilize the texture information, we construct a nearest neighbor graph between pixels within a spatial area of an image based on their distances using a heat kernel in lαβ space.
The representation of a pixel in the stain density space is constrained to follow the feature distance of the pixel to pixels in the neighborhood graph. Using a color matrix transfer method with the stain concentrations found by our GSNMF method, the color normalization performance was also better than that of existing methods.
Efficient parallel linear scaling construction of the density matrix for Born-Oppenheimer molecular dynamics.

PubMed

Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N

2015-10-13

We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
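To make the SP2 idea concrete, here is a minimal dense sketch in Python. It is not the authors' sparse ELLPACK-R implementation: the function name `sp2_density_matrix`, the Gershgorin spectral bounds, the trace-based choice between the two projection polynomials, and the stopping rule are all assumptions chosen for illustration. A linear-scaling version would replace the dense products with thresholded sparse products.

```python
import numpy as np

def sp2_density_matrix(H, n_occ, max_iter=100, tol=1e-10):
    """Dense SP2-style purification (illustrative, hypothetical helper).

    Returns an approximate projector onto the n_occ lowest eigenstates
    of the symmetric matrix H, assuming a gap above state n_occ.
    """
    n = H.shape[0]
    # Gershgorin estimates of the spectral bounds of H (an assumption here).
    diag = np.diag(H)
    radius = np.sum(np.abs(H), axis=1) - np.abs(diag)
    e_min = np.min(diag - radius)
    e_max = np.max(diag + radius)
    # Map the spectrum into [0, 1] so that the lowest states sit near 1.
    X = (e_max * np.eye(n) - H) / (e_max - e_min)
    for _ in range(max_iter):
        X2 = X @ X
        tr_x, tr_x2 = np.trace(X), np.trace(X2)
        # trace(X - X^2) vanishes only when every eigenvalue is 0 or 1.
        if abs(tr_x - tr_x2) < tol:
            break
        # Pick the second-order polynomial whose trace lands closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            X = X2              # lowers the trace
        else:
            X = 2.0 * X - X2    # raises the trace
    return X
```

Each iteration costs one matrix-matrix product, which is why the sparsity of the intermediate matrices governs the overall scaling.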
spammpack, Version 2013-06-18

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-01-17

This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type, and an approximate matrix product, which exhibits linear scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or parallel execution on shared memory systems with an OpenMP-capable compiler.
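The tolerance-controlled product described above can be sketched as a recursive block multiply that skips sub-products whose norm bound falls below the tolerance. This is an illustrative Python sketch, not the spammpack API: the function name `spamm`, the Frobenius-norm test, the `leaf` block size, and the requirement that the matrix size is `leaf` times a power of two are assumptions.

```python
import numpy as np

def spamm(A, B, tol, leaf=2):
    """Approximate product of square matrices A and B (illustrative sketch).

    A sub-product is dropped when ||A_blk||_F * ||B_blk||_F < tol, which
    bounds the Frobenius norm of the dropped contribution by tol.
    """
    n = A.shape[0]
    if np.linalg.norm(A) * np.linalg.norm(B) < tol:
        return np.zeros((n, n))
    if n <= leaf:
        return A @ B  # small dense block: multiply exactly
    h = n // 2
    C = np.zeros((n, n))
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                a = A[i * h:(i + 1) * h, k * h:(k + 1) * h]
                b = B[k * h:(k + 1) * h, j * h:(j + 1) * h]
                C[i * h:(i + 1) * h, j * h:(j + 1) * h] += spamm(a, b, tol, leaf)
    return C
```

With `tol=0.0` no block is skipped and the result matches the exact product up to rounding; for matrices whose entries decay away from the diagonal, a larger tolerance prunes most off-diagonal work at a controlled cost in accuracy.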

Bit error rate tester using fast parallel generation of linear recurring sequences

DOEpatents

Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

2003-05-06

A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
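The decimation-matrix construction can be illustrated in software with arithmetic over GF(2). The sketch below is not the patented circuit: the helper names, the particular companion-matrix layout, and the example polynomial x^4 + x + 1 are assumptions chosen for illustration. It shows that one multiplication by the companion matrix raised to the (n*k)th power advances the generator state by n*k single steps.

```python
import numpy as np

def companion_matrix(taps, degree):
    """Companion matrix over GF(2): row 0 forms the feedback bit, the rest shift."""
    C = np.zeros((degree, degree), dtype=np.int64)
    C[0, list(taps)] = 1
    for i in range(1, degree):
        C[i, i - 1] = 1
    return C

def gf2_matrix_power(M, exponent):
    """Raise M to a non-negative integer power with all arithmetic mod 2."""
    result = np.eye(M.shape[0], dtype=np.int64)
    base = M.copy()
    while exponent > 0:
        if exponent & 1:
            result = (result @ base) % 2
        base = (base @ base) % 2
        exponent >>= 1
    return result

def step(C, state, count=1):
    """Advance the generator state by `count` single steps."""
    for _ in range(count):
        state = (C @ state) % 2
    return state

# Example: x^4 + x + 1 (primitive), n = 3 bits per generator, k = 4 generators.
C = companion_matrix([2, 3], 4)
D = gf2_matrix_power(C, 3 * 4)  # decimation matrix C^(n*k)
```

Each 1 in a row of `D` corresponds to one XOR input for that state bit, so a sparser `D` means fewer gates on the feedback path.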
Tensor Sparse Coding for Positive Definite Matrices.

PubMed

Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikos

2013-08-02

In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (e.g., image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.
Tensor sparse coding for positive definite matrices.

PubMed

Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos

2014-03-01

In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for example, image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.
A differentiable reformulation for E-optimal design of experiments in nonlinear dynamic biosystems.

PubMed

Telen, Dries; Van Riet, Nick; Logist, Flip; Van Impe, Jan

2015-06-01

Informative experiments are highly valuable for estimating parameters in nonlinear dynamic bioprocesses. Techniques for optimal experiment design ensure the systematic design of such informative experiments. The E-criterion, which can be used as the objective function in optimal experiment design, requires the maximization of the smallest eigenvalue of the Fisher information matrix. However, one problem with the minimal eigenvalue function is that it can be nondifferentiable. In addition, no closed form expression exists for the computation of eigenvalues of a matrix larger than a 4 by 4 one. As eigenvalues are normally computed with iterative methods, state-of-the-art optimal control solvers are not able to exploit automatic differentiation to compute the derivatives with respect to the decision variables. In the current paper a reformulation strategy from the field of convex optimization is suggested to circumvent these difficulties. This reformulation requires the inclusion of a matrix inequality constraint involving positive semidefiniteness. In this paper, this positive semidefiniteness constraint is imposed via Sylvester's criterion. As a result the maximization of the minimum eigenvalue function can be formulated in standard optimal control solvers through the addition of nonlinear constraints. The presented methodology is successfully illustrated with a case study from the field of predictive microbiology. Copyright © 2015. Published by Elsevier Inc.
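The reformulation can be pictured as maximizing a scalar t subject to F - tI staying positive (semi)definite, with the matrix condition written as determinant inequalities. The Python sketch below checks the strict version of that condition through leading principal minors; the helper names are hypothetical and the abstract does not give the paper's exact constraint set. Note that strictly positive leading minors certify positive definiteness, whereas semidefiniteness in general requires all principal minors to be nonnegative.

```python
import numpy as np

def leading_minors(M):
    """Determinants of the leading principal submatrices of M."""
    return [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]

def satisfies_sylvester(F, t):
    """True if F - t*I is positive definite by Sylvester's criterion.

    For symmetric F this holds exactly when t is below the smallest
    eigenvalue of F, so the largest feasible t recovers the E-criterion.
    """
    shifted = F - t * np.eye(F.shape[0])
    return all(m > 0.0 for m in leading_minors(shifted))
```

Because each minor is a polynomial in the entries of F and in t, constraints of this kind are smooth and can be differentiated automatically, unlike the minimum eigenvalue function itself.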
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.

In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphic Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different levels of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and only relies on the high memory bandwidth of the GPU. The first and second solutions only support partial pivoting; the third easily supports partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
SPAR reference manual

NASA Technical Reports Server (NTRS)

Whetstone, W. D.

1976-01-01

The functions and operating rules of the SPAR system, which is a group of computer programs used primarily to perform stress, buckling, and vibrational analyses of linear finite element systems, were given. The following subject areas were discussed: basic information, structure definition, format system matrix processors, utility programs, static solutions, stresses, sparse matrix eigensolver, dynamic response, graphics, and substructure processors.
Matrix Recipes for Hard Thresholding Methods

DTIC Science & Technology

2012-11-07

have been proposed to approximate the solution. In [11], Donoho et al. demonstrate that, in the sparse approximation problem, under basic incoherence...inducing convex surrogate ‖ · ‖1 with provable guarantees for unique signal recovery. In the ARM problem, Fazel et al. [12] identified the nuclear norm...sparse recovery for all. Technical report, EPFL, 2011. [25] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components of the QR algorithm, this study concluded that the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed method increases as the problem size increases.
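For readers unfamiliar with the Hessenberg reduction step, the following serial Python sketch uses Householder reflections. It is a textbook formulation, not the parallel-vector algorithm of the report, and the function name `to_hessenberg` is an assumption. The two block updates inside the loop are the matrix-vector and rank-one operations that a vector or multiprocessor machine would accelerate.

```python
import numpy as np

def to_hessenberg(A):
    """Reduce a square matrix to upper Hessenberg form by similarity transforms."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column is already reduced
        # Householder vector that maps x onto a multiple of the first unit vector.
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])
        v /= np.linalg.norm(v)
        # Apply the reflector from the left and from the right (similarity).
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H
```

Since every step is a similarity transform, the eigenvalues of the result equal those of the input, and the QR iteration can then work on a matrix with zeros below the first subdiagonal.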
Comprehensive Thematic T-Matrix Reference Database: A 2015-2017 Update

NASA Technical Reports Server (NTRS)

Mishchenko, Michael I.; Zakharova, Nadezhda; Khlebtsov, Nikolai G.; Videen, Gorden; Wriedt, Thomas

2017-01-01

The T-matrix method pioneered by Peter C. Waterman is one of the most versatile and efficient numerically exact computer solvers of the time-harmonic macroscopic Maxwell equations. It is widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, periodic structures (including metamaterials), and particles in the vicinity of plane or rough interfaces separating media with different refractive indices. This paper is the eighth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated in 2004 and lists relevant publications that have appeared since 2015. It also references a small number of earlier publications overlooked previously.
Comprehensive thematic T-matrix reference database: A 2015-2017 update

NASA Astrophysics Data System (ADS)

Mishchenko, Michael I.; Zakharova, Nadezhda T.; Khlebtsov, Nikolai G.; Videen, Gorden; Wriedt, Thomas

2017-11-01

The T-matrix method pioneered by Peter C. Waterman is one of the most versatile and efficient numerically exact computer solvers of the time-harmonic macroscopic Maxwell equations. It is widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, periodic structures (including metamaterials), and particles in the vicinity of plane or rough interfaces separating media with different refractive indices. This paper is the eighth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated in 2004 and lists relevant publications that have appeared since 2015. It also references a small number of earlier publications overlooked previously.
An efficient classification method based on principal component and sparse representation.

PubMed

Zhai, Lin; Fu, Shujun; Zhang, Caiming; Liu, Yunxian; Wang, Lu; Liu, Guohua; Yang, Mingqiang

2016-01-01

As an important application in optical imaging, palmprint recognition is hampered by many unfavorable factors. An effective fusion of blockwise bi-directional two-dimensional principal component analysis and grouping sparse classification is presented. Dimension reduction and normalization are implemented by the blockwise bi-directional two-dimensional principal component analysis of palmprint images to extract feature matrices, which are assembled into an overcomplete dictionary for sparse classification. A subspace orthogonal matching pursuit algorithm is designed to solve the grouping sparse representation. Finally, the classification result is obtained by comparing the residual between testing and reconstructed images. Experiments are carried out on a palmprint database, and the results show that this method is more robust against position and illumination changes of palmprint images and achieves a higher palmprint recognition rate.
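As background for the sparse representation step, here is a minimal Python sketch of plain orthogonal matching pursuit (OMP). It is not the subspace or grouping variant designed in the paper; the function name `omp` and the fixed sparsity level `k` are assumptions for illustration.

```python
import numpy as np

def omp(D, y, k):
    """Greedy k-sparse approximation of y over the columns (atoms) of D."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

In a residual-based classifier, a test sample would be coded over the dictionary and assigned to the class whose atoms leave the smallest reconstruction residual.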
[Formula: see text]-regularized recursive total least squares based sparse system identification for the error-in-variables.

PubMed

Lim, Jun-Seok; Pang, Hee-Suk

2016-01-01

In this paper an [Formula: see text]-regularized recursive total least squares (RTLS) algorithm is considered for sparse system identification. Although recursive least squares (RLS) has been successfully applied in sparse system identification, the estimation performance of RLS-based algorithms degrades when both input and output are contaminated by noise (the error-in-variables problem). We propose an algorithm to handle the error-in-variables problem. The proposed [Formula: see text]-RTLS algorithm is an RLS-like iteration using the [Formula: see text] regularization. The proposed algorithm not only gives excellent performance but also reduces the required complexity through effective handling of the matrix inversion. Simulations demonstrate the superiority of the proposed [Formula: see text]-regularized RTLS in the sparse system identification setting.
Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer.

PubMed

Okimoto, Gordon; Zeinalzadeh, Ashkan; Wenska, Tom; Loomis, Michael; Nation, James B; Fabre, Tiphaine; Tiirikainen, Maarit; Hernandez, Brenda; Chan, Owen; Wong, Linda; Kwee, Sandi

2016-01-01

Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1. The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of "sparse" left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single "sparsity" parameter that is selected based on false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on "residual" data matrices that result from a given sparse approximation. We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDS. 
On real multimodal data for ovarian and liver cancer we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology. Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
Signal-Preserving Erratic Noise Attenuation via Iterative Robust Sparsity-Promoting Filter

DOE PAGES

Zhao, Qiang; Du, Qizhen; Gong, Xufei; ...

2018-04-06

Thresholding filters operating in a sparse domain are highly effective in removing Gaussian random noise under a Gaussian distribution assumption. Erratic noise, which designates non-Gaussian noise that consists of large isolated events with known or unknown distribution, also needs to be explicitly taken into account. However, conventional sparse domain thresholding filters based on the least-squares (LS) criterion are severely sensitive to data with high-amplitude and non-Gaussian noise, i.e., the erratic noise, which makes the suppression of this type of noise extremely challenging. In this paper, we present a robust sparsity-promoting denoising model, in which the LS criterion is replaced by the Huber criterion to weaken the effects of erratic noise. The random and erratic noise is distinguished by using a data-adaptive parameter in the presented method, where random noise is described by mean square, while the erratic noise is downweighted through a damped weight. Different from conventional sparse domain thresholding filters, definition of the misfit between noisy data and recovered signal via the Huber criterion results in a nonlinear optimization problem. With the help of theoretical pseudoseismic data, an iterative robust sparsity-promoting filter is proposed to transform the nonlinear optimization problem into a linear LS problem through an iterative procedure. The main advantage of this transformation is that the nonlinear denoising filter can be solved by conventional LS solvers. Lastly, tests with several data sets demonstrate that the proposed denoising filter can successfully attenuate the erratic noise without damaging useful signal when compared with conventional denoising approaches based on the LS criterion.
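The step from a Huber misfit to a sequence of ordinary LS problems can be illustrated with iteratively reweighted least squares (IRLS). The Python sketch below fits a generic linear model and is not the paper's pseudoseismic-data formulation or its sparsity-promoting filter; the function name `huber_irls`, the fixed threshold `delta`, and the iteration count are assumptions.

```python
import numpy as np

def huber_irls(A, y, delta=1.0, n_iter=50):
    """Minimize the Huber misfit of y - A @ x by repeated weighted LS solves."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)  # plain LS starting point
    for _ in range(n_iter):
        r = y - A @ x
        abs_r = np.maximum(np.abs(r), 1e-12)   # guard against division by zero
        # Residuals within delta keep weight 1; larger ones are downweighted.
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        # Each pass is a conventional (weighted) least-squares problem.
        x, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return x
```

Samples with small residuals are treated exactly as in LS, while a large isolated event contributes with a bounded influence, which is the behavior the abstract attributes to the damped weight.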
Signal-Preserving Erratic Noise Attenuation via Iterative Robust Sparsity-Promoting Filter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Qiang; Du, Qizhen; Gong, Xufei

Thresholding filters operating in a sparse domain are highly effective in removing Gaussian random noise under a Gaussian distribution assumption. Erratic noise, which designates non-Gaussian noise that consists of large isolated events with known or unknown distribution, also needs to be explicitly taken into account. However, conventional sparse domain thresholding filters based on the least-squares (LS) criterion are severely sensitive to data with high-amplitude and non-Gaussian noise, i.e., the erratic noise, which makes the suppression of this type of noise extremely challenging. In this paper, we present a robust sparsity-promoting denoising model, in which the LS criterion is replaced by the Huber criterion to weaken the effects of erratic noise. The random and erratic noise is distinguished by using a data-adaptive parameter in the presented method, where random noise is described by mean square, while the erratic noise is downweighted through a damped weight. Different from conventional sparse domain thresholding filters, definition of the misfit between noisy data and recovered signal via the Huber criterion results in a nonlinear optimization problem. With the help of theoretical pseudoseismic data, an iterative robust sparsity-promoting filter is proposed to transform the nonlinear optimization problem into a linear LS problem through an iterative procedure. The main advantage of this transformation is that the nonlinear denoising filter can be solved by conventional LS solvers. Lastly, tests with several data sets demonstrate that the proposed denoising filter can successfully attenuate the erratic noise without damaging useful signal when compared with conventional denoising approaches based on the LS criterion.
Adaptive OFDM Waveform Design for Spatio-Temporal-Sparsity Exploited STAP Radar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sen, Satyabrata

In this chapter, we describe a sparsity-based space-time adaptive processing (STAP) algorithm to detect a slowly moving target using an orthogonal frequency division multiplexing (OFDM) radar. The motivation for employing an OFDM signal is that it improves target detectability against the interfering signals by increasing the frequency diversity of the system. However, due to the addition of one extra dimension in terms of frequency, the number of adaptive degrees of freedom in an OFDM-STAP also increases. Therefore, to avoid the construction of a fully adaptive OFDM-STAP, we develop a sparsity-based STAP algorithm. We observe that the interference spectrum is inherently sparse in the spatio-temporal domain, as the clutter responses occupy only a diagonal ridge on the spatio-temporal plane and the jammer signals interfere only from a few spatial directions. Hence, we exploit that sparsity to develop an efficient STAP technique that uses considerably fewer secondary data than other existing STAP techniques and produces nearly optimum STAP performance. In addition to designing the STAP filter, we optimally design the transmit OFDM signals by maximizing the output signal-to-interference-plus-noise ratio (SINR) in order to improve the STAP performance. The computation of output SINR depends on the estimated value of the interference covariance matrix, which we obtain by applying the sparse recovery algorithm. Therefore, we analytically assess the effects of the synthesized OFDM coefficients on the sparse recovery of the interference covariance matrix by computing the coherence measure of the sparse measurement matrix. Our numerical examples demonstrate the STAP performance achieved by the sparsity-based technique and adaptive waveform design.
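The coherence measure mentioned above is commonly taken to be the mutual coherence of the measurement matrix, that is, the largest normalized inner product between two distinct columns. The Python sketch below computes that quantity; whether the chapter uses exactly this definition is an assumption, and the function name `mutual_coherence` is hypothetical.

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute normalized inner product between distinct columns of Phi."""
    norms = np.linalg.norm(Phi, axis=0)
    unit = Phi / norms                 # normalize every column
    G = np.abs(unit.conj().T @ unit)   # magnitudes of the Gram matrix entries
    np.fill_diagonal(G, 0.0)           # ignore each column paired with itself
    return float(G.max())
```

Smaller values indicate columns that are easier to tell apart, which is generally favorable for sparse recovery.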
NoGOA: predicting noisy GO annotations using evidences and sparse representation.

PubMed

Yu, Guoxian; Lu, Chang; Wang, Jun

2017-07-21

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, however, how to identify noisy annotations is an important but seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. 
Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solutions.
A high-order semi-explicit discontinuous Galerkin solver for 3D incompressible flow with application to DNS and LES of turbulent channel flow

NASA Astrophysics Data System (ADS)

Krank, Benjamin; Fehn, Niklas; Wall, Wolfgang A.; Kronbichler, Martin

2017-11-01

We present an efficient discontinuous Galerkin scheme for simulation of the incompressible Navier-Stokes equations including laminar and turbulent flow. We consider a semi-explicit high-order velocity-correction method for time integration as well as nodal equal-order discretizations for velocity and pressure. The non-linear convective term is treated explicitly while a linear system is solved for the pressure Poisson equation and the viscous term. The key feature of our solver is a consistent penalty term reducing the local divergence error in order to overcome recently reported instabilities in spatially under-resolved high-Reynolds-number flows as well as small time steps. This penalty method is similar to the grad-div stabilization widely used in continuous finite elements. We further review and compare our method to several other techniques recently proposed in literature to stabilize the method for such flow configurations. The solver is specifically designed for large-scale computations through matrix-free linear solvers including efficient preconditioning strategies and tensor-product elements, which have allowed us to scale this code up to 34.4 billion degrees of freedom and 147,456 CPU cores. We validate our code and demonstrate optimal convergence rates with laminar flows present in a vortex problem and flow past a cylinder and show applicability of our solver to direct numerical simulation as well as implicit large-eddy simulation of turbulent channel flow at Reτ = 180 as well as 590.
Silencer! A Tool for Substrate Noise Coupling Analysis

DTIC Science & Technology

2004-01-09

network for up to one hundred substrate ports. The solver uses the Laplace equation and then transforms it with Green’s theorem into a...the contact center points can be calculated (using Pythagoras) and saved in an n x n matrix: $x_{ij} = c_{xj} - c_{xi}$, $y_{ij} = c_{yj} - c_{yi}$, $d_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2}$

Performance Comparison of a Matrix Solver on a Heterogeneous Network Using Two Implementations of MPI: MPICH and LAM

NASA Technical Reports Server (NTRS)

Phillips, Jennifer K.

1995-01-01

Two of the current and most popular implementations of the Message-Passing Standard, Message Passing Interface (MPI), were contrasted: MPICH by Argonne National Laboratory, and LAM by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to be run in a heterogeneous environment using MPI. The Message-Passing Interface Forum was held in May 1994, which led to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates its own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these free-ware implementations provides the performance of native message-passing or vendor's implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique and a few machines required specific modifications during the installation. When installed correctly, both implementations worked well with only minor problems.
Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

DOE PAGES

Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; ...

2015-07-14

Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high end multi-core architectures and we use a message passing interface + open multiprocessing hybrid programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare with previously reported implementations that do not exploit underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
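To show what exploiting symmetry means for the kernel itself, here is a serial Python sketch that stores only the upper triangle in CSR form and uses each off-diagonal entry twice. It is not the distributed hybrid implementation studied in the article; the function name and the choice of upper-triangular CSR storage are assumptions.

```python
import numpy as np

def sym_spmv_upper_csr(indptr, indices, data, x):
    """Compute y = A @ x for symmetric A given only its upper triangle in CSR."""
    n = len(indptr) - 1
    y = np.zeros(n, dtype=float)
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            a = data[p]
            y[i] += a * x[j]        # contribution of A[i, j]
            if j != i:
                y[j] += a * x[i]    # mirrored contribution of A[j, i]
    return y
```

Storing half of the off-diagonal entries roughly halves the matrix data that must be read, at the cost of scattered writes into `y`, which is what makes the parallel and distributed versions harder to schedule.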
Second derivative time integration methods for discontinuous Galerkin solutions of unsteady compressible flows

NASA Astrophysics Data System (ADS)

Nigro, A.; De Bartolo, C.; Crivellini, A.; Bassi, F.

2017-12-01

In this paper we investigate the possibility of using the high-order accurate A(α)-stable Second Derivative (SD) schemes proposed by Enright for the implicit time integration of the Discontinuous Galerkin (DG) space-discretized Navier-Stokes equations. These multistep schemes are A-stable up to fourth-order, but their use results in a system matrix that is difficult to compute. Furthermore, the evaluation of the nonlinear function is computationally very demanding. We propose here a Matrix-Free (MF) implementation of Enright schemes that yields a method without the costs of forming, storing and factorizing the system matrix, which is much less computationally expensive than its matrix-explicit counterpart, and which performs competitively with other implicit schemes, such as the Modified Extended Backward Differentiation Formulae (MEBDF). The algorithm makes use of the preconditioned GMRES algorithm for solving the linear system of equations. The preconditioner is based on the ILU(0) factorization of an approximated but computationally cheaper form of the system matrix, and it has been reused for several time steps to improve the efficiency of the MF Newton-Krylov solver. We additionally employ a polynomial extrapolation technique to compute an accurate initial guess to the implicit nonlinear system. The stability properties of SD schemes have been analyzed by solving a linear model problem. For the analysis on the Navier-Stokes equations, two-dimensional inviscid and viscous test cases, both with a known analytical solution, are solved to assess the accuracy properties of the proposed time integration method for nonlinear autonomous and non-autonomous systems, respectively. The performance of the SD algorithm is compared with that obtained by using an MF-MEBDF solver, in order to evaluate its effectiveness, identify its limitations and suggest possible further improvements.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.

PubMed

Fan, Jianqing; Xue, Lingzhou; Zou, Hui

2014-06-01

Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
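As a rough illustration of the local linear approximation (LLA) step, the sketch below computes the weights that turn a folded concave penalty into a weighted L1 penalty around an initial estimate. It assumes the SCAD penalty with the conventional a = 3.7, which the abstract does not name; the helper names are hypothetical, and the weighted lasso solve that would follow is omitted.

```python
# Minimal sketch, assuming the SCAD penalty (one common folded concave penalty).
# One LLA step replaces the penalty on |beta_j| by w_j * |beta_j|, where
# w_j = p'_lam(|beta0_j|) is the penalty derivative at the initial estimate.

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def lla_weights(beta0, lam, a=3.7):
    """Per-coefficient L1 weights for one LLA step started from beta0."""
    return [scad_derivative(abs(b), lam, a) for b in beta0]


# Coefficients near zero keep the full weight lam; large ones get weight zero.
print(lla_weights([0.0, 0.5, 2.0, 5.0], lam=1.0))
```

Coefficients whose initial estimate exceeds a * lam are left unpenalized in the next step, which is how the procedure can reproduce the oracle estimator when the initial estimate is good enough.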
Superresolution radar imaging based on fast inverse-free sparse Bayesian learning for multiple measurement vectors

NASA Astrophysics Data System (ADS)

He, Xingyu; Tong, Ningning; Hu, Xiaowei

2018-01-01

Compressive sensing has been successfully applied to inverse synthetic aperture radar (ISAR) imaging of moving targets. By exploiting the block sparse structure of the target image, sparse solution for multiple measurement vectors (MMV) can be applied in ISAR imaging and a substantial performance improvement can be achieved. As an effective sparse recovery method, sparse Bayesian learning (SBL) for MMV involves a matrix inverse at each iteration. Its associated computational complexity grows significantly with the problem size. To address this problem, we develop a fast inverse-free (IF) SBL method for MMV. A relaxed evidence lower bound (ELBO), which is computationally more amenable than the traditional ELBO used by SBL, is obtained by invoking a fundamental property of smooth functions. A variational expectation-maximization scheme is then employed to maximize the relaxed ELBO, and a computationally efficient IF-MSBL algorithm is proposed. Numerical results based on simulated and real data show that the proposed method can reconstruct row sparse signals accurately and obtain clear superresolution ISAR images. Moreover, the running time and computational complexity are reduced to a great extent compared with traditional SBL methods.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION

PubMed Central

Fan, Jianqing; Xue, Lingzhou; Zou, Hui

2014-01-01

Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560
A GPU-based incompressible Navier-Stokes solver on moving overset grids

NASA Astrophysics Data System (ADS)

Chandar, Dominic D. J.; Sitaraman, Jayanarayanan; Mavriplis, Dimitri J.

2013-07-01

In pursuit of obtaining high fidelity solutions to the fluid flow equations in a short span of time, graphics processing units (GPUs) which were originally intended for gaming applications are currently being used to accelerate computational fluid dynamics (CFD) codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem to be favourable for many high-resolution computations. One such computation that involves a lot of number crunching is computing time accurate flow solutions past moving bodies. The aim of the present paper is thus to discuss the development of a flow solver on unstructured and overset grids and its implementation on GPUs. In its present form, the flow solver solves the incompressible fluid flow equations on unstructured/hybrid/overset grids using a fully implicit projection method. The resulting discretised equations are solved using a matrix-free Krylov solver using several GPU kernels such as gradient, Laplacian and reduction. Some of the simple arithmetic vector calculations are implemented using the CU++ approach (CU++: An Object Oriented Framework for Computational Fluid Dynamics Applications using Graphics Processing Units, Journal of Supercomputing, 2013, doi:10.1007/s11227-013-0985-9), where GPU kernels are automatically generated at compile time. Results are presented for two- and three-dimensional computations on static and moving grids.
OVERSMART Reporting Tool for Flow Computations Over Large Grid Systems

NASA Technical Reports Server (NTRS)

Kao, David L.; Chan, William M.

2012-01-01

Structured grid solvers such as NASA's OVERFLOW compressible Navier-Stokes flow solver can generate large data files that contain convergence histories for flow equation residuals, turbulence model equation residuals, component forces and moments, and component relative motion dynamics variables. Most of today's large-scale problems can extend to hundreds of grids, and over 100 million grid points. However, due to the lack of efficient tools, only a small fraction of information contained in these files is analyzed. OVERSMART (OVERFLOW Solution Monitoring And Reporting Tool) provides a comprehensive report of solution convergence of flow computations over large, complex grid systems. It produces a one-page executive summary of the behavior of flow equation residuals, turbulence model equation residuals, and component forces and moments. Under the automatic option, a matrix of commonly viewed plots such as residual histograms, composite residuals, sub-iteration bar graphs, and component forces and moments is automatically generated. Specific plots required by the user can also be prescribed via a command file or a graphical user interface. Output is directed to the user's computer screen and/or to an html file for archival purposes. The current implementation has been targeted for the OVERFLOW flow solver, which is used to obtain a flow solution on structured overset grids. The OVERSMART framework allows easy extension to other flow solvers.
Use of direct and iterative solvers for estimation of SNP effects in genome-wide selection

PubMed Central

2010-01-01

The aim of this study was to compare iterative and direct solvers for estimation of marker effects in genomic selection. One iterative and two direct methods were used: Gauss-Seidel with Residual Update, Cholesky Decomposition and Gentleman-Givens rotations. To mimic different scenarios with respect to the number of markers and of genotyped animals, a simulated data set divided into 25 subsets was used. Number of markers ranged from 1,200 to 5,925 and number of animals ranged from 1,200 to 5,865. Methods were also applied to real data comprising 3081 individuals genotyped for 45181 SNPs. Results from simulated data showed that the iterative solver was substantially faster than direct methods for larger numbers of markers. Use of a direct solver may allow for computing (co)variances of SNP effects. When applied to real data, performance of the iterative method varied substantially, depending on the level of ill-conditioning of the coefficient matrix. From results with real data, Gentleman-Givens rotations would be the method of choice in this particular application as it provided an exact solution within a fairly reasonable time frame (less than two hours). It would indeed be the preferred method whenever computer resources allow its use. PMID:21637627
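The iterative method named above can be sketched as follows. This is a minimal illustration, assuming a ridge-type marker model in which the equations (X'X + lam I) b = X'y are solved for the marker effects b; the model, the shrinkage parameter `lam` and all names are assumptions made for the example, not details taken from the study.

```python
# Minimal sketch of Gauss-Seidel with residual update, assuming the system
# (X'X + lam*I) b = X'y. X'X is never formed: each update uses only one
# column of X and the current residual e = y - X b.

def gsru(X, y, lam, sweeps=200):
    n, p = len(X), len(X[0])
    b = [0.0] * p
    e = list(y)                                   # residual for b = 0
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rhs = sum(X[i][j] * e[i] for i in range(n)) + xtx[j] * b[j]
            new_bj = rhs / (xtx[j] + lam)
            delta = new_bj - b[j]
            for i in range(n):                    # keep e consistent with b
                e[i] -= X[i][j] * delta
            b[j] = new_bj
    return b, e


X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0, 0.5]
b, e = gsru(X, y, lam=1.0)
print(b)
```

At convergence the residual satisfies X'e = lam * b, which gives a cheap stopping check. A fixed number of sweeps is used here only to keep the sketch short.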
Matrix computations in MACSYMA

NASA Technical Reports Server (NTRS)

Wang, P. S.

1977-01-01

Facilities built into MACSYMA for manipulating matrices with numeric or symbolic entries are described. Computations will be done exactly, keeping symbols as symbols. Topics discussed include how to form a matrix and create other matrices by transforming existing matrices within MACSYMA; arithmetic and other computation with matrices; and user control of computational processes through the use of optional variables. Two algorithms designed for sparse matrices are given. The computing times of several different ways to compute the determinant of a matrix are compared.
Fabric defect detection based on visual saliency using deep feature and low-rank recovery

NASA Astrophysics Data System (ADS)

Liu, Zhoufeng; Wang, Baorui; Li, Chunlei; Li, Bicao; Dong, Yan

2018-04-01

Fabric defect detection plays an important role in improving the quality of fabric products. In this paper, a novel fabric defect detection method based on visual saliency using deep features and low-rank recovery is proposed. First, unsupervised training is carried out by the initial network parameters based on MNIST large datasets. The supervised fine-tuning of the fabric image library based on Convolutional Neural Networks (CNNs) is implemented, and then a more accurate deep neural network model is generated. Second, the fabric images are uniformly divided into image blocks of the same size, then we extract their multi-layer deep features using the trained deep network. Thereafter, all the extracted features are concentrated into a feature matrix. Third, low-rank matrix recovery is adopted to divide the feature matrix into the low-rank matrix which indicates the background and the sparse matrix which indicates the salient defect. In the end, the iterative optimal threshold segmentation algorithm is utilized to segment the saliency maps generated by the sparse matrix to locate the fabric defect area. Experimental results demonstrate that the features extracted by the CNN are more suitable for characterizing the fabric texture than the traditional LBP, HOG and other hand-crafted feature extraction methods, and the proposed method can accurately detect the defect regions of various fabric defects, even for images with complex texture.
Tensor-GMRES method for large sparse systems of nonlinear equations

NASA Technical Reports Server (NTRS)

Feng, Dan; Pulliam, Thomas H.

1994-01-01

This paper introduces a tensor-Krylov method, the tensor-GMRES method, for large sparse systems of nonlinear equations. This method is a coupling of tensor model formation and solution techniques for nonlinear equations with Krylov subspace projection techniques for unsymmetric systems of linear equations. Traditional tensor methods for nonlinear equations are based on a quadratic model of the nonlinear function, a standard linear model augmented by a simple second order term. These methods are shown to be significantly more efficient than standard methods both on nonsingular problems and on problems where the Jacobian matrix at the solution is singular. A major disadvantage of the traditional tensor methods is that the solution of the tensor model requires the factorization of the Jacobian matrix, which may not be suitable for problems where the Jacobian matrix is large and has a 'bad' sparsity structure for an efficient factorization. We overcome this difficulty by forming and solving the tensor model using an extension of a Newton-GMRES scheme. Like traditional tensor methods, we show that the new tensor method has significant computational advantages over the analogous Newton counterpart. Consistent with Krylov subspace based methods, the new tensor method does not depend on the factorization of the Jacobian matrix. As a matter of fact, the Jacobian matrix is never needed explicitly.
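The remark that the Jacobian is never needed explicitly rests on a standard device of Newton-Krylov methods: a Krylov solver such as GMRES only asks for products of the Jacobian with a vector, and those can be approximated from two evaluations of the nonlinear function. The sketch below shows only that building block under assumed names and step-size choice; it does not implement the tensor term or the GMRES iteration of the paper.

```python
import math

# Minimal sketch: forward-difference approximation of the Jacobian-vector
# product J(u) v ~ (F(u + eps*v) - F(u)) / eps. The step-size rule and all
# names are illustrative assumptions.

def jvp_fd(F, u, v, rel_step=1e-7):
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return [0.0] * len(F(u))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    eps = rel_step * max(1.0, norm_u) / norm_v
    f0 = F(u)
    f1 = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(f1, f0)]


def F(u):
    # Small nonlinear test system with Jacobian [[2*u0, 1], [1, 2*u1]].
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]


# At u = (1, 2) and v = (1, 1) the exact product is (3, 5).
print(jvp_fd(F, [1.0, 2.0], [1.0, 1.0]))
```

Each product costs one extra function evaluation, so the memory and factorization costs of a large Jacobian with an unfavourable sparsity structure are avoided at the price of a small truncation error.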
Recursive Factorization of the Inverse Overlap Matrix in Linear-Scaling Quantum Molecular Dynamics Simulations.

PubMed

Negre, Christian F A; Mniszewski, Susan M; Cawkwell, Marc J; Bock, Nicolas; Wall, Michael E; Niklasson, Anders M N

2016-07-12

We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive, iterative refinement of an initial guess of Z (inverse square root of the overlap matrix S). The initial guess of Z is obtained beforehand by using either an approximate divide-and-conquer technique or dynamical methods, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under the incomplete, approximate, iterative refinement of Z. Linear-scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared-memory parallelization. As we show in this article using self-consistent density-functional-based tight-binding MD, our approach is faster than conventional methods based on the diagonalization of overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4158-atom water-solvated polyalanine system, we find an average speedup factor of 122 for the computation of Z in each MD step.
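For readers unfamiliar with the storage format mentioned above, here is a minimal sketch of ELLPACK-R: every row is padded to a common width, and an extra array records how many entries of each row are real. The threshold `tol` mimics numerical thresholding; the row-list layout and all names are simplifications chosen for the example (production codes typically use flat, column-major arrays).

```python
# Minimal sketch of the ELLPACK-R format: padded per-row value and column
# arrays plus a row-length array that lets the multiply skip the padding.

def to_ellpack_r(dense, tol=0.0):
    """Convert a dense matrix, dropping entries with |a| <= tol."""
    rows = [[(j, a) for j, a in enumerate(row) if abs(a) > tol] for row in dense]
    width = max((len(r) for r in rows), default=0)
    vals = [[a for _, a in r] + [0.0] * (width - len(r)) for r in rows]
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    row_len = [len(r) for r in rows]
    return vals, cols, row_len


def ellpack_r_spmv(vals, cols, row_len, x):
    """y = A @ x, visiting only the row_len[i] real entries of each row."""
    return [
        sum(vals[i][k] * x[cols[i][k]] for k in range(row_len[i]))
        for i in range(len(row_len))
    ]


dense = [[2.0, 1e-9, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
vals, cols, row_len = to_ellpack_r(dense, tol=1e-6)
print(row_len)                                             # [1, 2, 2]
print(ellpack_r_spmv(vals, cols, row_len, [1.0, 2.0, 3.0]))  # [2.0, 9.0, 13.0]
```

Because every row has the same allocated width, rows can be processed independently with regular memory access, which is what makes the format convenient for shared-memory parallel loops.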
Recursive Factorization of the Inverse Overlap Matrix in Linear Scaling Quantum Molecular Dynamics Simulations

DOE PAGES

Negre, Christian F. A; Mniszewski, Susan M.; Cawkwell, Marc Jon; ...

2016-06-06

We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive iterative refinement of an initial guess Z of the inverse overlap matrix S. The initial guess of Z is obtained beforehand either by using an approximate divide and conquer technique or dynamically, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under incomplete approximate iterative refinement of Z. Linear scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared memory parallelization. As we show in this article using self-consistent density functional based tight-binding MD, our approach is faster than conventional methods based on the direct diagonalization of the overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4,158 atom water-solvated polyalanine system we find an average speedup factor of 122 for the computation of Z in each MD step.
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
Label consistent K-SVD: learning a discriminative dictionary for recognition.

PubMed

Jiang, Zhuolin; Lin, Zhe; Davis, Larry S

2013-11-01

A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. The incremental dictionary learning algorithm is presented for the situation of limited memory resources. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.
Use of a residual distribution Euler solver to study the occurrence of transonic flow in Wells turbine rotor blades

NASA Astrophysics Data System (ADS)

Henriques, J. C. C.; Gato, L. M. C.

The aim of the present study is to investigate the occurrence of transonic flow in several cascade geometries and blade sections that have been considered in the design of Wells turbine rotor blades. The calculations were performed using an implicit Euler solver for two-dimensional flow. The numerical method uses a multi-dimensional upwind matrix residual distribution scheme formulated on a new symmetrized form of the Euler equations, both in time and in space, that decouples the entropy and the enthalpy equations. Second-order accurate steady-state solutions were obtained using a compact three-point stencil. The results show that unwanted transonic flow may occur in the turbine rotor at relatively low mean-flow Mach numbers.
GASPACHO: a generic automatic solver using proximal algorithms for convex huge optimization problems

NASA Astrophysics Data System (ADS)

Goossens, Bart; Luong, Hiêp; Philips, Wilfried

2017-08-01

Many inverse problems (e.g., demosaicking, deblurring, denoising, image fusion, HDR synthesis) share various similarities: degradation operators are often modeled by a specific data fitting function while image prior knowledge (e.g., sparsity) is incorporated by additional regularization terms. In this paper, we investigate automatic algorithmic techniques for evaluating proximal operators. These algorithmic techniques also enable efficient calculation of adjoints from linear operators in a general matrix-free setting. In particular, we study the simultaneous-direction method of multipliers (SDMM) and the parallel proximal algorithm (PPXA) solvers and show that the automatically derived implementations are well suited for both single-GPU and multi-GPU processing. We demonstrate this approach for an Electron Microscopy (EM) deconvolution problem.
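As a concrete instance of a proximal operator, the sketch below evaluates the proximal operator of the scaled L1 norm, which is the classical soft-thresholding rule and a typical sparsity regularizer of the kind mentioned above. It is written by hand for illustration; it is not the automatically derived code the paper describes, and the function name is hypothetical.

```python
# Minimal sketch: prox of lam * ||v||_1, i.e. the minimizer over x of
# 0.5 * ||x - v||^2 + lam * ||x||_1, which is soft-thresholding applied
# to each component independently.

def prox_l1(v, lam):
    out = []
    for vi in v:
        if vi > lam:
            out.append(vi - lam)     # shrink positive values toward zero
        elif vi < -lam:
            out.append(vi + lam)     # shrink negative values toward zero
        else:
            out.append(0.0)          # values within [-lam, lam] become zero
    return out


print(prox_l1([3.0, -0.5, 1.0, -2.0], lam=1.0))  # [2.0, 0.0, 0.0, -1.0]
```

Splitting solvers such as SDMM and PPXA call operators of this kind once per term and per iteration, so a closed-form, component-wise rule like this maps naturally onto GPU threads.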
Solving Modal Equations of Motion with Initial Conditions Using MSC/NASTRAN DMAP. Part 2; Coupled Versus Uncoupled Integration

NASA Technical Reports Server (NTRS)

Barnett, Alan R.; Ibrahim, Omar M.; Abdallah, Ayman A.; Sullivan, Timothy L.

1993-01-01

By utilizing MSC/NASTRAN DMAP (Direct Matrix Abstraction Program) in an existing NASA Lewis Research Center coupled loads methodology, solving modal equations of motion with initial conditions is possible using either coupled (Newmark-Beta) or uncoupled (exact mode superposition) integration available within module TRD1. Both the coupled and newly developed exact mode superposition methods have been used to perform transient analyses of various space systems. However, experience has shown that in most cases, significant time savings are realized when the equations of motion are integrated using the uncoupled solver instead of the coupled solver. Through the results of a real-world engineering analysis, advantages of using the exact mode superposition methodology are illustrated.
Subspace aware recovery of low rank and jointly sparse signals

PubMed Central

Biswas, Sampurna; Dasgupta, Soura; Mudumbai, Raghuraman; Jacob, Mathews

2017-01-01

We consider the recovery of a matrix X, which is simultaneously low rank and jointly sparse, from few measurements of its columns using a two-step algorithm. Each column of X is measured using a combination of two measurement matrices: one which is the same for every column, while the second measurement matrix varies from column to column. The recovery proceeds by first estimating the row subspace vectors from the measurements corresponding to the common matrix. The estimated row subspace vectors are then used to recover X from all the measurements using a convex program of joint sparsity minimization. Our main contribution is to provide sufficient conditions on the measurement matrices that guarantee the recovery of such a matrix using the above two-step algorithm. The results demonstrate quite significant savings in the number of measurements when compared to the standard multiple measurement vector (MMV) scheme, which assumes the same time-invariant measurement pattern for all the time frames. We illustrate the impact of the sampling pattern on reconstruction quality using breath held cardiac cine MRI and cardiac perfusion MRI data, while the utility of the algorithm to accelerate the acquisition is demonstrated on MR parameter mapping. PMID:28630889

Beyond Low Rank + Sparse: Multi-scale Low Rank Matrix Decomposition

PubMed Central

Ong, Frank; Lustig, Michael

2016-01-01

We present a natural generalization of the recent low rank + sparse matrix decomposition and consider the decomposition of matrices into components of multiple scales. Such decomposition is well motivated in practice as data matrices often exhibit local correlations in multiple scales. Concretely, we propose a multi-scale low rank modeling that represents a data matrix as a sum of block-wise low rank matrices with increasing scales of block sizes. We then consider the inverse problem of decomposing the data matrix into its multi-scale low rank components and approach the problem via a convex formulation. Theoretically, we show that under various incoherence conditions, the convex program recovers the multi-scale low rank components either exactly or approximately. Practically, we provide guidance on selecting the regularization parameters and incorporate cycle spinning to reduce blocking artifacts. Experimentally, we show that the multi-scale low rank decomposition provides a more intuitive decomposition than conventional low rank methods and demonstrate its effectiveness in four applications, including illumination normalization for face images, motion separation for surveillance videos, multi-scale modeling of the dynamic contrast enhanced magnetic resonance imaging and collaborative filtering exploiting age information. PMID:28450978
Beyond the Purely Cognitive: Metacognition and Social Cognition as Driving Forces in Intellectual Performance.

ERIC Educational Resources Information Center

Schoenfeld, Alan H.

The dimensions of the broad social-cognitive and metacognitive matrix within which pure cognitions reside are examined. Tangible cognitive actions are the cross products of beliefs held about a task, the social environment within which the task takes place, and the problem solvers' perceptions of self and their relation to the task and…
Fast Electromagnetic Solvers for Large-Scale Naval Scattering Problems

DTIC Science & Technology

2008-09-27

IEEE Trans. Antennas Propag., vol. 52, no. 8, pp. 2141–2146, 2004. [12] R. J. Burkholder and J. F. Lee, “Fast dual-MGS block-factorization algorithm...Golub and C. F. V. Loan, Matrix Computations. Baltimore: The Johns Hopkins University Press, 1996. [20] W. D. Li, W. Hong, and H. X. Zhou, “Integral
Combined fast multipole-QR compression technique for solving electrically small to large structures for broadband applications

NASA Technical Reports Server (NTRS)

Jandhyala, Vikram (Inventor); Chowdhury, Indranil (Inventor)

2011-01-01

An approach that efficiently solves for a desired parameter of a system or device that can include both electrically large fast multipole method (FMM) elements, and electrically small QR elements. The system or device is setup as an oct-tree structure that can include regions of both the FMM type and the QR type. An iterative solver is then used to determine a first matrix vector product for any electrically large elements, and a second matrix vector product for any electrically small elements that are included in the structure. These matrix vector products for the electrically large elements and the electrically small elements are combined, and a net delta for a combination of the matrix vector products is determined. The iteration continues until a net delta is obtained that is within predefined limits. The matrix vector products that were last obtained are used to solve for the desired parameter.
Variational calculation of second-order reduced density matrices by strong N-representability conditions and an accurate semidefinite programming solver.

PubMed

Nakata, Maho; Braams, Bastiaan J; Fujisawa, Katsuki; Fukuda, Mituhiro; Percus, Jerome K; Yamashita, Makoto; Zhao, Zhengji

2008-04-28

The reduced density matrix (RDM) method, which is a variational calculation based on the second-order reduced density matrix, is applied to the ground state energies and the dipole moments for 57 different states of atoms, molecules, and to the ground state energies and the elements of 2-RDM for the Hubbard model. We explore the well-known N-representability conditions (P, Q, and G) together with the more recent and much stronger T1 and T2(') conditions. The T2(') condition was recently rederived and it implies the T2 condition. Using these N-representability conditions, we can usually calculate correlation energies in percentage ranging from 100% to 101%, whose accuracy is similar to CCSD(T) and even better for high spin states or anion systems where CCSD(T) fails. Highly accurate calculations are carried out by handling equality constraints and/or developing multiple precision arithmetic in the semidefinite programming (SDP) solver. Results show that handling equality constraints correctly improves the accuracy from 0.1 to 0.6 mhartree. Additionally, improvements by replacing the T2 condition with the T2(') condition are typically of 0.1-0.5 mhartree. The newly developed multiple precision arithmetic version of the SDP solver calculates extraordinarily accurate energies for the one dimensional Hubbard model and Be atom. It gives at least 16 significant digits for energies, where double precision calculations give only two to eight digits. It also provides physically meaningful results for the Hubbard model in the high correlation limit.
A Comparison of Monte Carlo and Deterministic Solvers for keff and Sensitivity Calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haeck, Wim; Parsons, Donald Kent; White, Morgan Curtis

Verification and validation of our solutions for calculating the neutron reactivity for nuclear materials is a key issue to address for many applications, including criticality safety, research reactors, power reactors, and nuclear security. Neutronics codes solve variations of the Boltzmann transport equation. The two main variants are Monte Carlo versus deterministic solutions, e.g. the MCNP [1] versus PARTISN [2] codes, respectively. There have been many studies over the decades that examined the accuracy of such solvers and the general conclusion is that when the problems are well-posed, either solver can produce accurate results. However, the devil is always in the details. The current study examines the issue of self-shielding and the stress it puts on deterministic solvers. Most Monte Carlo neutronics codes use continuous-energy descriptions of the neutron interaction data that are not subject to this effect. The issue of self-shielding occurs because of the discretisation of data used by the deterministic solutions. Multigroup data used in these solvers are the average cross section and scattering parameters over an energy range. Resonances in cross sections can occur that change the likelihood of interaction by one to three orders of magnitude over a small energy range. Self-shielding is the numerical effect that the average cross section in groups with strong resonances can be strongly affected as neutrons within that material are preferentially absorbed or scattered out of the resonance energies. This affects both the average cross section and the scattering matrix.
A compressive sensing-based computational method for the inversion of wide-band ground penetrating radar data

NASA Astrophysics Data System (ADS)

Gelmini, A.; Gottardi, G.; Moriyama, T.

2017-10-01

This work presents an innovative computational approach for the inversion of wideband ground penetrating radar (GPR) data. The retrieval of the dielectric characteristics of sparse scatterers buried in a lossy soil is performed by combining a multi-task Bayesian compressive sensing (MT-BCS) solver and a frequency hopping (FH) strategy. The developed methodology is able to benefit from the regularization capabilities of the MT-BCS as well as to exploit the multi-chromatic informative content of GPR measurements. A set of numerical results is reported in order to assess the effectiveness of the proposed GPR inverse scattering technique, as well as to compare it to a simpler single-task implementation.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 2

NASA Technical Reports Server (NTRS)

Freund, Roland W.; Nachtigal, Noel M.

1990-01-01

It is shown how the look-ahead Lanczos process (combined with a quasi-minimal residual (QMR) approach) can be used to develop a robust black box solver for large sparse non-Hermitian linear systems. Details of an implementation of the resulting QMR algorithm are presented. It is demonstrated that the QMR method is closely related to the biconjugate gradient (BCG) algorithm; however, unlike BCG, the QMR algorithm has smooth convergence curves and good numerical properties. We report numerical experiments with our implementation of the look-ahead Lanczos algorithm, both for eigenvalue problems and linear systems. Also, program listings of FORTRAN implementations of the look-ahead algorithm and the QMR method are included.
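For orientation, the BCG iteration that this abstract compares QMR against can be sketched in a few lines. This is plain BCG without look-ahead, not the authors' FORTRAN code and not QMR itself; the function name `bicg`, the dense list-of-lists storage, and the small test system are assumptions made for brevity.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned biconjugate gradient for a real nonsymmetric A."""
    At = [list(col) for col in zip(*A)]   # transpose, used by the shadow system
    x = [0.0] * len(b)
    r = list(b)                           # residual b - A x for x = 0
    rt = list(r)                          # shadow residual
    p, pt = list(r), list(rt)
    rho = dot(rt, r)
    bnorm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * bnorm:
            return x, k
        q, qt = matvec(A, p), matvec(At, pt)
        denom = dot(pt, q)
        if denom == 0.0 or rho == 0.0:
            # This is the breakdown that look-ahead techniques aim to avoid.
            raise ZeroDivisionError("BCG breakdown")
        alpha = rho / denom
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x, max_iter


# Small nonsymmetric test system (illustrative values only).
A = [[4.0, 1.0, 0.0, 0.0],
     [-1.0, 4.0, 1.0, 0.0],
     [0.0, -1.0, 4.0, 1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, iterations = bicg(A, b)
```

The explicit breakdown check marks the place where a plain BCG or Lanczos recurrence can divide by zero, which is the situation the look-ahead variant is designed to step over.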
Compressed sensing for high-resolution nonlipid suppressed 1 H FID MRSI of the human brain at 9.4T.

PubMed

Nassirpour, Sahar; Chang, Paul; Avdievitch, Nikolai; Henning, Anke

2018-04-29

The aim of this study was to apply compressed sensing to accelerate the acquisition of high resolution metabolite maps of the human brain using a nonlipid suppressed ultra-short TR and TE 1 H FID MRSI sequence at 9.4T. X-t sparse compressed sensing reconstruction was optimized for nonlipid suppressed 1 H FID MRSI data. Coil-by-coil x-t sparse reconstruction was compared with SENSE x-t sparse and low rank reconstruction. The effect of matrix size and spatial resolution on the achievable acceleration factor was studied. Finally, in vivo metabolite maps with different acceleration factors of 2, 4, 5, and 10 were acquired and compared. Coil-by-coil x-t sparse compressed sensing reconstruction was not able to reliably recover the nonlipid suppressed data, rather a combination of parallel and sparse reconstruction was necessary (SENSE x-t sparse). For acceleration factors of up to 5, both the low-rank and the compressed sensing methods were able to reconstruct the data comparably well (root mean squared errors [RMSEs] ≤ 10.5% for Cre). However, the reconstruction time of the low rank algorithm was drastically longer than compressed sensing. Using the optimized compressed sensing reconstruction, acceleration factors of 4 or 5 could be reached for the MRSI data with a matrix size of 64 × 64. For lower spatial resolutions, an acceleration factor of up to R∼4 was successfully achieved. By tailoring the reconstruction scheme to the nonlipid suppressed data through parameter optimization and performance evaluation, we present high resolution (97 µL voxel size) accelerated in vivo metabolite maps of the human brain acquired at 9.4T within scan times of 3 to 3.75 min. © 2018 International Society for Magnetic Resonance in Medicine.
Algorithms and software for solving finite element equations on serial and parallel architectures

NASA Technical Reports Server (NTRS)

Chu, Eleanor; George, Alan

1988-01-01

The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the Computational Structural Mechanics (CSM) testbed. One of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A brief overview of the CSM Testbed software and its usage is presented. An overview of the sparse matrix research for the Testbed currently employed in the CSM Testbed is given. An interface which was designed and implemented as a research tool for installing and appraising new matrix processors in the CSM Testbed is described. The results of numerical experiments performed in solving a set of testbed demonstration problems using the processor SPK and other experimental processors are contained.
GPU-accelerated algorithms for compressed signals recovery with application to astronomical imagery deblurring

NASA Astrophysics Data System (ADS)

Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico

2018-04-01

Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of GPUs to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
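The circulant property this abstract relies on can be shown on the CPU in a few lines. The sketch below is not the authors' GPU code; the function names and the 4-point example are assumptions. It uses the fact that a circulant matrix is fully described by its first column and is diagonalized by the discrete Fourier transform, so products and solves need only vectors of length n rather than an n × n array.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def circulant_matvec(c, x):
    """y = C x, where C[i][j] = c[(i - j) % n] and c is the first column."""
    n = len(c)
    prod = [ck * xk for ck, xk in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(prod, invert=True)]


def circulant_solve(c, b):
    """Solve C x = b, assuming every DFT coefficient of c is nonzero."""
    n = len(c)
    quot = [bk / ck for bk, ck in zip(fft(b), fft(c))]
    return [v.real / n for v in fft(quot, invert=True)]


c = [4.0, 1.0, 0.0, 2.0]        # first column of a 4 x 4 circulant matrix
x = [1.0, -2.0, 3.0, 0.5]
y = circulant_matvec(c, x)
x_back = circulant_solve(c, y)  # recovers x up to rounding
```

Only the first column is ever stored, which is the memory saving the abstract refers to; the inversion reduces to an elementwise division in the Fourier domain.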
Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Improved parallel data partitioning by nested dissection with applications to information retrieval.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar

The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it is a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.
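The objective named in this abstract, total communication volume for sparse matrix-vector multiplication, can be made concrete with a simplified count. The sketch below handles only a 1D row-wise distribution with vector entry x[j] stored on the owner of row j, which is simpler than the 2D nested dissection method the paper studies; the function name and the tridiagonal example are assumptions.

```python
def comm_volume_1d(rows, part):
    """Words sent during y = A x under a 1D row-wise partition.

    rows[i] lists the column indices of the nonzeros in row i;
    part[i] is the process owning row i and vector entry x[i].
    x[j] is sent once to every other process that has a nonzero in column j.
    """
    receivers = {}
    for i, cols in enumerate(rows):
        for j in cols:
            if part[i] != part[j]:
                receivers.setdefault(j, set()).add(part[i])
    return sum(len(procs) for procs in receivers.values())


# Tridiagonal 8 x 8 sparsity pattern.
n = 8
rows = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]

contiguous = [0, 0, 0, 0, 1, 1, 1, 1]    # two blocks of consecutive rows
interleaved = [0, 1, 0, 1, 0, 1, 0, 1]   # rows alternate between processes

vol_contiguous = comm_volume_1d(rows, contiguous)
vol_interleaved = comm_volume_1d(rows, interleaved)
```

Both partitions give each process four rows, yet the contiguous one needs far less data exchanged, which is the kind of difference a partitioner is meant to find.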
Higher Order, Hybrid BEM/FEM Methods Applied to Antenna Modeling

NASA Technical Reports Server (NTRS)

Fink, P. W.; Wilton, D. R.; Dobbins, J. A.

2002-01-01

In this presentation, the authors address topics relevant to higher order modeling using hybrid BEM/FEM formulations. The first of these is the limitation on convergence rates imposed by geometric modeling errors in the analysis of scattering by a dielectric sphere. The second topic is the application of an Incomplete LU Threshold (ILUT) preconditioner to solve the linear system resulting from the BEM/FEM formulation. The final topic is the application of the higher order BEM/FEM formulation to antenna modeling problems. The authors have previously presented work on the benefits of higher order modeling. To achieve these benefits, special attention is required in the integration of singular and near-singular terms arising in the surface integral equation. Several methods for handling these terms have been presented. It is also well known that achieving the high rates of convergence afforded by higher order bases may also require the employment of higher order geometry models. A number of publications have described the use of quadratic elements to model curved surfaces. The authors have shown in an EFIE formulation, applied to scattering by a PEC sphere, that quadratic order elements may be insufficient to prevent the domination of modeling errors. In fact, on a PEC sphere with radius r = 0.58 Lambda(sub 0), a quartic order geometry representation was required to obtain a convergence benefit from quadratic bases when compared to the convergence rate achieved with linear bases. Initial trials indicate that, for a dielectric sphere of the same radius, requirements on the geometry model are not as severe as for the PEC sphere. The authors will present convergence results for higher order bases as a function of the geometry model order in the hybrid BEM/FEM formulation applied to dielectric spheres. It is well known that the system matrix resulting from the hybrid BEM/FEM formulation is ill-conditioned. 
For many real applications, a good preconditioner is required to obtain usable convergence from an iterative solver. The authors have examined the use of an Incomplete LU Threshold (ILUT) preconditioner to solve linear systems stemming from higher order BEM/FEM formulations in 2D scattering problems. Although the resulting preconditioner provided an excellent approximation to the system inverse, its size in terms of non-zero entries represented only a modest improvement when compared with the fill-in associated with a sparse direct solver. Furthermore, the fill-in of the preconditioner could not be substantially reduced without the occurrence of instabilities. In addition to the results for these 2D problems, the authors will present iterative solution data from the application of the ILUT preconditioner to 3D problems.
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagasaka, Y; Matsuoka, S; Azad, A

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms show significant speedups over libraries in the majority of the cases, while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up in-depth evaluation results and provide a recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
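The row-by-row, hash-accumulator idea behind this abstract can be sketched serially. This is not the authors' KNL implementation; it is a minimal single-threaded version in which a Python `dict` stands in for the hash table, and the function name `spgemm_hash`, the CSR tuple layout, and the `sort_rows` flag are assumptions.

```python
def spgemm_hash(A, B, sort_rows=False):
    """C = A @ B for CSR inputs given as (indptr, indices, data) tuples.

    Each output row is accumulated in a hash table keyed by column index.
    Sorting the columns of each row is an optional extra step.
    """
    a_ptr, a_idx, a_val = A
    b_ptr, b_idx, b_val = B
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                                   # per-row hash accumulator
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        entries = sorted(acc.items()) if sort_rows else list(acc.items())
        for j, v in entries:
            c_idx.append(j)
            c_val.append(v)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A is 2 x 3 and B is 3 x 2, both in CSR form.
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])

C_sorted = spgemm_hash(A, B, sort_rows=True)
C_unsorted = spgemm_hash(A, B)
```

With `sort_rows=False` the entries of each row come out in hash-table order, so the per-row sort is skipped entirely; that skipped step corresponds to the saving the abstract's final sentence describes.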
Joint Inversion of Body-Wave Arrival Times and Surface-Wave Dispersion Data in the Wavelet Domain Constrained by Sparsity Regularization

NASA Astrophysics Data System (ADS)

Zhang, H.; Fang, H.; Yao, H.; Maceira, M.; van der Hilst, R. D.

2014-12-01

Recently, Zhang et al. (2014, Pure and Applied Geophysics) have developed a joint inversion code incorporating body-wave arrival times and surface-wave dispersion data. The joint inversion code was based on the regional-scale version of the double-difference tomography algorithm tomoDD. The surface-wave inversion part uses the propagator matrix solver in the algorithm DISPER80 (Saito, 1988) for forward calculation of dispersion curves from layered velocity models and the related sensitivities. The application of the joint inversion code to the SAFOD site in central California shows that the fault structure is better imaged in the new model, which is able to fit both the body-wave and surface-wave observations adequately. Here we present a new joint inversion method that solves the model in the wavelet domain constrained by sparsity regularization. Compared to the previous method, it has the following advantages: (1) The method is both data- and model-adaptive. For the velocity model, it can be represented by different wavelet coefficients at different scales, which are generally sparse. By constraining the model wavelet coefficients to be sparse, the inversion in the wavelet domain can inherently adapt to the data distribution so that the model has higher spatial resolution in the good data coverage zone. Fang and Zhang (2014, Geophysical Journal International) have shown the superior performance of the wavelet-based double-difference seismic tomography method compared to the conventional method. (2) For the surface wave inversion, the joint inversion code takes advantage of the recent development of direct inversion of surface wave dispersion data for 3-D variations of shear wave velocity without the intermediate step of phase or group velocity maps (Fang et al., 2014, Geophysical Journal International). A fast marching method is used to compute, at each period, surface wave traveltimes and ray paths between sources and receivers. 
We will test the new joint inversion code at the SAFOD site to compare its performance over the previous code. We will also select another fault zone such as the San Jacinto Fault Zone to better image its structure.
A Semiparametric Approach to Simultaneous Covariance Estimation for Bivariate Sparse Longitudinal Data

PubMed Central

Das, Kiranmoy; Daniels, Michael J.

2014-01-01

Summary Estimation of the covariance structure for irregular sparse longitudinal data has been studied by many authors in recent years but typically using fully parametric specifications. In addition, when data are collected from several groups over time, it is known that assuming the same or completely different covariance matrices over groups can lead to loss of efficiency and/or bias. Nonparametric approaches have been proposed for estimating the covariance matrix for regular univariate longitudinal data by sharing information across the groups under study. For the irregular case, with longitudinal measurements that are bivariate or multivariate, modeling becomes more difficult. In this article, to model bivariate sparse longitudinal data from several groups, we propose a flexible covariance structure via a novel matrix stick-breaking process for the residual covariance structure and a Dirichlet process mixture of normals for the random effects. Simulation studies are performed to investigate the effectiveness of the proposed approach over more traditional approaches. We also analyze a subset of Framingham Heart Study data to examine how the blood pressure trajectories and covariance structures differ for the patients from different BMI groups (high, medium and low) at baseline. PMID:24400941
Compressed sensing for energy-efficient wireless telemonitoring of noninvasive fetal ECG via block sparse Bayesian learning.

PubMed

Zhang, Zhilin; Jung, Tzyy-Ping; Makeig, Scott; Rao, Bhaskar D

2013-02-01

Fetal ECG (FECG) telemonitoring is an important branch in telemedicine. The design of a telemonitoring system via a wireless body area network with low energy consumption for ambulatory use is highly desirable. As an emerging technique, compressed sensing (CS) shows great promise in compressing/reconstructing data with low energy consumption. However, due to some specific characteristics of raw FECG recordings such as nonsparsity and strong noise contamination, current CS algorithms generally fail in this application. This paper proposes to use the block sparse Bayesian learning framework to compress/reconstruct nonsparse raw FECG recordings. Experimental results show that the framework can reconstruct the raw recordings with high quality. Especially, the reconstruction does not destroy the interdependence relation among the multichannel recordings. This ensures that the independent component analysis decomposition of the reconstructed recordings has high fidelity. Furthermore, the framework allows the use of a sparse binary sensing matrix with much fewer nonzero entries to compress recordings. Particularly, each column of the matrix can contain only two nonzero entries. This shows that the framework, compared to other algorithms such as current CS algorithms and wavelet algorithms, can greatly reduce code execution in CPU in the data compression stage.
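The compression stage described in this abstract can be sketched to show why a sparse binary sensing matrix is cheap to apply. Only the encoder side is illustrated; the block sparse Bayesian learning reconstruction is not shown. The function names, the matrix size, and the random placement of the two nonzeros per column are assumptions.

```python
import random


def make_sensing_columns(m, n, seed=0):
    """For each of n columns, pick the two distinct rows that hold a 1."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(m), 2))) for _ in range(n)]


def compress(columns, x, m):
    """y = Phi x for a binary Phi with two ones per column: additions only."""
    y = [0.0] * m
    for (r1, r2), xj in zip(columns, x):
        y[r1] += xj
        y[r2] += xj
    return y


m, n = 4, 8                                   # 8 samples compressed to 4
columns = make_sensing_columns(m, n)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # stand-in for raw samples
y = compress(columns, x, m)
```

Each input sample costs exactly two additions and no multiplications, which is consistent with the abstract's point about reduced CPU work on the sensor side.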
Technical note: an R package for fitting sparse neural networks with application in animal breeding.

PubMed

Wang, Yangfan; Mi, Xue; Rosa, Guilherme J M; Chen, Zhihui; Lin, Ping; Wang, Shi; Bao, Zhenmin

2018-05-04

Neural networks (NNs) have emerged as a new tool for genomic selection (GS) in animal breeding. However, the properties of NN used in GS for the prediction of phenotypic outcomes are not well characterized due to the problem of over-parameterization of NN and difficulties in using whole-genome marker sets as high-dimensional NN input. In this note, we have developed an R package called snnR that finds an optimal sparse structure of a NN by minimizing the square error subject to a penalty on the L1-norm of the parameters (weights and biases), therefore solving the problem of over-parameterization in NN. We have also tested some models fitted in the snnR package to demonstrate their feasibility and effectiveness to be used in several cases as examples. In comparison of snnR to the R package brnn (the Bayesian regularized single layer NNs), with both using the entries of a genotype matrix or a genomic relationship matrix as inputs, snnR has greatly improved the computational efficiency and the prediction ability for the GS in animal breeding because snnR implements a sparse NN with many hidden layers.
A fast collocation method for a variable-coefficient nonlocal diffusion model

NASA Astrophysics Data System (ADS)

Wang, Che; Wang, Hong

2017-02-01

We develop a fast collocation scheme for a variable-coefficient nonlocal diffusion model, for which a numerical discretization would yield a dense stiffness matrix. The development of the fast method is achieved by carefully handling the variable coefficients appearing inside the singular integral operator and exploiting the structure of the dense stiffness matrix. The resulting fast method reduces the computational work from O(N^3) required by a commonly used direct solver to O(N log N) per iteration and the memory requirement from O(N^2) to O(N). Furthermore, the fast method reduces the computational work of assembling the stiffness matrix from O(N^2) to O(N). Numerical results are presented to show the utility of the fast method.

Jacobian-free approximate solvers for hyperbolic systems: Application to relativistic magnetohydrodynamics

NASA Astrophysics Data System (ADS)

Castro, Manuel J.; Gallardo, José M.; Marquina, Antonio

2017-10-01

We present recent advances in PVM (Polynomial Viscosity Matrix) methods based on internal approximations to the absolute value function, and compare them with Chebyshev-based PVM solvers. These solvers only require a bound on the maximum wave speed, so no spectral decomposition is needed. Another important feature of the proposed methods is that they are suitable to be written in Jacobian-free form, in which only evaluations of the physical flux are used. This is particularly interesting when considering systems for which the Jacobians involve complex expressions, e.g., the relativistic magnetohydrodynamics (RMHD) equations. On the other hand, the proposed Jacobian-free solvers have also been extended to the case of approximate DOT (Dumbser-Osher-Toro) methods, which can be regarded as simple and efficient approximations to the classical Osher-Solomon method, sharing most of its interesting features and being applicable to general hyperbolic systems. To test the properties of our schemes a number of numerical experiments involving the RMHD equations are presented, both in one and two dimensions. The obtained results are in good agreement with those found in the literature and show that our schemes are robust and accurate, running stably under a satisfactory time step restriction. It is worth emphasizing that, although this work focuses on RMHD, the proposed schemes are suitable to be applied to general hyperbolic systems.
Transonic Drag Prediction Using an Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, D. J.; Levy, David W.

2001-01-01

This paper summarizes the results obtained with the NSU-3D unstructured multigrid solver for the AIAA Drag Prediction Workshop held in Anaheim, CA, June 2001. The test case for the workshop consists of a wing-body configuration at transonic flow conditions. Flow analyses for a complete test matrix of lift coefficient values and Mach numbers at a constant Reynolds number are performed, thus producing a set of drag polars and drag rise curves which are compared with experimental data. Results were obtained independently by both authors using an identical baseline grid and different refined grids. Most cases were run in parallel on commodity cluster-type machines while the largest cases were run on an SGI Origin machine using 128 processors. The objective of this paper is to study the accuracy of the subject unstructured grid solver for predicting drag in the transonic cruise regime, to assess the efficiency of the method in terms of convergence, cpu time, and memory, and to determine the effects of grid resolution on this predictive ability and its computational efficiency. A good predictive ability is demonstrated over a wide range of conditions, although accuracy was found to degrade for cases at higher Mach numbers and lift values where increasing amounts of flow separation occur. The ability to rapidly compute large numbers of cases at varying flow conditions using an unstructured solver on inexpensive clusters of commodity computers is also demonstrated.
Regularization and computational methods for precise solution of perturbed orbit transfer problems

NASA Astrophysics Data System (ADS)

Woollands, Robyn Michele

The author has developed a suite of algorithms for solving the perturbed Lambert's problem in celestial mechanics. These algorithms have been implemented as a parallel computation tool that has broad applicability. This tool is composed of four component algorithms and each provides unique benefits for solving a particular type of orbit transfer problem. The first one utilizes a Keplerian solver (a-iteration) for solving the unperturbed Lambert's problem. This algorithm not only provides a "warm start" for solving the perturbed problem but is also used to identify which of several perturbed solvers is best suited for the job. The second algorithm solves the perturbed Lambert's problem using a variant of the modified Chebyshev-Picard iteration initial value solver that solves two-point boundary value problems. This method converges over about one third of an orbit and does not require a Newton-type shooting method and thus no state transition matrix needs to be computed. The third algorithm makes use of regularization of the differential equations through the Kustaanheimo-Stiefel transformation and extends the domain of convergence over which the modified Chebyshev-Picard iteration two-point boundary value solver will converge, from about one third of an orbit to almost a full orbit. This algorithm also does not require a Newton-type shooting method. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver to solve the perturbed two-impulse Lambert problem over multiple revolutions. The method of particular solutions is a shooting method but differs from the Newton-type shooting methods in that it does not require integration of the state transition matrix. The mathematical developments that underlie these four algorithms are derived in the chapters of this dissertation. 
For each of the algorithms, some orbit transfer test cases are included to provide insight on accuracy and efficiency of these individual algorithms. Following this discussion, the combined parallel algorithm, known as the unified Lambert tool, is presented and an explanation is given as to how it automatically selects which of the three perturbed solvers to use to compute the perturbed solution for a particular orbit transfer. The unified Lambert tool may be used to determine a single orbit transfer or to generate an extremal field map. A case study is presented for a mission that is required to rendezvous with two pieces of orbit debris (spent rocket boosters). The unified Lambert tool software developed in this dissertation is already being utilized by several industrial partners and we are confident that it will play a significant role in practical applications, including solution of Lambert problems that arise in the current applications focused on enhanced space situational awareness.
Strategies for global optimization in photonics design.

PubMed

Vukovic, Ana; Sewell, Phillip; Benson, Trevor M

2010-10-01

This paper reports on two important issues that arise in the context of the global optimization of photonic components where large problem spaces must be investigated. The first is the implementation of a fast simulation method and associated matrix solver for assessing particular designs and the second, the strategies that a designer can adopt to control the size of the problem design space to reduce runtimes without compromising the convergence of the global optimization tool. For this study an analytical simulation method based on Mie scattering and a fast matrix solver exploiting the fast multipole method are combined with genetic algorithms (GAs). The impact of the approximations of the simulation method on the accuracy and runtime of individual design assessments and the consequent effects on the GA are also examined. An investigation of optimization strategies for controlling the design space size is conducted on two illustrative examples, namely, 60° and 90° waveguide bends based on photonic microstructures, and their effectiveness is analyzed in terms of a GA's ability to converge to the best solution within an acceptable timeframe. Finally, the paper describes some particular optimized solutions found in the course of this work.
Element sensitive reconstruction of nanostructured surfaces with finite elements and grazing incidence soft X-ray fluorescence.

PubMed

Soltwisch, Victor; Hönicke, Philipp; Kayser, Yves; Eilbracht, Janis; Probst, Jürgen; Scholze, Frank; Beckhoff, Burkhard

2018-03-29

The geometry of a Si3N4 lamellar grating was investigated experimentally with reference-free grazing-incidence X-ray fluorescence analysis. While simple layered systems are usually treated with the matrix formalism to determine the X-ray standing-wave field, this approach fails for laterally structured surfaces. Maxwell solvers based on finite elements are often used to model electrical field strengths for any 2D or 3D structures in the optical spectral range. We show that this approach can also be applied in the field of X-rays. The electrical field distribution obtained with the Maxwell solver can subsequently be used to calculate the fluorescence intensities in full analogy to the X-ray standing-wave field obtained by the matrix formalism. Only the effective 1D integration for the layer system has to be replaced by a 2D integration of the finite elements, taking into account the local excitation conditions. We will show that this approach is capable of reconstructing the geometric line shape of a structured surface with high elemental sensitivity. This combination of GIXRF and finite-element simulations paves the way for a versatile characterization of nanoscale-structured surfaces.
Comparison of two matrix data structures for advanced CSM testbed applications

NASA Technical Reports Server (NTRS)

Regelbrugge, M. E.; Brogan, F. A.; Nour-Omid, B.; Rankin, C. C.; Wright, M. A.

1989-01-01

The first section describes data storage schemes presently used by the Computational Structural Mechanics (CSM) testbed sparse matrix facilities and similar skyline (profile) matrix facilities. The second section contains a discussion of certain features required for the implementation of particular advanced CSM algorithms, and how these features might be incorporated into the data storage schemes described previously. The third section presents recommendations, based on the discussions of the prior sections, for directing future CSM testbed development to provide necessary matrix facilities for advanced algorithm implementation and use. The objective is to lend insight into the matrix structures discussed and to help explain the process of evaluating alternative matrix data structures and utilities for subsequent use in the CSM testbed.
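As a generic illustration of the two kinds of storage scheme this abstract contrasts, the sketch below builds a column-wise skyline (profile) layout for the upper triangle of a small symmetric matrix and counts how many values it holds compared with a nonzeros-only sparse layout. It does not reproduce the CSM testbed's actual data formats; the function names and the example matrix are assumptions.

```python
def skyline_store(A):
    """Column-wise skyline of the upper triangle of a symmetric matrix.

    Column j is stored from its first nonzero row down to the diagonal,
    including any zeros inside that span. diag_ptr[j] indexes the diagonal
    entry of column j inside `values`.
    """
    values, diag_ptr = [], []
    for j in range(len(A)):
        first = next(i for i in range(j + 1) if A[i][j] != 0 or i == j)
        values.extend(A[i][j] for i in range(first, j + 1))
        diag_ptr.append(len(values) - 1)
    return values, diag_ptr


def sparse_upper_count(A):
    """Number of values a nonzeros-only scheme keeps for the upper triangle."""
    n = len(A)
    return sum(1 for j in range(n) for i in range(j + 1) if A[i][j] != 0)


A = [[4, 1, 0, 0, 2],
     [1, 4, 1, 0, 0],
     [0, 1, 4, 1, 0],
     [0, 0, 1, 4, 1],
     [2, 0, 0, 1, 4]]

values, diag_ptr = skyline_store(A)
n_skyline = len(values)
n_sparse = sparse_upper_count(A)
```

The single far-off-diagonal entry in the last column forces the skyline layout to store the zeros between it and the diagonal. Those slots are also where fill-in lands during factorization without pivoting, which is one reason profile storage is convenient for direct solvers despite the extra space.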
Computational Challenges of 3D Radiative Transfer in Atmospheric Models

NASA Astrophysics Data System (ADS)

Jakub, Fabian; Bernhard, Mayer

2017-04-01

The computation of radiative heating and cooling rates is one of the most expensive components in today's atmospheric models. The high computational cost stems not only from the laborious integration over a wide range of the electromagnetic spectrum but also from the fact that solving the integro-differential radiative transfer equation for monochromatic light is already rather involved. This led to the advent of numerous approximations and parameterizations to reduce the cost of the solver. One of the most prominent is the so-called independent pixel approximation (IPA), where horizontal energy transfer is neglected entirely and radiation may only propagate in the vertical direction (1D). Recent studies indicate that the IPA introduces significant errors in high resolution simulations and affects the evolution and development of convective systems. However, using fully 3D solvers such as Monte Carlo methods is not feasible even on state-of-the-art supercomputers. The parallelization of atmospheric models is often realized by a horizontal domain decomposition, and hence, horizontal transfer of energy necessitates communication. For example, a cloud at a low zenith angle will cast a long shadow and potentially needs to communicate through a multitude of processors. Especially light in the solar spectral range may travel long distances through the atmosphere. Concerning highly parallel simulations, it is vital that 3D radiative transfer solvers put a special emphasis on parallel scalability. We will present an introduction to the intricacies of computing 3D radiative heating and cooling rates as well as report on the parallel performance of the TenStream solver. The TenStream is a 3D radiative transfer solver using the PETSc framework to iteratively solve a set of partial differential equations. We investigate two matrix preconditioners, (a) geometric algebraic multigrid preconditioning (MG+GAMG) and (b) block Jacobi incomplete LU (ILU) factorization. 
The TenStream solver is tested for up to 4096 cores and shows a parallel scaling efficiency of 80-90% on various supercomputers.
Large-scale 3-D EM modelling with a Block Low-Rank multifrontal direct solver

NASA Astrophysics Data System (ADS)

Shantsev, Daniil V.; Jaysaval, Piyoosh; de la Kethulle de Ryhove, Sébastien; Amestoy, Patrick R.; Buttari, Alfredo; L'Excellent, Jean-Yves; Mary, Theo

2017-06-01

We put forward the idea of using a Block Low-Rank (BLR) multifrontal direct solver to efficiently solve the linear systems of equations arising from a finite-difference discretization of the frequency-domain Maxwell equations for 3-D electromagnetic (EM) problems. The solver uses a low-rank representation for the off-diagonal blocks of the intermediate dense matrices arising in the multifrontal method to reduce the computational load. A numerical threshold, the so-called BLR threshold, controlling the accuracy of low-rank representations was optimized by balancing errors in the computed EM fields against savings in floating point operations (flops). Simulations were carried out over large-scale 3-D resistivity models representing typical scenarios for marine controlled-source EM surveys, and in particular the SEG SEAM model which contains an irregular salt body. The flop count, size of factor matrices and elapsed run time for matrix factorization are reduced dramatically by using BLR representations and can go down to, respectively, 10, 30 and 40 per cent of their full-rank values for our largest system with N = 20.6 million unknowns. The reductions are almost independent of the number of MPI tasks and threads at least up to 90 × 10 = 900 cores. The BLR savings increase for larger systems, which reduces the factorization flop complexity from O(N^2) for the full-rank solver to O(N^m) with m = 1.4-1.6. The BLR savings are significantly larger for deep-water environments that exclude the highly resistive air layer from the computational domain. A study in a scenario where simulations are required at multiple source locations shows that the BLR solver can become competitive in comparison to iterative solvers as an engine for 3-D controlled-source electromagnetic Gauss-Newton inversion that requires forward modelling for a few thousand right-hand sides.
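The storage and flop savings of a low-rank block can be illustrated with a small sketch. It assumes a hypothetical 6 × 5 off-diagonal block that is exactly of rank 2 and is given directly by its factors. A real BLR solver would instead compute a truncated factorization whose accuracy is set by the BLR threshold, and that compression step is not shown here. All function and variable names are illustrative and are not taken from the solver described in the abstract.

```python
def outer_sum(xs, ys):
    """Form the dense block B = sum_k xs[k] * ys[k]^T from r factor pairs."""
    m, n = len(xs[0]), len(ys[0])
    return [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(n)]
            for i in range(m)]


def dense_matvec(B, v):
    """Standard dense product B v, costing about m*n multiply-adds."""
    return [sum(b * vj for b, vj in zip(row, v)) for row in B]


def lowrank_matvec(xs, ys, v):
    """Compute (X Y^T) v as X (Y^T v), costing about r*(m+n) multiply-adds."""
    coeffs = [sum(yj * vj for yj, vj in zip(y, v)) for y in ys]  # Y^T v
    m = len(xs[0])
    return [sum(c * x[i] for c, x in zip(coeffs, xs)) for i in range(m)]


# Hypothetical rank-2 block of size 6 x 5, stored only through its factors.
m, n, r = 6, 5, 2
xs = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 0.0, -1.0, 0.0, 1.0, 0.0]]
ys = [[1.0, 1.0, 1.0, 1.0, 1.0], [0.5, -0.5, 0.5, -0.5, 0.5]]
v = [1.0, 2.0, 3.0, 4.0, 5.0]

B = outer_sum(xs, ys)            # dense form, built only for comparison
y_dense = dense_matvec(B, v)
y_lowrank = lowrank_matvec(xs, ys, v)

dense_entries = m * n            # numbers stored for the dense block
lowrank_entries = r * (m + n)    # numbers stored for the two factors
print(dense_entries, lowrank_entries)
```

The gap between r(m + n) and mn widens as the blocks grow while the rank stays small, which is consistent with the abstract's remark that the savings increase for larger systems.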
Dense and Sparse Matrix Operations on the Cell Processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel W.; Shalf, John; Oliker, Leonid

2005-05-01

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
Multivariable frequency domain identification via 2-norm minimization

NASA Technical Reports Server (NTRS)

Bayard, David S.

1992-01-01

The author develops a computational approach to multivariable frequency domain identification, based on 2-norm minimization. In particular, a Gauss-Newton (GN) iteration is developed to minimize the 2-norm of the error between frequency domain data and a matrix fraction transfer function estimate. To improve the global performance of the optimization algorithm, the GN iteration is initialized using the solution to a particular sequentially reweighted least squares problem, denoted as the SK iteration. The least squares problems which arise from both the SK and GN iterations are shown to involve sparse matrices with identical block structure. A sparse matrix QR factorization method is developed to exploit the special block structure, and to efficiently compute the least squares solution. A numerical example involving the identification of a multiple-input multiple-output (MIMO) plant having 286 unknown parameters is given to illustrate the effectiveness of the algorithm.
Research on sparse feature matching of improved RANSAC algorithm

NASA Astrophysics Data System (ADS)

Kong, Xiangsi; Zhao, Xian

2018-04-01

In this paper, a sparse feature matching method based on a modified RANSAC algorithm is proposed to improve precision and speed. First, the feature points of the images are extracted using the SIFT algorithm. Then, the image pair is roughly matched by generating SIFT feature descriptors. Finally, the precision of image matching is optimized by the modified RANSAC algorithm. The RANSAC algorithm is improved in three respects: instead of the homography matrix, the fundamental matrix generated by the 8-point algorithm is used as the model; the sample is selected by a random block selection method, which ensures uniform distribution and accuracy; and a sequential probability ratio test (SPRT) is added to standard RANSAC, which cuts down the overall running time of the algorithm. The experimental results show that this method not only achieves higher matching accuracy but also greatly reduces the computation and improves the matching speed.
3D Reconstruction of human bones based on dictionary learning.

PubMed

Zhang, Binkai; Wang, Xiang; Liang, Xiao; Zheng, Jinjin

2017-11-01

An effective method for reconstructing a 3D model of human bones from computed tomography (CT) image data based on dictionary learning is proposed. In this study, the dictionary comprises the vertices of triangular meshes, and the sparse coefficient matrix indicates the connectivity information. For better reconstruction performance, we proposed a balance coefficient between the approximation and regularisation terms and a method for optimisation. Moreover, we applied a local updating strategy and a mesh-optimisation method to update the dictionary and the sparse matrix, respectively. The two updating steps are iterated alternately until the objective function converges. Thus, a reconstructed mesh could be obtained with high accuracy and regularisation. The experimental results show that the proposed method has the potential to obtain high precision and high-quality triangular meshes for rapid prototyping, medical diagnosis, and tissue engineering. Copyright © 2017 IPEM. Published by Elsevier Ltd. All rights reserved.
Parallel pivoting combined with parallel reduction

NASA Technical Reports Server (NTRS)

Alaghband, Gita

1987-01-01

Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
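The Markowitz number mentioned above can be sketched briefly. For a candidate pivot at position (i, j) of the active submatrix it is (r_i - 1)(c_j - 1), where r_i and c_j are the numbers of nonzeros in row i and column j; it bounds the fill-in that eliminating on that pivot can create. The sketch below only computes these counts for a hypothetical 4 × 4 sparsity pattern. It does not reproduce the paper's compatibility test, its stability check, or its parallel scheduling, and the names are illustrative.

```python
def markowitz_counts(pattern):
    """Return {(i, j): (r_i - 1) * (c_j - 1)} for every nonzero position.

    pattern is a set of (row, col) positions of nonzeros in the active
    submatrix; numerical values are not needed for the count itself.
    """
    row_nnz, col_nnz = {}, {}
    for i, j in pattern:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return {(i, j): (row_nnz[i] - 1) * (col_nnz[j] - 1) for i, j in pattern}


# Hypothetical unsymmetric 4 x 4 sparsity pattern.
pattern = {(0, 0), (0, 1), (0, 3),
           (1, 1), (1, 2),
           (2, 0), (2, 2), (2, 3),
           (3, 3)}

counts = markowitz_counts(pattern)
best = min(counts.values())
candidates = sorted(p for p, c in counts.items() if c == best)
print(best, candidates)
```

In this pattern row 3 holds a single nonzero, so the pivot at (3, 3) has a Markowitz number of zero and eliminating on it creates no fill-in. A practical code would also reject candidates whose magnitude fails a numerical stability threshold.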
Representation-Independent Iteration of Sparse Data Arrays

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

An approach is defined that describes a method of iterating over massively large arrays containing sparse data using an approach that is implementation independent of how the contents of the sparse arrays are laid out in memory. What is unique and important here is the decoupling of the iteration over the sparse set of array elements from how they are internally represented in memory. This enables this approach to be backward compatible with existing schemes for representing sparse arrays as well as new approaches. What is novel here is a new approach for efficiently iterating over sparse arrays that is independent of the underlying memory layout representation of the array. A functional interface is defined for implementing sparse arrays in any modern programming language with a particular focus for the Chapel programming language. Examples are provided that show the translation of a loop that computes a matrix vector product into this representation for both the distributed and not-distributed cases. This work is directly applicable to NASA and its High Productivity Computing Systems (HPCS) program that JPL and our current program are engaged in. The goal of this program is to create powerful, scalable, and economically viable high-powered computer systems suitable for use in national security and industry by 2010. This is important to NASA for its computationally intensive requirements for analyzing and understanding the volumes of science data from our returned missions.
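The decoupling described above can be sketched as follows. The abstract targets Chapel; this sketch uses Python instead and assumes a hypothetical interface in which every storage class exposes a single nonzeros() iterator yielding (row, column, value) triples. The matrix-vector product is then written once against that interface and works unchanged for a coordinate (COO) layout and a compressed sparse row (CSR) layout. The class and method names are illustrative and are not taken from the report.

```python
class CooMatrix:
    """Coordinate storage: an unordered list of (row, col, value) triples."""

    def __init__(self, shape, entries):
        self.shape = shape
        self.entries = list(entries)

    def nonzeros(self):
        yield from self.entries


class CsrMatrix:
    """Compressed sparse row storage: row pointers, column indices, values."""

    def __init__(self, shape, indptr, indices, data):
        self.shape = shape
        self.indptr, self.indices, self.data = indptr, indices, data

    def nonzeros(self):
        for i in range(self.shape[0]):
            for k in range(self.indptr[i], self.indptr[i + 1]):
                yield i, self.indices[k], self.data[k]


def matvec(A, x):
    """y = A x, written only against the nonzeros() iterator."""
    y = [0.0] * A.shape[0]
    for i, j, a in A.nonzeros():
        y[i] += a * x[j]
    return y


# The same matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in both layouts.
coo = CooMatrix((3, 3), [(2, 0, 4.0), (0, 0, 2.0), (1, 1, 3.0),
                         (0, 2, 1.0), (2, 2, 5.0)])
csr = CsrMatrix((3, 3), [0, 2, 3, 5], [0, 2, 1, 0, 2],
                [2.0, 1.0, 3.0, 4.0, 5.0])
x = [1.0, 2.0, 3.0]

y_coo = matvec(coo, x)
y_csr = matvec(csr, x)
print(y_coo, y_csr)
```

Adding a new layout under this assumption only requires a new nonzeros() method; the loop in matvec does not change.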
A fast, preconditioned conjugate gradient Toeplitz solver

NASA Technical Reports Server (NTRS)

Pan, Victor; Schrieber, Robert

1989-01-01

A simple factorization is given of an arbitrary hermitian, positive definite matrix in which the factors are well-conditioned, hermitian, and positive definite. In fact, given knowledge of the extreme eigenvalues of the original matrix A, an optimal improvement can be achieved, making the condition numbers of each of the two factors equal to the square root of the condition number of A. This technique is applied to the solution of hermitian, positive definite Toeplitz systems. Large linear systems with hermitian, positive definite Toeplitz matrices arise in some signal processing applications. A stable fast algorithm is given for solving these systems that is based on the preconditioned conjugate gradient method. The algorithm exploits Toeplitz structure to reduce the cost of an iteration to O(n log n) by applying the fast Fourier Transform to compute matrix-vector products. Matrix factorization is used as a preconditioner.
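The O(n log n) matrix-vector product can be sketched with the standard circulant-embedding trick: an n × n Toeplitz matrix is embedded in a circulant matrix of size m ≥ 2n, whose product with a vector reduces to three FFTs. The sketch below is an illustration under assumptions, not the authors' algorithm. It handles real data only, uses a small recursive radix-2 FFT so that it needs nothing beyond the standard library, and omits the preconditioner and the conjugate gradient loop. The function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def toeplitz_matvec(col, row, x):
    """Multiply the Toeplitz matrix with first column col and first row row by x."""
    n = len(x)
    m = 1
    while m < 2 * n:          # circulant size: a power of two, at least 2n
        m *= 2
    # First column of the embedding circulant: col, zero padding, reversed row tail.
    v = list(col) + [0.0] * (m - 2 * n + 1) + list(row[:0:-1])
    xp = list(x) + [0.0] * (m - n)
    fv, fx = fft(v), fft(xp)
    prod = fft([p * q for p, q in zip(fv, fx)], inverse=True)
    return [(z / m).real for z in prod[:n]]   # real input assumed


def toeplitz_matvec_direct(col, row, x):
    """Reference O(n^2) product used only to check the fast version."""
    n = len(x)
    return [sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
            for i in range(n)]


# Hypothetical real symmetric Toeplitz matrix (first row equals first column).
col = [4.0, 1.0, 0.5, 0.25]
x = [1.0, 2.0, 3.0, 4.0]
y_fast = toeplitz_matvec(col, col, x)
y_direct = toeplitz_matvec_direct(col, col, x)
print(y_fast, y_direct)
```

For a complex hermitian matrix the final `.real` would have to be dropped and the first row taken as the conjugate of the first column.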
Laplace Inversion of Low-Resolution NMR Relaxometry Data Using Sparse Representation Methods

PubMed Central

Berman, Paula; Levi, Ofer; Parmet, Yisrael; Saunders, Michael; Wiesman, Zeev

2013-01-01

Low-resolution nuclear magnetic resonance (LR-NMR) relaxometry is a powerful tool that can be harnessed for characterizing constituents in complex materials. Conversion of the relaxation signal into a continuous distribution of relaxation components is an ill-posed inverse Laplace transform problem. The most common numerical method implemented today for dealing with this kind of problem is based on L2-norm regularization. However, sparse representation methods via L1 regularization and convex optimization are a relatively new approach for effective analysis and processing of digital images and signals. In this article, a numerical optimization method for analyzing LR-NMR data by including non-negativity constraints and L1 regularization and by applying a convex optimization solver PDCO, a primal-dual interior method for convex objectives, that allows general linear constraints to be treated as linear operators is presented. The integrated approach includes validation of analyses by simulations, testing repeatability of experiments, and validation of the model and its statistical assumptions. The proposed method provides better resolved and more accurate solutions when compared with those suggested by existing tools. © 2013 Wiley Periodicals, Inc. Concepts Magn Reson Part A 42A: 72–88, 2013. PMID:23847452
Laplace Inversion of Low-Resolution NMR Relaxometry Data Using Sparse Representation Methods.

PubMed

Berman, Paula; Levi, Ofer; Parmet, Yisrael; Saunders, Michael; Wiesman, Zeev

2013-05-01

Low-resolution nuclear magnetic resonance (LR-NMR) relaxometry is a powerful tool that can be harnessed for characterizing constituents in complex materials. Conversion of the relaxation signal into a continuous distribution of relaxation components is an ill-posed inverse Laplace transform problem. The most common numerical method implemented today for dealing with this kind of problem is based on L2-norm regularization. However, sparse representation methods via L1 regularization and convex optimization are a relatively new approach for effective analysis and processing of digital images and signals. In this article, a numerical optimization method for analyzing LR-NMR data by including non-negativity constraints and L1 regularization and by applying a convex optimization solver PDCO, a primal-dual interior method for convex objectives, that allows general linear constraints to be treated as linear operators is presented. The integrated approach includes validation of analyses by simulations, testing repeatability of experiments, and validation of the model and its statistical assumptions. The proposed method provides better resolved and more accurate solutions when compared with those suggested by existing tools. © 2013 Wiley Periodicals, Inc. Concepts Magn Reson Part A 42A: 72-88, 2013.
pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data.

PubMed

Cao, Huojun; Amendt, Brad A

2016-11-01

Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Whole genome pair-wise similarity was calculated from expression patterns across 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This python package will be useful for many applications where the dataset(s) are too large to use a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editors: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.
Parallel Symmetric Eigenvalue Problem Solvers

DTIC Science & Technology

2015-05-01

...4.6.2 Choice of the Ritz shifts...4.7 Relationship between TraceMin and...which are determined by the Ritz values of the matrix pencil. We conclude with a discussion of the relationship between TraceMin and simultaneous
Parallel Performance of Linear Solvers and Preconditioners

DTIC Science & Technology

2014-01-01

are produced by a discrete dislocation dynamics (DDD) simulation and change with each timestep of the DDD simulation as the dislocation structure...evolves. However, the coefficient (or stiffness) matrix remains constant during the DDD simulation and some expensive matrix factorizations only occur once...discrete dislocation dynamics (DDD) simulations. This can be achieved by coupling a DDD simulator for bulk material (Arsenlis et al., 2007) to a

Simulation of High Power Lasers (Preprint)

DTIC Science & Technology

2010-06-01

integration, which requires communication of zonal boundary information after each inner-iteration of the Gauss-Seidel or Jacobi matrix solver. Each...experiment consisting of a supersonic (M~2.2) converging-diverging nozzle section with secondary mass injection in the nozzle expansion downstream of...consists of a section of a supersonic (M~2.2) converging-diverging slit nozzle with one large and two small orifices that inject reactants into the
A new family Jacobian solver for global three-dimensional modeling of atmospheric chemistry

NASA Astrophysics Data System (ADS)

Zhao, Xuepeng; Turco, Richard P.; Shen, Mei

1999-01-01

We present a new technique to solve complex sets of photochemical rate equations that is applicable to global modeling of the troposphere and stratosphere. The approach is based on the concept of "families" of species, whose chemical rate equations are tightly coupled. Variations of species concentrations within a family can be determined by inverting a linearized Jacobian matrix representing the family group. Since this group consists of a relatively small number of species the corresponding Jacobian has a low order (a minimatrix) compared to the Jacobian of the entire system. However, we go further and define a super-family that is the set of all families. The super-family is also solved by linearization and matrix inversion. The resulting Super-Family Matrix Inversion (SFMI) scheme is more stable and accurate than common family approaches. We discuss the numerical structure of the SFMI scheme and apply our algorithms to a comprehensive set of photochemical reactions. To evaluate performance, the SFMI scheme is compared with an optimized Gear solver. We find that the SFMI technique can be at least an order of magnitude more efficient than existing chemical solvers while maintaining relative errors in the calculations of 15% or less over a diurnal cycle. The largest SFMI errors arise at sunrise and sunset and during the evening when species concentrations may be very low. We show that sunrise/sunset errors can be minimized through a careful treatment of photodissociation during these periods; the nighttime deviations are negligible from the point of view of acceptable computational accuracy. The stability and flexibility of the SFMI algorithm should be sufficient for most modeling applications until major improvements in other modeling factors are achieved. In addition, because of its balanced computational design, SFMI can easily be adapted to parallel computing architectures. 
SFMI thus should allow practical long-term integrations of global chemistry coupled to general circulation and climate models, studies of interannual and interdecadal variability in atmospheric composition, simulations of past multidecadal trends owing to anthropogenic emissions, long-term forecasting associated with projected emissions, and sensitivity analyses for a wide range of physical and chemical parameters.
A Chess-Like Game for Teaching Engineering Students to Solve Large System of Simultaneous Linear Equations

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Mohammed, Ahmed Ali; Kadiam, Subhash

2010-01-01

Solving large (and sparse) systems of simultaneous linear equations has been (and continues to be) a major challenge for many real-world engineering/science applications [1-2]. For many practical/large-scale problems, the sparse, Symmetric and Positive Definite (SPD) system of linear equations can be conveniently represented in matrix notation as [A] {x} = {b}, where the square coefficient matrix [A] and the Right-Hand-Side (RHS) vector {b} are known. The unknown solution vector {x} can be computed efficiently by the following step-by-step procedure [1-2]: Reordering phase, Matrix Factorization phase, Forward solution phase, and Backward solution phase. In this research work, a Game-Based Learning (GBL) approach has been developed to help engineering students understand crucial details about the matrix reordering and factorization phases. A "chess-like" game has been developed and can be played by either a single player or two players. Through this "chess-like" open-ended game, the players/learners will not only understand the key concepts involved in reordering algorithms (based on existing algorithms), but also have the opportunity to "discover new algorithms" which are better than existing algorithms. Implementing the proposed "chess-like" game for the matrix reordering and factorization phases can be enhanced by FLASH [3] computer environments, where computer simulation with animated human voice, sound effects, visual/graphical/colorful displays of matrix tables, score (or monetary) awards for the best game players, etc. can all be exploited. Preliminary demonstrations of the developed GBL approach can be viewed by anyone who has access to the internet web-site [4]!
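The four phases listed above can be sketched on a tiny example. The sketch assumes a hypothetical 4 × 4 SPD "arrow" matrix and uses dense lists for readability; a production sparse solver would use sparse storage and a reordering heuristic such as minimum degree, whereas here the permutation is simply supplied by hand. The helper names are illustrative. Counting the nonzeros of the Cholesky factor shows why the reordering phase matters.

```python
import math


def cholesky(A):
    """Factorization phase: return lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward(L, b):
    """Forward solution phase: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward(L, z):
    """Backward solution phase: solve L^T w = z."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w


def solve_spd(A, b, perm):
    """Run all four phases; perm[i] is the original index placed at position i."""
    n = len(A)
    Ap = [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]  # reordering
    bp = [b[perm[i]] for i in range(n)]
    L = cholesky(Ap)
    w = backward(L, forward(L, bp))
    x = [0.0] * n
    for i in range(n):
        x[perm[i]] = w[i]          # undo the permutation
    return x, L


def nnz_lower(L, tol=1e-14):
    """Count nonzeros in the lower triangle of the factor."""
    return sum(1 for i in range(len(L)) for j in range(i + 1) if abs(L[i][j]) > tol)


# Hypothetical arrow matrix: dense first row and column, diagonal elsewhere.
A = [[4.0, 1.0, 1.0, 1.0],
     [1.0, 4.0, 0.0, 0.0],
     [1.0, 0.0, 4.0, 0.0],
     [1.0, 0.0, 0.0, 4.0]]
b = [13.0, 9.0, 13.0, 17.0]       # equals A times [1, 2, 3, 4]

x_nat, L_nat = solve_spd(A, b, [0, 1, 2, 3])   # natural ordering
x_rev, L_rev = solve_spd(A, b, [3, 2, 1, 0])   # dense row and column moved last
print(nnz_lower(L_nat), nnz_lower(L_rev))
```

Both orderings return the same solution, but the natural ordering fills the whole lower triangle of the factor, while moving the dense row and column to the end produces no fill-in at all.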
Biclustering sparse binary genomic data.

PubMed

van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk

2008-12-01

Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging

PubMed Central

Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu

2017-01-01

This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583
Sparse dictionary learning for resting-state fMRI analysis

NASA Astrophysics Data System (ADS)

Lee, Kangjoo; Han, Paul Kyu; Ye, Jong Chul

2011-09-01

Recently, there has been increased interest in the usage of neuroimaging techniques to investigate what happens in the brain at rest. Functional imaging studies have revealed that the default-mode network activity is disrupted in Alzheimer's disease (AD). However, there is no consensus, as yet, on the choice of analysis method for the application of resting-state analysis for disease classification. This paper proposes a novel compressed sensing based resting-state fMRI analysis tool called Sparse-SPM. As the brain's functional systems have been shown to have features of complex networks according to graph theoretical analysis, we apply a graph model to represent a sparse combination of information flows from a complex network perspective. In particular, a new concept of a spatially adaptive design matrix has been proposed by implementing sparse dictionary learning. The proposed approach shows better performance than other conventional methods, such as independent component analysis (ICA) and the seed-based approach, in classifying AD patients from normal subjects using resting-state analysis.
The High-Resolution Wave-Propagation Method Applied to Meso- and Micro-Scale Flows

NASA Technical Reports Server (NTRS)

Ahmad, Nashat N.; Proctor, Fred H.

2012-01-01

The high-resolution wave-propagation method for computing the nonhydrostatic atmospheric flows on meso- and micro-scales is described. The design and implementation of the Riemann solver used for computing the Godunov fluxes is discussed in detail. The method uses a flux-based wave decomposition in which the flux differences are written directly as the linear combination of the right eigenvectors of the hyperbolic system. The two advantages of the technique are: 1) the need for an explicit definition of the Roe matrix is eliminated and, 2) the inclusion of source term due to gravity does not result in discretization errors. The resulting flow solver is conservative and able to resolve regions of large gradients without introducing dispersion errors. The methodology is validated against exact analytical solutions and benchmark cases for non-hydrostatic atmospheric flows.
Improved Convergence and Robustness of USM3D Solutions on Mixed Element Grids (Invited)

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2015-01-01

Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Scheme (HANIS), has been developed and implemented. It provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier Stokes (RANS) equations and a nonlinear control of the solution update. Two variants of the new methodology are assessed on four benchmark cases, namely, a zero-pressure gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the baseline solver technology.
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
TransCut: interactive rendering of translucent cutouts.

PubMed

Li, Dongping; Sun, Xin; Ren, Zhong; Lin, Stephen; Tong, Yiying; Guo, Baining; Zhou, Kun

2013-03-01

We present TransCut, a technique for interactive rendering of translucent objects undergoing fracturing and cutting operations. As the object is fractured or cut open, the user can directly examine and intuitively understand the complex translucent interior, as well as edit material properties through painting on cross sections and recombining the broken pieces—all with immediate and realistic visual feedback. This new mode of interaction with translucent volumes is made possible with two technical contributions. The first is a novel solver for the diffusion equation (DE) over a tetrahedral mesh that produces high-quality results comparable to the state-of-art finite element method (FEM) of Arbree et al. but at substantially higher speeds. This accuracy and efficiency is obtained by computing the discrete divergences of the diffusion equation and constructing the DE matrix using analytic formulas derived for linear finite elements. The second contribution is a multiresolution algorithm to significantly accelerate our DE solver while adapting to the frequent changes in topological structure of dynamic objects. The entire multiresolution DE solver is highly parallel and easily implemented on the GPU. We believe TransCut provides a novel visual effect for heterogeneous translucent objects undergoing fracturing and cutting operations.
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

NASA Astrophysics Data System (ADS)

Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph

2017-10-01

In high dimensional settings, sparse structures are crucial for efficiency in terms of memory, computation and performance. It is customary to consider an ℓ1 penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter trading data fitting versus sparsity. For the Lasso theory to hold this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, Concomitant Lasso estimation for instance, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than the one for the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency, by eliminating irrelevant features early.
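The coordinate descent ingredient can be sketched for the plain Lasso, minimizing (1/(2n))·||y - Xβ||² + λ·||β||₁. This is not the Smoothed Concomitant Lasso itself: the joint optimization over the noise level and the safe screening rules are both left out, and the function names and toy data are hypothetical. Each coordinate update is a soft-thresholding step applied to the correlation between a feature and the partial residual.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n)) ||y - X b||^2 + lam ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                                   # y - X beta with beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                              # empty feature, skip
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta       # keep the residual in sync
                beta[j] = new
    return beta


# Hypothetical orthogonal design, so the solution has a closed form:
# soft-thresholding of the least-squares coefficients (3, -2, 0.05).
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
y = [1.05, 5.05, 0.95, 4.95]      # 3*x1 - 2*x2 + 0.05*x3
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With λ = 0.5 the two large coefficients are shrunk by 0.5 and the small one is set exactly to zero, which is the sparsity-enforcing behavior the abstract refers to.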
FPGA architecture and implementation of sparse matrix vector multiplication for the finite element method

NASA Astrophysics Data System (ADS)

Elkurdi, Yousef; Fernández, David; Souleimanov, Evgueni; Giannacopoulos, Dennis; Gross, Warren J.

2008-04-01

The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), and interest in exploiting this technology has therefore grown in the scientific community. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily large matrices without changing the number of PEs in the architecture. Therefore, this architecture is only limited by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. 
Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.
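For readers unfamiliar with the kernel being accelerated, here is a minimal software sketch of sparse matrix-vector multiplication. It uses the common compressed sparse row (CSR) layout, which is an assumption for illustration and not the striping scheme of the paper. The last line reproduces the quoted peak figure under the assumption that each PE completes one multiply and one add per clock cycle.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        # nonzeros of row i live in data[indptr[i]:indptr[i + 1]]
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]   # one multiply-accumulate
    return y

# 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]], typical of a 1-D FEM problem
data = np.array([2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
indices = np.array([0, 1, 0, 1, 2, 1, 2])
indptr = np.array([0, 2, 5, 7])
y = csr_matvec(data, indices, indptr, np.array([1.0, 2.0, 3.0]))

# Peak rate: 8 PEs x 110 MHz x 2 floating-point operations per cycle (assumed)
peak_gflops = 8 * 110e6 * 2 / 1e9
```

In an iterative solver such as conjugate gradient, this product is repeated once per iteration with the same matrix, which is why streaming the matrix past a fixed array of PEs is attractive.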
Improved Estimation and Interpretation of Correlations in Neural Circuits

PubMed Central

Yatsenko, Dimitri; Josić, Krešimir; Ecker, Alexander S.; Froudarakis, Emmanouil; Cotton, R. James; Tolias, Andreas S.

2015-01-01

Ambitious projects aim to record the activity of ever larger and denser neuronal populations in vivo. Correlations in neural activity measured in such recordings can reveal important aspects of neural circuit organization. However, estimating and interpreting large correlation matrices is statistically challenging. Estimation can be improved by regularization, i.e. by imposing a structure on the estimate. The amount of improvement depends on how closely the assumed structure represents dependencies in the data. Therefore, the selection of the most efficient correlation matrix estimator for a given neural circuit must be determined empirically. Importantly, the identity and structure of the most efficient estimator informs about the types of dominant dependencies governing the system. We sought statistically efficient estimators of neural correlation matrices in recordings from large, dense groups of cortical neurons. Using fast 3D random-access laser scanning microscopy of calcium signals, we recorded the activity of nearly every neuron in volumes 200 μm wide and 100 μm deep (150–350 cells) in mouse visual cortex. We hypothesized that in these densely sampled recordings, the correlation matrix should be best modeled as the combination of a sparse graph of pairwise partial correlations representing local interactions and a low-rank component representing common fluctuations and external inputs. Indeed, in cross-validation tests, the covariance matrix estimator with this structure consistently outperformed other regularized estimators. The sparse component of the estimate defined a graph of interactions. These interactions reflected the physical distances and orientation tuning properties of cells: The density of positive ‘excitatory’ interactions decreased rapidly with geometric distances and with differences in orientation preference whereas negative ‘inhibitory’ interactions were less selective. 
Because of its superior performance, this ‘sparse+latent’ estimator likely provides a more physiologically relevant representation of the functional connectivity in densely sampled recordings than the sample correlation matrix. PMID:25826696
The CSM testbed matrix processors internal logic and dataflow descriptions

NASA Technical Reports Server (NTRS)

Regelbrugge, Marc E.; Wright, Mary A.

1988-01-01

This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
Energy conserving, linear scaling Born-Oppenheimer molecular dynamics.

PubMed

Cawkwell, M J; Niklasson, Anders M N

2012-10-07

Born-Oppenheimer molecular dynamics simulations with long-term conservation of the total energy and a computational cost that scales linearly with system size have been obtained simultaneously. Linear scaling with a low pre-factor is achieved using density matrix purification with sparse matrix algebra and a numerical threshold on matrix elements. The extended Lagrangian Born-Oppenheimer molecular dynamics formalism [A. M. N. Niklasson, Phys. Rev. Lett. 100, 123004 (2008)] yields microcanonical trajectories with the approximate forces obtained from the linear scaling method that exhibit no systematic drift over hundreds of picoseconds and which are indistinguishable from trajectories computed using exact forces.
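The abstract does not say which purification scheme is used. As a hedged illustration of the general idea, the sketch below applies the well-known McWeeny iteration P ← 3P² − 2P³ and zeroes matrix elements below a numerical threshold after every step. The function name `purify`, the toy tight-binding chain, and the use of dense NumPy arrays instead of a true sparse format are all assumptions for brevity.

```python
import numpy as np

def purify(P, threshold=1e-10, n_iter=40):
    """McWeeny purification with element-wise thresholding.

    Pushes eigenvalues below 0.5 toward 0 and above 0.5 toward 1,
    so that P converges to an idempotent density matrix.
    """
    for _ in range(n_iter):
        P2 = P @ P
        P = 3.0 * P2 - 2.0 * P2 @ P
        P[np.abs(P) < threshold] = 0.0   # drop tiny elements to keep P sparse
    return P

# Toy Hamiltonian: nearest-neighbour chain with 8 sites (hypothetical test system)
n = 8
H = -(np.eye(n, k=1) + np.eye(n, k=-1))
# Initial guess with eigenvalues inside (0, 1); H has spectrum inside (-2, 2)
P0 = 0.5 * np.eye(n) - H / 4.0
P = purify(P0)
```

Because every step consists only of matrix products and a threshold, the cost can scale linearly with system size when the matrices stay sparse, which is the property the abstract relies on.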
Active subspace: toward scalable low-rank learning.

PubMed

Liu, Guangcan; Yan, Shuicheng

2012-12-01

We address the scalability issues in low-rank matrix learning problems. Usually these problems resort to solving nuclear norm regularized optimization problems (NNROPs), which often suffer from high computational complexities if based on existing solvers, especially in large-scale settings. Based on the fact that the optimal solution matrix to an NNROP is often low rank, we revisit the classic mechanism of low-rank matrix factorization, based on which we present an active subspace algorithm for efficiently solving NNROPs by transforming large-scale NNROPs into small-scale problems. The transformation is achieved by factorizing the large solution matrix into the product of a small orthonormal matrix (active subspace) and another small matrix. Although such a transformation generally leads to nonconvex problems, we show that a suboptimal solution can be found by the augmented Lagrange alternating direction method. For the robust PCA (RPCA) (Candès, Li, Ma, & Wright, 2009) problem, a typical example of NNROPs, theoretical results verify the suboptimality of the solution produced by our algorithm. For the general NNROPs, we empirically show that our algorithm significantly reduces the computational complexity without loss of optimality.
Reconstruction of Complex Network based on the Noise via QR Decomposition and Compressed Sensing.

PubMed

Li, Lixiang; Xu, Dafei; Peng, Haipeng; Kurths, Jürgen; Yang, Yixian

2017-11-08

It is generally known that the states of network nodes are stable and have strong correlations in a linear network system. We find that without the control input, the method of compressed sensing cannot succeed in reconstructing complex networks in which the states of nodes are generated through the linear network system. However, noise can drive the dynamics between nodes to break the stability of the system state. Therefore, a new method integrating QR decomposition and compressed sensing is proposed to solve the reconstruction problem of complex networks with the assistance of the input noise. The state matrix of the system is decomposed by QR decomposition. We construct the measurement matrix with the aid of Gaussian noise so that the sparse input matrix can be reconstructed by compressed sensing. We also discover that noise can build a bridge between the dynamics and the topological structure. Experiments on four model networks and six real networks show that the proposed method reconstructs them more accurately and more efficiently than compressed sensing alone. In addition, the proposed method can reconstruct not only sparse complex networks, but also dense complex networks.
Signal Sampling for Efficient Sparse Representation of Resting State FMRI Data

PubMed Central

Ge, Bao; Makkie, Milad; Wang, Jin; Zhao, Shijie; Jiang, Xi; Li, Xiang; Lv, Jinglei; Zhang, Shu; Zhang, Wei; Han, Junwei; Guo, Lei; Liu, Tianming

2015-01-01

As the size of brain imaging data such as fMRI grows explosively, it provides us with unprecedented and abundant information about the brain. How to reduce the size of fMRI data without losing much information is becoming an increasingly pressing issue. Recent studies have tried to deal with it by dictionary learning and sparse representation methods. However, their computational complexity is still high, which hampers the wider application of sparse representation methods to large scale fMRI datasets. To effectively address this problem, this work proposes to represent resting state fMRI (rs-fMRI) signals of a whole brain via a statistical sampling based sparse representation. First, we sampled the whole brain’s signals via different sampling methods; then the sampled signals were aggregated into an input data matrix to learn a dictionary; finally, this dictionary was used to sparsely represent the whole brain’s signals and identify the resting state networks. Comparative experiments demonstrate that the proposed signal sampling framework achieves a ten-fold speed-up in reconstructing concurrent brain networks without losing much information. The experiments on the 1000 Functional Connectomes Project further demonstrate its effectiveness and superiority. PMID:26646924
Joint Smoothed l₀-Norm DOA Estimation Algorithm for Multiple Measurement Vectors in MIMO Radar.

PubMed

Liu, Jing; Zhou, Weidong; Juwono, Filbert H

2017-05-08

Direction-of-arrival (DOA) estimation is usually confronted with a multiple measurement vector (MMV) case. In this paper, a novel fast sparse DOA estimation algorithm, named the joint smoothed l₀-norm algorithm, is proposed for multiple measurement vectors in multiple-input multiple-output (MIMO) radar. To eliminate the white or colored Gaussian noises, the new method first obtains a low-complexity high-order cumulants based data matrix. Then, the proposed algorithm designs a joint smoothed function tailored for the MMV case, based on which a joint smoothed l₀-norm sparse representation framework is constructed. Finally, for the MMV-based joint smoothed function, the corresponding gradient-based sparse signal reconstruction is designed, thus the DOA estimation can be achieved. The proposed method is a fast sparse representation algorithm, which can solve the MMV problem and perform well for both white and colored Gaussian noises. The proposed joint algorithm is about two orders of magnitude faster than the l₁-norm minimization based methods, such as l₁-SVD (singular value decomposition), RV (real-valued) l₁-SVD and RV l₁-SRACV (sparse representation array covariance vectors), and achieves better DOA estimation performance.
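The abstract does not spell out the joint smoothed function. A common choice in smoothed l₀ methods is a Gaussian kernel, and the sketch below applies that choice row-wise to an MMV matrix as an assumption for illustration only. The function name `smoothed_l0_rows` and the test matrix are hypothetical.

```python
import numpy as np

def smoothed_l0_rows(X, sigma):
    """Smooth approximation to the number of nonzero rows of X.

    Each row contributes 1 - exp(-||row||^2 / (2 sigma^2)), which tends to
    0 for a zero row and to 1 for a row with energy well above sigma^2.
    """
    row_energy = np.sum(np.abs(X) ** 2, axis=1)
    return X.shape[0] - np.sum(np.exp(-row_energy / (2.0 * sigma ** 2)))

# 5 candidate directions, 3 snapshots, 2 active rows (hypothetical data)
X = np.zeros((5, 3))
X[1] = [1.0, -2.0, 0.5]
X[4] = [0.0, 3.0, 0.0]
approx = smoothed_l0_rows(X, sigma=0.05)
```

Unlike the exact row count, this function is differentiable, so it can be minimized with gradient steps while `sigma` is gradually decreased, which is what makes gradient-based reconstruction possible.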
Sparsity-Cognizant Algorithms with Applications to Communications, Signal Processing, and the Smart Grid

NASA Astrophysics Data System (ADS)

Zhu, Hao

Sparsity plays an instrumental role in a plethora of scientific fields, including statistical inference for variable selection, parsimonious signal representations, and solving under-determined systems of linear equations, which has led to the ground-breaking result of compressive sampling (CS). This Thesis leverages exciting ideas of sparse signal reconstruction to develop sparsity-cognizant algorithms, and analyze their performance. The vision is to devise tools exploiting the 'right' form of sparsity for the 'right' application domain of multiuser communication systems, array signal processing systems, and the emerging challenges in the smart power grid. Two important power system monitoring tasks are addressed first by capitalizing on the hidden sparsity. To robustify power system state estimation, a sparse outlier model is leveraged to capture the possible corruption in every datum, while the problem nonconvexity due to nonlinear measurements is handled using the semidefinite relaxation technique. Different from existing iterative methods, the proposed algorithm approximates well the global optimum regardless of the initialization. In addition, for enhanced situational awareness, a novel sparse overcomplete representation is introduced to capture (possibly multiple) line outages, and real-time algorithms are developed for solving the combinatorially complex identification problem. The proposed algorithms exhibit near-optimal performance while incurring only linear complexity in the number of lines, which makes it possible to quickly bring contingencies to attention. This Thesis also accounts for two basic issues in CS, namely fully-perturbed models and the finite alphabet property. The sparse total least-squares (S-TLS) approach is proposed to furnish CS algorithms for fully-perturbed linear models, leading to statistically optimal and computationally efficient solvers. 
The S-TLS framework is well motivated for grid-based sensing applications and exhibits higher accuracy than existing sparse algorithms. On the other hand, exploiting the finite alphabet of unknown signals emerges naturally in communication systems, along with sparsity coming from the low activity of each user. Compared to approaches only accounting for either one of the two, joint exploitation of both leads to statistically optimal detectors with improved error performance.

From 2D to 3D modelling in long term tectonics: Modelling challenges and HPC solutions (Invited)

NASA Astrophysics Data System (ADS)

Le Pourhiet, L.; May, D.

2013-12-01

Over the last decades, 3D thermo-mechanical codes have been made available to the long term tectonics community either as open source (Underworld, Gale) or with more limited access (Fantom, Elvis3D, Douar, LaMem etc ...). However, to date, few published results using these methods have included the coupling between crustal and lithospheric dynamics at large strain. The fact that these computations are computationally expensive is not the primary reason for the relatively slow development of 3D modeling in the long term tectonics community, as compared to the rapid development observed within the mantle dynamics community, or in the short-term tectonics field. Long term tectonics problems have specific issues not found in either of these two fields, including large strain (not an issue for short-term), the inclusion of a free surface and the occurrence of large viscosity contrasts. The first issue is typically eliminated using a combined marker-ALE method instead of a fully Lagrangian method; however, the marker-ALE approach can pose some algorithmic challenges in a massively parallel environment. The last two issues are more problematic because they affect the convergence of the linear/non-linear solver and the memory cost. Two options have been tested so far: using low order elements and solving with a sparse direct solver, or using higher order stable elements together with a multi-grid solver. The first option is simpler to code and to use but reaches its limit at around 80^3 low order elements. The second option requires more operations but allows using an iterative solver on extremely large computers. In this presentation, I will describe the design philosophy and highlight results obtained using a code from the second class of methods. The presentation will be oriented from an end-user point of view, using an application from 3D continental break up to illustrate key concepts. 
The description will proceed point by point from implementing physics into the code, to dealing with specific issues related to solving the discrete system of non linear equations.
Deploy production sliding mesh capability with linear solver benchmarking.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Domino, Stefan P.; Thomas, Stephen; Barone, Matthew F.

Wind applications require the ability to simulate rotating blades. To support this use-case, a novel design-order sliding mesh algorithm has been developed and deployed. The hybrid method combines the control volume finite element methodology (CVFEM) with concepts found within a discontinuous Galerkin (DG) finite element method (FEM) to manage a sliding mesh. The method has been demonstrated to be design-order for the tested polynomial basis (P=1 and P=2) and has been deployed to provide production simulation capability for a Vestas V27 (225 kW) wind turbine. Other stationary and canonical rotating flow simulations are also presented. As the majority of wind-energy applications are driving extensive usage of hybrid meshes, a foundational study that outlines near-wall numerical behavior for a variety of element topologies is presented. Results indicate that the proposed nonlinear stabilization operator (NSO) is an effective stabilization methodology to control Gibbs phenomena at large cell Peclet numbers. The study also provides practical mesh resolution guidelines for future analysis efforts. Application-driven performance and algorithmic improvements have been carried out to increase robustness of the scheme on hybrid production wind energy meshes. Specifically, the Kokkos-based Nalu Kernel construct outlined in the FY17/Q4 ExaWind milestone has been transitioned to the hybrid mesh regime. This code base is exercised within a full V27 production run. Simulation timings for parallel search and custom ghosting are presented. As the low-Mach application space requires implicit matrix solves, the cost of matrix reinitialization has been evaluated on a variety of production meshes. Results indicate that at low element counts, i.e., fewer than 100 million elements, matrix graph initialization and preconditioner setup times are small. 
However, as mesh sizes increase, e.g., 500 million elements, simulation time associated with "set-up" costs can increase to nearly 50% of overall simulation time when using the full Tpetra solver stack and nearly 35% when using a mixed Tpetra-Hypre-based solver stack. The report also highlights the project achievement of surpassing the 1 billion element mesh scale for a production V27 hybrid mesh. A detailed timing breakdown is presented that again suggests work to be done in the setup events associated with the linear system. In order to mitigate these initialization costs, several application paths have been explored, all of which are designed to reduce the frequency of matrix reinitialization. Methods such as removing Jacobian entries on the dynamic matrix columns (in concert with increased inner equation iterations), and lagging of Jacobian entries have reduced setup times at the cost of numerical stability. Artificially increasing, or bloating, the matrix stencil to ensure that full Jacobians are included is developed with results suggesting that this methodology is useful in decreasing reinitialization events without loss of matrix contributions. With the above foundational advances in computational capability, the project is well positioned to begin scientific inquiry on a variety of wind-farm physics such as turbine/turbine wake interactions.
Decoding the encoding of functional brain networks: An fMRI classification comparison of non-negative matrix factorization (NMF), independent component analysis (ICA), and sparse coding algorithms.

PubMed

Xie, Jianwen; Douglas, Pamela K; Wu, Ying Nian; Brody, Arthur L; Anderson, Ariana E

2017-04-15

Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet other mathematical constraints provide alternate biologically-plausible frameworks for generating brain networks. Non-negative matrix factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks within scan for different constraints are used as basis functions to encode observed functional activity. These encodings are then decoded using machine learning, by using the time series weights to predict within scan whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. The sparse coding algorithm of L1 Regularized Learning outperformed 4 variations of ICA (p<0.001) for predicting the task being performed within each scan using artifact-cleaned components. The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy compared to the ICA and sparse coding algorithms. Holding constant the effect of the extraction algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). Lower classification accuracy occurred when the extracted spatial maps contained more CSF regions (p<0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparsity, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA. Negative BOLD signal may capture task-related activations. 
beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types

PubMed Central

Pagès, Hervé

2018-01-01

Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188
beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types.

PubMed

Lun, Aaron T L; Pagès, Hervé; Smith, Mike L

2018-05-01

Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
(EDMUNDS, WA) WILDLAND FIRE EMISSIONS MODELING: INTEGRATING BLUESKY AND SMOKE

EPA Science Inventory

This presentation is a status update of the BlueSky emissions modeling system. BlueSky-EM has been coupled with the Sparse Matrix Operational Kernel Emissions (SMOKE) system, and is now available as a tool for estimating emissions from wildland fires.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Domino, Stefan P.; Ananthan, Shreyas; Knaus, Robert C.

The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability on many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element-based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume finite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested) especially on KNL. The increased workload per element on higher-order meshes benefits from the wide SIMD width on KNL machines. 
Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than that observed on Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justifies further exploration of higher-order on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. Efforts are underway to deploy the Tpetra-based solver stack on NERSC Cori system to benchmark its performance at scale on KNL machines.
Spherical space Bessel-Legendre-Fourier localized modes solver for electromagnetic waves.

PubMed

Alzahrani, Mohammed A; Gauthier, Robert C

2015-10-05

Maxwell's vector wave equations are solved for dielectric configurations that match the symmetry of a spherical computational domain. The electric or magnetic field components and the inverse of the dielectric profile are defined by series expansions using basis functions composed of the lowest order spherical Bessel function, polar-angle single-index-dependent Legendre polynomials and an azimuthal complex exponential (BLF). The series expressions and non-traditional form of the basis functions result in an eigenvalue matrix formulation of Maxwell's equations that is relatively compact and accurately solvable on a desktop PC. The BLF matrix returns the frequencies and field profiles for steady-state modes. The key steps leading to the matrix populating expressions are provided. The validity of the numerical technique is confirmed by comparing the results of computations to those published using complementary techniques.
Scalable Nonparametric Low-Rank Kernel Learning Using Block Coordinate Descent.

PubMed

Hu, En-Liang; Kwok, James T

2015-09-01

Nonparametric kernel learning (NPKL) is a flexible approach to learn the kernel matrix directly without assuming any parametric form. It can be naturally formulated as a semidefinite program (SDP), which, however, is not very scalable. To address this problem, we propose the combined use of low-rank approximation and block coordinate descent (BCD). Low-rank approximation avoids the expensive positive semidefinite constraint in the SDP by replacing the kernel matrix variable with V^T V, where V is a low-rank matrix. The resultant nonlinear optimization problem is then solved by BCD, which optimizes each column of V sequentially. It can be shown that the proposed algorithm has nice convergence properties and low computational complexities. Experiments on a number of real-world data sets show that the proposed algorithm outperforms state-of-the-art NPKL solvers.
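The following short sketch illustrates why the low-rank substitution removes the positive semidefinite constraint, and why column-wise updates are cheap. The sizes `n` and `r`, the random factor, and the perturbation applied to one column are hypothetical, and no NPKL objective is optimized here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2                        # hypothetical sizes: n samples, rank r
V = rng.standard_normal((r, n))    # low-rank factor, one column per sample
K = V.T @ V                        # n x n kernel matrix, PSD by construction

eigvals = np.linalg.eigvalsh(K)    # all >= 0 up to round-off
rank = np.linalg.matrix_rank(K)    # at most r

# A block coordinate step changes one column of V, here column j
j = 3
V_new = V.copy()
V_new[:, j] += 0.1
changed = ~np.isclose(V_new.T @ V_new, K)

# Only row j and column j of the kernel matrix can be affected
outside = changed.copy()
outside[j, :] = False
outside[:, j] = False
```

Since any real V gives a positive semidefinite K, the optimizer can move freely in V, and each column update touches only one row and one column of K.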
Nonleachable Imidazolium-Incorporated Composite for Disruption of Bacterial Clustering, Exopolysaccharide-Matrix Assembly, and Enhanced Biofilm Removal.

PubMed

Hwang, Geelsu; Koltisko, Bernard; Jin, Xiaoming; Koo, Hyun

2017-11-08

Surface-grown bacteria and production of an extracellular polymeric matrix modulate the assembly of highly cohesive and firmly attached biofilms, making them difficult to remove from solid surfaces. Inhibition of cell growth and inactivation of matrix-producing bacteria can impair biofilm formation and facilitate removal. Here, we developed a novel nonleachable antibacterial composite with potent antibiofilm activity by directly incorporating polymerizable imidazolium-containing resin (antibacterial resin with carbonate linkage; ABR-C) into a methacrylate-based scaffold (ABR-modified composite; ABR-MC) using an efficient yet simplified chemistry. Low-dose inclusion of imidazolium moiety (∼2 wt %) resulted in bioactivity with minimal cytotoxicity without compromising mechanical integrity of the restorative material. The antibiofilm properties of ABR-MC were assessed using an exopolysaccharide-matrix-producing (EPS-matrix-producing) oral pathogen (Streptococcus mutans) in an experimental biofilm model. Using high-resolution confocal fluorescence imaging and biophysical methods, we observed remarkable disruption of bacterial accumulation and defective 3D matrix structure on the surface of ABR-MC. Specifically, the antibacterial composite impaired the ability of S. mutans to form organized bacterial clusters on the surface, resulting in altered biofilm architecture with sparse cell accumulation and reduced amounts of EPS matrix (versus control composite). Biofilm topology analyses on the control composite revealed a highly organized and weblike EPS structure that tethers the bacterial clusters to each other and to the surface, forming a highly cohesive unit. In contrast, such a structured matrix was absent on the surface of ABR-MC with mostly sparse and amorphous EPS, indicating disruption in the biofilm physical stability. Consistent with lack of structural organization, the defective biofilm on the surface of ABR-MC was readily detached when subjected to low shear stress, while most of the biofilm biomass remained on the control surface. Altogether, we demonstrate a new nonleachable antibacterial composite with excellent antibiofilm activity without affecting its mechanical properties, which may serve as a platform for development of alternative antifouling biomaterials.
Sparse matrix beamforming and image reconstruction for 2-D HIFU monitoring using harmonic motion imaging for focused ultrasound (HMIFU) with in vitro validation.

PubMed

Hou, Gary Y; Provost, Jean; Grondin, Julien; Wang, Shutao; Marquet, Fabrice; Bunting, Ethan; Konofagou, Elisa E

2014-11-01

Harmonic motion imaging for focused ultrasound (HMIFU) utilizes an amplitude-modulated HIFU beam to induce a localized focal oscillatory motion, which is simultaneously estimated. The objective of this study is to develop and show the feasibility of a novel fast beamforming algorithm for image reconstruction using GPU-based sparse-matrix operation with real-time feedback. In this study, the algorithm was implemented onto a fully integrated, clinically relevant HMIFU system. A single divergent transmit beam was used while fast beamforming was implemented using a GPU-based delay-and-sum method and a sparse-matrix operation. Axial HMI displacements were then estimated from the RF signals using a 1-D normalized cross-correlation method and streamed to a graphical user interface with frame rates up to 15 Hz, a 100-fold increase compared to conventional CPU-based processing. The real-time feedback rate does not require interrupting the HIFU treatment. Results in phantom experiments showed reproducible HMI images, and monitoring of 22 in vitro HIFU treatments using the new 2-D system showed a consistent average focal displacement decrease of 46.7 ± 14.6% during lesion formation. Complementary focal temperature monitoring also indicated an average rate of displacement increase and decrease with focal temperature of 0.84 ± 1.15%/°C and 2.03 ± 0.93%/°C, respectively. These results reinforce the HMIFU capability of estimating and monitoring stiffness-related changes in real time. Current ongoing studies include clinical translation of the presented system for monitoring of HIFU treatment for breast and pancreatic tumor applications.
Interpretation of the Precision Matrix and Its Application in Estimating Sparse Brain Connectivity during Sleep Spindles from Human Electrocorticography Recordings

PubMed Central

Das, Anup; Sampson, Aaron L.; Lainscsek, Claudia; Muller, Lyle; Lin, Wutu; Doyle, John C.; Cash, Sydney S.; Halgren, Eric; Sejnowski, Terrence J.

2017-01-01

The correlation method from brain imaging has been used to estimate functional connectivity in the human brain. However, brain regions might show very high correlation even when the two regions are not directly connected due to the strong interaction of the two regions with common input from a third region. One previously proposed solution to this problem is to use a sparse regularized inverse covariance matrix or precision matrix (SRPM) assuming that the connectivity structure is sparse. This method yields partial correlations to measure strong direct interactions between pairs of regions while simultaneously removing the influence of the rest of the regions, thus identifying regions that are conditionally independent. To test our methods, we first demonstrated conditions under which the SRPM method could indeed find the true physical connection between a pair of nodes for a spring-mass example and an RC circuit example. The recovery of the connectivity structure using the SRPM method can be explained by energy models using the Boltzmann distribution. We then demonstrated the application of the SRPM method for estimating brain connectivity during stage 2 sleep spindles from human electrocorticography (ECoG) recordings using an 8 × 8 electrode array. The ECoG recordings that we analyzed were from a 32-year-old male patient with long-standing pharmaco-resistant left temporal lobe complex partial epilepsy. Sleep spindles were automatically detected using delay differential analysis and then analyzed with SRPM and the Louvain method for community detection. We found spatially localized brain networks within and between neighboring cortical areas during spindles, in contrast to the case when sleep spindles were not present. PMID:28095202
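The distinction the abstract draws between correlation and partial correlation can be illustrated with a minimal Python sketch. It is not the authors' SRPM procedure: the sparse regularization step is omitted, and the 3-node chain and the helper name `partial_correlations` are assumptions made for illustration. Nodes 0 and 2 interact only through node 1, so their correlation is nonzero while their partial correlation (read off the precision matrix) is zero.

```python
import numpy as np

def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix to partial correlations."""
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical chain: node 0 -- node 1 -- node 2, with no direct 0-2 link,
# encoded as a zero in the (0, 2) entry of the precision matrix.
precision = np.array([[ 2.0, -1.0,  0.0],
                      [-1.0,  2.0, -1.0],
                      [ 0.0, -1.0,  2.0]])

covariance = np.linalg.inv(precision)
s = np.sqrt(np.diag(covariance))
correlation = covariance / np.outer(s, s)   # ordinary (marginal) correlation
pcorr = partial_correlations(precision)     # conditional on the remaining node

print(correlation[0, 2])  # nonzero: induced by the shared neighbor
print(pcorr[0, 2])        # zero: no direct connection
```

In practice the precision matrix has to be estimated from data, which is where a sparsity-inducing penalty would enter; this sketch starts from a known precision matrix to keep the point visible.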
Improvements in sparse matrix operations of NASTRAN

NASA Technical Reports Server (NTRS)

Harano, S.

1980-01-01

A "nontransmit" packing routine was added to NASTRAN to allow matrix data to be referred to directly from the input/output buffer. Use of the packing routine permits various routines for matrix handling to perform a direct reference to the input/output buffer if data addresses have once been received. The packing routine offers a buffer-by-buffer backspace feature for efficient backspacing in sequential access. Unlike a conventional backspacing that needs twice back record for a single read of one record (one column), this feature omits overlapping of READ operation and back record. It eliminates the necessity of writing, in decomposition of a symmetric matrix, of a portion of the matrix to its upper triangular matrix from the last to the first columns of the symmetric matrix, thus saving time for generating the upper triangular matrix. Only a lower triangular matrix must be written onto the secondary storage device, bringing a 10 to 30% reduction in use of the disk space of the storage device.
Functional brain networks reconstruction using group sparsity-regularized learning.

PubMed

Zhao, Qinghua; Li, Will X Y; Jiang, Xi; Lv, Jinglei; Lu, Jianfeng; Liu, Tianming

2018-06-01

Investigating functional brain networks and patterns using sparse representation of fMRI data has received significant interest in the neuroimaging community. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. To date, most data-driven network reconstruction approaches rarely take into consideration anatomical structures, which are the substrate of brain function. Furthermore, it has rarely been explored whether structured sparse representation with anatomical guidance could facilitate functional network reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks utilizing the structure guided group sparse regression (S2GSR) in which 116 anatomical regions from the AAL template, as prior knowledge, are employed to guide the network reconstruction when performing sparse representation of whole-brain fMRI data. Specifically, we extract fMRI signals from standard space aligned with the AAL template. Then by learning a global over-complete dictionary, with the learned dictionary as a set of features (regressors), the group structured regression employs anatomical structures as group information to regress whole brain signals. Finally, the decomposition coefficients matrix is mapped back to the brain volume to represent functional brain networks and patterns. We use the publicly available Human Connectome Project (HCP) Q1 dataset as the test bed, and the experimental results indicate that the proposed anatomically guided structure sparse representation is effective in reconstructing concurrent functional brain networks.
The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGhee, J.M.; Roberts, R.M.; Morel, J.E.

1997-06-01

A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
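For readers unfamiliar with the solver class named above, the following is a minimal Python sketch of a preconditioned conjugate gradient iteration. It is not DANTE's implementation: a simple Jacobi (diagonal) preconditioner stands in for the diffusion-based preconditioner mentioned in the abstract, and the test matrix, the function name `pcg`, and the tolerances are assumptions chosen for illustration.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite A.

    M_inv_diag holds the inverse of a diagonal (Jacobi) preconditioner,
    an assumption made here for simplicity.
    """
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    z = M_inv_diag * r            # preconditioned residual
    p = z.copy()                  # first search direction
    rz = r @ z
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p # new A-conjugate direction
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test problem: a 1-D Laplacian with a varying diagonal shift.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A += np.diag(np.linspace(0.0, 5.0, n))
b = np.ones(n)

x, iters = pcg(A, b, 1.0 / np.diag(A))
print(iters, np.linalg.norm(A @ x - b))
```

Only matrix-vector products with `A` are needed, which is why the method pairs naturally with sparse or in-line matrix construction.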
Parallel filtering in global gyrokinetic simulations

NASA Astrophysics Data System (ADS)

Jolliet, S.; McMillan, B. F.; Villard, L.; Vernay, T.; Angelino, P.; Tran, T. M.; Brunner, S.; Bottino, A.; Idomura, Y.

2012-02-01

In this work, a Fourier solver [B.F. McMillan, S. Jolliet, A. Bottino, P. Angelino, T.M. Tran, L. Villard, Comp. Phys. Commun. 181 (2010) 715] is implemented in the global Eulerian gyrokinetic code GT5D [Y. Idomura, H. Urano, N. Aiba, S. Tokuda, Nucl. Fusion 49 (2009) 065029] and in the global Particle-In-Cell code ORB5 [S. Jolliet, A. Bottino, P. Angelino, R. Hatzky, T.M. Tran, B.F. McMillan, O. Sauter, K. Appert, Y. Idomura, L. Villard, Comp. Phys. Commun. 177 (2007) 409] in order to reduce the memory of the matrix associated with the field equation. This scheme is verified with linear and nonlinear simulations of turbulence. It is demonstrated that the straight-field-line angle is the coordinate that optimizes the Fourier solver, that both linear and nonlinear turbulent states are unaffected by the parallel filtering, and that the k∥ spectrum is independent of plasma size at fixed normalized poloidal wave number.
Factorizing the factorization - a spectral-element solver for elliptic equations with linear operation count

NASA Astrophysics Data System (ADS)

Huismann, Immo; Stiller, Jörg; Fröhlich, Jochen

2017-10-01

The paper proposes a novel factorization technique for static condensation of a spectral-element discretization matrix that yields a linear operation count of just 13N multiplications for the residual evaluation, where N is the total number of unknowns. In comparison to previous work it saves a factor larger than 3 and outpaces unfactored variants for all polynomial degrees. Using the new technique as a building block for a preconditioned conjugate gradient method yields linear scaling of the runtime with N which is demonstrated for polynomial degrees from 2 to 32. This makes the spectral-element method cost effective even for low polynomial degrees. Moreover, the dependence of the iterative solution on the element aspect ratio is addressed, showing only a slight increase in the number of iterations for aspect ratios up to 128. Hence, the solver is very robust for practical applications.
An interior-point method-based solver for simulation of aircraft parts riveting

NASA Astrophysics Data System (ADS)

Stefanova, Maria; Yakunin, Sergey; Petukhova, Margarita; Lupuleac, Sergey; Kokkolaras, Michael

2018-05-01

The particularities of the aircraft parts riveting process simulation necessitate the solution of a large number of contact problems. A primal-dual interior-point method-based solver is proposed for solving such problems efficiently. The proposed method features a worst case polynomial complexity bound ? on the number of iterations, where n is the dimension of the problem and ε is a threshold related to desired accuracy. In practice, the convergence is often faster than this worst case bound, which makes the method applicable to large-scale problems. The computational challenge is solving the system of linear equations because the associated matrix is ill conditioned. To that end, the authors introduce a preconditioner and a strategy for determining effective initial guesses based on the physics of the problem. Numerical results are compared with ones obtained using the Goldfarb-Idnani algorithm. The results demonstrate the efficiency of the proposed method.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2016-01-01

Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Weighted low-rank sparse model via nuclear norm minimization for bearing fault detection

NASA Astrophysics Data System (ADS)

Du, Zhaohui; Chen, Xuefeng; Zhang, Han; Yang, Boyuan; Zhai, Zhi; Yan, Ruqiang

2017-07-01

It is a fundamental task in the machine fault diagnosis community to detect impulsive signatures generated by the localized faults of bearings. The main goal of this paper is to exploit the low-rank physical structure of periodic impulsive features and further establish a weighted low-rank sparse model for bearing fault detection. The proposed model mainly consists of three basic components: an adaptive partition window, a nuclear norm regularization and a weighted sequence. Firstly, due to the periodic repetition mechanism of the impulsive feature, an adaptive partition window can be designed to transform the impulsive feature into a data matrix. The highlight of the partition window is that it accumulates all local feature information and aligns it. Then, all columns of the data matrix share similar waveforms and a core physical phenomenon arises, i.e., the singular values of the data matrix demonstrate a sparse distribution pattern. Therefore, a nuclear norm regularization is enforced to capture that sparse prior. However, the nuclear norm regularization treats all singular values equally and thus ignores one basic fact: larger singular values carry more information about the impulsive features and should be preserved as much as possible. Therefore, a weighted sequence with adaptively tuned weights inversely proportional to singular amplitude is adopted to guarantee the distribution consistency of large singular values. On the other hand, the proposed model is difficult to solve due to its non-convexity, and thus a new algorithm is developed to search for a satisfying stationary solution by alternately applying a proximal operator and least-squares fitting. Moreover, the sensitivity analysis and selection principles of algorithmic parameters are comprehensively investigated through a set of numerical experiments, which show that the proposed method is robust and has only a few adjustable parameters. Lastly, the proposed model is applied to wind turbine (WT) bearing fault detection and its effectiveness is verified. Compared with the current popular bearing fault diagnosis techniques, wavelet analysis and spectral kurtosis, our model achieves a higher diagnostic accuracy.
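The proximal step for a nuclear norm penalty is commonly implemented as soft-thresholding of singular values. The Python sketch below shows that operation, with an optional per-singular-value weight vector to mimic the idea of shrinking large singular values less. It is an illustration under stated assumptions, not the paper's algorithm: the function name `svt`, the weight formula `1 / (s + eps)`, and the toy matrix are all hypothetical.

```python
import numpy as np

def svt(X, tau, weights=None):
    """Soft-threshold the singular values of X.

    With weights=None this is the proximal operator of tau * nuclear norm.
    With weights given, singular value i is shrunk by tau * weights[i].
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
    s_shrunk = np.maximum(s - tau * w, 0.0)
    return (U * s_shrunk) @ Vt

# Toy matrix with singular values 3, 1 and 0.2.
X = np.diag([3.0, 1.0, 0.2])

# Uniform shrinkage: every singular value is reduced by 0.5.
Y_plain = svt(X, 0.5)

# Hypothetical weights inversely proportional to singular amplitude,
# so the largest singular value is barely changed.
s = np.linalg.svd(X, compute_uv=False)
Y_weighted = svt(X, 0.5, weights=1.0 / (s + 1e-12))

print(np.linalg.svd(Y_plain, compute_uv=False))
print(np.linalg.svd(Y_weighted, compute_uv=False))
```

The uniform version treats all singular values equally, which is the limitation the abstract points out; the weighted call shows how amplitude-dependent weights preserve the dominant component.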
COMPUTATION OF GLOBAL PHOTOCHEMISTRY WITH SMVGEAR II (R823186)

EPA Science Inventory

A computer model was developed to simulate global gas-phase photochemistry. The model solves chemical equations with SMVGEAR II, a sparse-matrix, vectorized Gear-type code. To obtain SMVGEAR II, the original SMVGEAR code was modified to allow computation of different sets of chem...
High efficient optical remote sensing images acquisition for nano-satellite: reconstruction algorithms

NASA Astrophysics Data System (ADS)

Liu, Yang; Li, Feng; Xin, Lei; Fu, Jie; Huang, Puming

2017-10-01

A large amount of data is one of the most obvious features of satellite-based remote sensing systems, and it is also a burden for data processing and transmission. The theory of compressive sensing (CS) has been proposed for almost a decade, and massive experiments show that CS has favorable performance in data compression and recovery, so we apply CS theory to remote sensing image acquisition. In CS, the construction of a classical sensing matrix for all sparse signals has to satisfy the Restricted Isometry Property (RIP) strictly, which limits the practical application of CS to image compression. For remote sensing images, however, we know some inherent characteristics such as non-negativity and smoothness. Therefore, the goal of this paper is to present a novel measurement matrix that breaks RIP. The new sensing matrix consists of two parts: the standard Nyquist sampling matrix for thumbnails and the conventional CS sampling matrix. Since most sun-synchronous satellites circle the earth every 90 minutes and the revisit cycle is also short, many previously captured remote sensing images of the same place are available in advance. This drives us to reconstruct remote sensing images through a deep learning approach with those measurements from the new framework. Therefore, we propose a novel deep convolutional neural network (CNN) architecture which takes in undersampled measurements as input and outputs an intermediate reconstruction image. It is well known that training the network takes a long time; fortunately, the training step needs to be done only once, which makes the approach attractive for a host of sparse recovery problems.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard

2007-01-01

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
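As a reference point for the SpMV kernel discussed above, here is a minimal, unoptimized Python sketch using the compressed sparse row (CSR) layout. The CSR format is an assumption for illustration (the abstract does not name a storage format), and the small matrix is made up. The indirect access `x[col_idx[k]]` is the irregular memory pattern that makes the kernel memory-bound.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect, irregular read of x
        y[i] = acc
    return y

# Example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```

Each row is independent, so the outer loop is the natural unit to distribute across cores; the tuning strategies in the paper go well beyond this baseline.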
Multitasking the Davidson algorithm for the large, sparse eigenvalue problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Umar, V.M.; Fischer, C.F.

1989-01-01

The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
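A textbook form of the Davidson iteration for the lowest eigenpair of a symmetric, diagonally dominant matrix can be sketched in Python as follows. This is not the modified or multitasked code described in the abstract; the function name `davidson_lowest`, the dense test matrix, and the tolerances are assumptions made for illustration. The method only needs products of the matrix with vectors, which is what lets it skip zero-valued elements when the matrix is stored sparsely.

```python
import numpy as np

def davidson_lowest(A, tol=1e-8, max_iter=100):
    """Lowest eigenpair of a symmetric matrix via a basic Davidson iteration."""
    n = A.shape[0]
    diag = np.diag(A)
    V = np.zeros((n, 1))
    V[np.argmin(diag), 0] = 1.0            # start from the smallest diagonal entry
    theta, x = diag.min(), V[:, 0]
    for _ in range(max_iter):
        H = V.T @ A @ V                    # small projected matrix
        evals, evecs = np.linalg.eigh(H)
        theta, x = evals[0], V @ evecs[:, 0]
        r = A @ x - theta * x              # residual of the Ritz pair
        if np.linalg.norm(r) < tol:
            break
        denom = diag - theta
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom                      # diagonal (Davidson) correction
        for _ in range(2):                 # orthogonalize twice for stability
            t -= V @ (V.T @ t)
        norm_t = np.linalg.norm(t)
        if norm_t < 1e-14:
            break
        V = np.hstack([V, (t / norm_t)[:, None]])
    return theta, x

# Hypothetical diagonally dominant symmetric test matrix.
rng = np.random.default_rng(0)
n = 50
R = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (R + R.T)

theta, x = davidson_lowest(A)
print(theta, np.linalg.eigvalsh(A)[0])
```

The diagonal correction works well here because the matrix is close to diagonal, which is typical of the configuration-interaction matrices the method was designed for.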
Holographic implementation of a binary associative memory for improved recognition

NASA Astrophysics Data System (ADS)

Bandyopadhyay, Somnath; Ghosh, Ajay; Datta, Asit K.

1998-03-01

Neural network associative memory has found wide applications in pattern recognition techniques. We propose an associative memory model for binary character recognition. The interconnection strengths of the memory are binary valued. The concept of sparse coding is used to enhance the storage efficiency of the model. The question of imposed preconditioning of pattern vectors, which is inherent in a sparsely coded conventional memory, is eliminated by using a multistep correlation technique, and the ability of correct association is enhanced in a real-time application. A potential optoelectronic implementation of the proposed associative memory is also described. Learning and recall are possible by using digital optical matrix-vector multiplication, where full use of parallelism and connectivity of optics is made. A hologram is used in the experiment as a long-term memory (LTM) for storing all input information. The short-term memory or the interconnection weight matrix required during the recall process is configured by retrieving the necessary information from the holographic LTM.
Multiprocessor sparse L/U decomposition with controlled fill-in

NASA Technical Reports Server (NTRS)

Alaghband, G.; Jordan, H. F.

1985-01-01

Generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied. The algorithm involves a binary tree search and has a complexity exponential in the order of the matrix. Different strategies for selection of a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. This technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices are presented and analyzed.
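The Markowitz criterion mentioned above scores a candidate pivot at position (i, j) by (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of row i and column j; the score is an upper bound on the fill-in that pivot can create. The Python sketch below computes these scores for a sparsity pattern. It covers only the scoring step, not the compatible-set search from the paper, and the helper name `markowitz_costs` and the arrow-shaped example pattern are assumptions for illustration.

```python
def markowitz_costs(pattern):
    """Map each nonzero position (i, j) to its Markowitz cost (r_i - 1) * (c_j - 1)."""
    row_counts, col_counts = {}, {}
    for i, j in pattern:
        row_counts[i] = row_counts.get(i, 0) + 1
        col_counts[j] = col_counts.get(j, 0) + 1
    return {(i, j): (row_counts[i] - 1) * (col_counts[j] - 1) for i, j in pattern}

# Hypothetical 4x4 "arrow" pattern: full first row and column plus the diagonal.
pattern = {(0, 0), (0, 1), (0, 2), (0, 3),
           (1, 0), (2, 0), (3, 0),
           (1, 1), (2, 2), (3, 3)}

costs = markowitz_costs(pattern)
best = min(costs.values())
candidates = sorted(p for p, c in costs.items() if c == best)

print(costs[(0, 0)])  # 9: pivoting on the dense corner risks filling the matrix
print(candidates)     # [(1, 1), (2, 2), (3, 3)]: low-cost pivots
```

Several pivots share the minimum cost here, which is the kind of situation where choosing a set of pivots that can be eliminated together becomes relevant for parallelism.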
A method of vehicle license plate recognition based on PCANet and compressive sensing

NASA Astrophysics Data System (ADS)

Ye, Xianyi; Min, Feng

2018-03-01

The manual feature extraction of traditional methods for vehicle license plates is not robust to diverse changes. Moreover, the high dimension of the features extracted with the Principal Component Analysis Network (PCANet) leads to low classification efficiency. To solve these problems, a method of vehicle license plate recognition based on PCANet and compressive sensing is proposed. First, PCANet is used to extract features from the character images. Then, a sparse measurement matrix, which is a very sparse matrix consistent with the Restricted Isometry Property (RIP) condition of compressed sensing, is used to reduce the dimension of the extracted features. Finally, a Support Vector Machine (SVM) is used to train on and recognize the reduced-dimension features. Experimental results demonstrate that the proposed method performs better than a Convolutional Neural Network (CNN) in recognition and time. Compared with the approach without compressive sensing, the proposed method has a lower feature dimension, which increases efficiency.
Three dimensional iterative beam propagation method for optical waveguide devices

NASA Astrophysics Data System (ADS)

Ma, Changbao; Van Keuren, Edward

2006-10-01

The finite difference beam propagation method (FD-BPM) is an effective model for simulating a wide range of optical waveguide structures. The classical FD-BPMs are based on the Crank-Nicolson scheme, and in tridiagonal form can be solved using the Thomas method. We present a different type of algorithm for 3-D structures. In this algorithm, the wave equation is formulated into a large sparse matrix equation which can be solved using iterative methods. The simulation window shifting scheme and threshold technique introduced in our earlier work are utilized to overcome the convergence problem of iterative methods for large sparse matrix equations and wide-angle simulations. This method enables us to develop higher-order 3-D wide-angle (WA-) BPMs based on Padé approximant operators and the multistep method, which are commonly used in WA-BPMs for 2-D structures. Simulations using the new methods will be compared to the analytical results to assure their effectiveness and applicability.
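The Thomas method referred to above is a direct solver for tridiagonal systems that runs in linear time. A minimal Python sketch follows; it is a generic version without pivoting, not code from the paper, and the argument layout and the small test system are assumptions for illustration. Because it uses only basic arithmetic, it also accepts the complex coefficients that arise in beam propagation.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system without pivoting.

    a : sub-diagonal   (a[0] is unused)
    b : main diagonal
    c : super-diagonal (c[-1] is unused)
    d : right-hand side
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Test system: tridiag(-1, 2, -1) with exact solution [1, 2, 3, 4].
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 5.0]

x = thomas(a, b, c, d)
print(x)
```

This direct approach applies when the discretized operator is tridiagonal, as in the classical 2-D case; the 3-D formulation in the abstract leads to a more general sparse matrix, which is why iterative methods are used there instead.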
Fragmented habitats of traditional fruit orchards are important for dead wood-dependent beetles associated with open canopy deciduous woodlands.

PubMed

Horak, Jakub

2014-06-01

The conservation of traditional fruit orchards might be considered to be a fashion, and many people might find it difficult to accept that these artificial habitats can be significant for overall biodiversity. The main aim of this study was to identify possible roles of traditional fruit orchards for dead wood-dependent (saproxylic) beetles. The study was performed in the Central European landscape in the Czech Republic, which was historically covered by lowland sparse deciduous woodlands. Window traps were used to catch saproxylic beetles in 25 traditional fruit orchards. The species richness, as one of the best indicators of biodiversity, was positively driven by very high canopy openness and the rising proportion of deciduous woodlands in the matrix of the surrounding landscape. Due to the disappearance of natural and semi-natural habitats (i.e., sparse deciduous woodlands) of saproxylic beetles, orchards might complement the functions of suitable habitat fragments as the last biotic islands in the matrix of the cultural Central European landscape.
Harnessing data structure for recovery of randomly missing structural vibration responses time history: Sparse representation versus low-rank structure

NASA Astrophysics Data System (ADS)

Yang, Yongchao; Nagarajaiah, Satish

2016-06-01

Randomly missing data in structural vibration response time histories often occur in structural dynamics and health monitoring. For example, structural vibration responses are often corrupted by outliers or erroneous measurements due to sensor malfunction; in wireless sensing platforms, data loss during wireless communication is a common issue. Besides, to alleviate the wireless data sampling or communication burden, certain amounts of data are often discarded during sampling or before transmission. In these and other applications, recovery of the randomly missing structural vibration responses from the available, incomplete data is essential for system identification and structural health monitoring; it is an ill-posed inverse problem, however. This paper explicitly harnesses the data structure itself (that of the structural vibration responses) to address this inverse problem. What is relevant is an empirical, but often practically true, observation: typically there are only a few modes active in the structural vibration responses; hence a sparse representation (in the frequency domain) of the single-channel data vector, or a low-rank structure (by singular value decomposition) of the multi-channel data matrix. Exploiting such prior knowledge of data structure (intra-channel sparse or inter-channel low-rank), the new theories of ℓ1-minimization sparse recovery and nuclear-norm-minimization low-rank matrix completion enable recovery of the randomly missing or corrupted structural vibration response data. The performance of these two alternatives, in terms of recovery accuracy and computational time under different data missing rates, is investigated on a few structural vibration response data sets: the seismic responses of the super high-rise Canton Tower and the structural health monitoring accelerations of a real large-scale cable-stayed bridge. Encouraging results are obtained and the applicability and limitation of the presented methods are discussed.
Action Recognition Using Nonnegative Action Component Representation and Sparse Basis Selection.

PubMed

Wang, Haoran; Yuan, Chunfeng; Hu, Weiming; Ling, Haibin; Yang, Wankou; Sun, Changyin

2014-02-01

In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, a novel sparse model is developed for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors. Second, from the statistics of the context-aware descriptors, we learn action units using the graph regularized nonnegative matrix factorization, which leads to a part-based representation and encodes the geometrical information. These units effectively bridge the semantic gap in action recognition. Third, we propose a sparse model based on a joint l2,1-norm to preserve the representative items and suppress noise in the action units. Intuitively, when learning the dictionary for action representation, the sparse model captures the fact that actions from the same class share similar units. The proposed approach is evaluated on several publicly available data sets. The experimental results and analysis clearly demonstrate the effectiveness of the proposed approach.
Assessing the effects of cocaine dependence and pathological gambling using group-wise sparse representation of natural stimulus FMRI data.

PubMed

Ren, Yudan; Fang, Jun; Lv, Jinglei; Hu, Xintao; Guo, Cong Christine; Guo, Lei; Xu, Jiansong; Potenza, Marc N; Liu, Tianming

2017-08-01

Assessing functional brain activation patterns in neuropsychiatric disorders such as cocaine dependence (CD) or pathological gambling (PG) under naturalistic stimuli has received rising interest in recent years. In this paper, we propose and apply a novel group-wise sparse representation framework to assess differences in neural responses to naturalistic stimuli across multiple groups of participants (healthy control, cocaine dependence, pathological gambling). Specifically, natural stimulus fMRI (N-fMRI) signals from all three groups of subjects are aggregated into a big data matrix, which is then decomposed into a common signal basis dictionary and associated weight coefficient matrices via an effective online dictionary learning and sparse coding method. The coefficient matrices associated with each common dictionary atom are statistically assessed for each group separately. With the inter-group comparisons based on the group-wise correspondence established by the common dictionary, our experimental results demonstrated that the group-wise sparse coding and representation strategy can effectively and specifically detect brain networks/regions affected by different pathological conditions of the brain under naturalistic stimuli.
Sparsity of the normal matrix in the refinement of macromolecules at atomic and subatomic resolution.

PubMed

Jelsch, C

2001-09-01

The normal matrix in the least-squares refinement of macromolecules is very sparse when the resolution reaches atomic and subatomic levels. The elements of the normal matrix, related to coordinates, thermal motion and charge-density parameters, have a global tendency to decrease rapidly with the interatomic distance between the atoms concerned. For instance, in the case of the protein crambin at 0.54 A resolution, the elements are reduced by two orders of magnitude for distances above 1.5 A. The neglect a priori of most of the normal-matrix elements according to a distance criterion represents an approximation in the refinement of macromolecules, which is particularly valid at very high resolution. The analytical expressions of the normal-matrix elements, which have been derived for the coordinates and the thermal parameters, show that the degree of matrix sparsity increases with the diffraction resolution and the size of the asymmetric unit.
Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Heber, Gerd; Biswas, Rupak

2000-01-01

The ability of computers to solve hitherto intractable problems and simulate complex processes using mathematical models makes them an indispensable part of modern science and engineering. Computer simulations of large-scale realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. For example, one thrust area in the DOE Grand Challenge projects is to design future accelerators such as the Spallation Neutron Source (SNS). Our colleagues at SLAC need to model complex RFQ cavities with large aspect ratios. Unstructured grids are currently used to resolve the small features in a large computational domain; dynamic mesh adaptation will be added in the future for additional efficiency. The PDEs for electromagnetics are discretized by the finite element method (FEM), which leads to a generalized eigenvalue problem Kx = λMx, where K and M are the stiffness and mass matrices, and are very sparse. In a typical cavity model, the number of degrees of freedom is about one million. For such large eigenproblems, direct solution techniques quickly reach the memory limits. Instead, the most widely used methods are Krylov subspace methods, such as Lanczos or Jacobi-Davidson. In all the Krylov-based algorithms, sparse matrix-vector multiplication (SPMV) must be performed repeatedly. Therefore, the efficiency of SPMV usually determines the eigensolver speed. SPMV is also one of the most heavily used kernels in large-scale numerical simulations.
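The SPMV kernel mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the matrix is held in compressed sparse row (CSR) form; the function name `csr_matvec` and the small 3x3 matrix are illustrative choices, not taken from the report.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in compressed sparse row form.

    indptr[r]:indptr[r + 1] is the slice of `indices` (column numbers)
    and `data` (values) that belongs to row r.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Illustrative matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

The work is proportional to the number of stored nonzeros, but the reads of `x` follow the column pattern `indices`, which is one reason the ordering of an unstructured mesh can affect the speed of this kernel.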
Structure-preserving and rank-revealing QR-factorizations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bischof, C.H.; Hansen, P.C.

1991-11-01

The rank-revealing QR-factorization (RRQR-factorization) is a special QR-factorization that is guaranteed to reveal the numerical rank of the matrix under consideration. This makes the RRQR-factorization a useful tool in the numerical treatment of many rank-deficient problems in numerical linear algebra. In this paper, a framework is presented for the efficient implementation of RRQR algorithms, in particular, for sparse matrices. A sparse RRQR-algorithm should seek to preserve the structure and sparsity of the matrix as much as possible while retaining the ability to capture safely the numerical rank. To this end, the paper proposes to compute an initial QR-factorization using a restricted pivoting strategy guarded by incremental condition estimation (ICE), and then applies the algorithm suggested by Chan and Foster to this QR-factorization. The column exchange strategy used in the initial QR factorization will exploit the fact that certain column exchanges do not change the sparsity structure, and compute a sparse QR-factorization that is a good approximation of the sought-after RRQR-factorization. Due to quantities produced by ICE, the Chan/Foster RRQR algorithm can be implemented very cheaply, thus verifying that the sought-after RRQR-factorization has indeed been computed. Experimental results on a model problem show that the initial QR-factorization is indeed very likely to produce an RRQR-factorization.
Efficient Parallel Formulations of Hierarchical Methods and Their Applications

NASA Astrophysics Data System (ADS)

Grama, Ananth Y.

1996-01-01

Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance is expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering the fact that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning. 
We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. 
As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
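The abstract above contrasts iterative solvers with direct solvers, which factor the system matrix once and then reuse the factor for every right-hand side. The following is a minimal dense sketch of that "factor once, substitute many times" pattern, assuming a symmetric positive definite matrix and a Cholesky factorization; the function names and the 3x3 matrix are hypothetical and it is not the method proposed in the report.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_with_factor(low, b):
    """Solve (L L^T) x = b by one forward and one backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Illustrative system matrix, factored once.
stiffness = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
factor = cholesky(stiffness)

# Each additional load case costs only a pair of triangular solves.
loads = [[3.0, 2.0, 3.0], [4.0, -1.0, 0.0]]
solutions = [solve_with_factor(factor, b) for b in loads]
```

For a dense n-by-n matrix the factorization costs on the order of n^3 operations while each substitution pair costs on the order of n^2, which is why repeated right-hand sides favor direct solvers on serial machines.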
Solving groundwater flow problems by conjugate-gradient methods and the strongly implicit procedure

USGS Publications Warehouse

Hill, Mary C.

1990-01-01

The performance of the preconditioned conjugate-gradient method with three preconditioners is compared with the strongly implicit procedure (SIP) using a scalar computer. The preconditioners considered are the incomplete Cholesky (ICCG) and the modified incomplete Cholesky (MICCG), which require the same computer storage as SIP as programmed for a problem with a symmetric matrix, and a polynomial preconditioner (POLCG), which requires less computer storage than SIP. Although POLCG is usually used on vector computers, it is included here because of its small storage requirements. In this paper, published comparisons of the solvers are evaluated, all four solvers are compared for the first time, and new test cases are presented to provide a more complete basis by which the solvers can be judged for typical groundwater flow problems. Based on nine test cases, the following conclusions are reached: (1) SIP is actually as efficient as ICCG for some of the published, linear, two-dimensional test cases that were reportedly solved much more efficiently by ICCG; (2) SIP is more efficient than other published comparisons would indicate when common convergence criteria are used; and (3) for problems that are three-dimensional, nonlinear, or both, and for which common convergence criteria are used, SIP is often more efficient than ICCG, and is sometimes more efficient than MICCG.
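As a rough illustration of the preconditioned conjugate-gradient iteration compared in the abstract above, here is a minimal sketch. It deliberately swaps in a simple diagonal (Jacobi) preconditioner rather than the incomplete Cholesky, modified incomplete Cholesky, or polynomial preconditioners studied in the report; the function names and the 5x5 test matrix are assumptions made for the example.

```python
def pcg_jacobi(matvec, diag, b, tol=1e-10, max_iter=200):
    """Solve A x = b (A symmetric positive definite) by Jacobi-preconditioned CG.

    matvec(v) returns A @ v and diag holds the diagonal of A.
    Returns the solution estimate and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [r[i] / diag[i] for i in range(n)]       # apply M^{-1} = diag(A)^{-1}
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for iteration in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 <= tol:
            return x, iteration
        ap = matvec(p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


MAIN_DIAGONAL = [2.0, 3.0, 4.0, 3.0, 2.0]


def tridiag_matvec(v):
    """Multiply by a symmetric tridiagonal matrix with off-diagonal entries -1."""
    out = []
    for i in range(len(v)):
        s = MAIN_DIAGONAL[i] * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < len(v) - 1:
            s -= v[i + 1]
        out.append(s)
    return out


# Right-hand side chosen so that the exact solution is a vector of ones.
b = [1.0, 1.0, 2.0, 1.0, 1.0]
x, iterations = pcg_jacobi(tridiag_matvec, MAIN_DIAGONAL, b)
```

Only the line that computes `z` changes when a different preconditioner is used; an incomplete Cholesky variant would replace the division by `diag` with two sparse triangular solves.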

RF Wave Simulation Using the MFEM Open Source FEM Package

NASA Astrophysics Data System (ADS)

Stillerman, J.; Shiraiwa, S.; Bonoli, P. T.; Wright, J. C.; Green, D. L.; Kolev, T.

2016-10-01

A new plasma wave simulation environment based on the finite element method is presented. MFEM, a scalable open-source FEM library, is used as the basis for this capability. MFEM allows for assembling an FEM matrix of arbitrarily high order in a parallel computing environment. A 3D frequency domain RF physics layer was implemented using a python wrapper for MFEM and a cold collisional plasma model was ported. This physics layer allows for defining the plasma RF wave simulation model without user knowledge of the FEM weak-form formulation. A graphical user interface is built on πScope, a python-based scientific workbench, such that a user can build a model definition file interactively. Benchmark cases have been ported to this new environment, with results being consistent with those obtained using COMSOL multiphysics, GENRAY, and TORIC/TORLH spectral solvers. This work is a first step in bringing to bear the sophisticated computational tool suite that MFEM provides (e.g., adaptive mesh refinement, solver suite, element types) to the linear plasma-wave interaction problem, and within more complicated integrated workflows, such as coupling with core spectral solver, or incorporating additional physics such as an RF sheath potential model or kinetic effects. USDoE Awards DE-FC02-99ER54512, DE-FC02-01ER54648.
A SEMI-LAGRANGIAN TWO-LEVEL PRECONDITIONED NEWTON-KRYLOV SOLVER FOR CONSTRAINED DIFFEOMORPHIC IMAGE REGISTRATION.

PubMed

Mang, Andreas; Biros, George

2017-01-01

We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20 × speedup for a two dimensional, real world multi-subject medical image registration problem.
Color Sparse Representations for Image Processing: Review, Models, and Prospects.

PubMed

Barthélemy, Quentin; Larue, Anthony; Mars, Jérôme I

2015-11-01

Sparse representations have been extended to deal with color images composed of three channels. A review of dictionary-learning-based sparse representations for color images is made here, detailing the differences between the models, and comparing their results on the real and simulated data. These models are considered in a unifying framework that is based on the degrees of freedom of the linear filtering/transformation of the color channels. Moreover, this allows it to be shown that the scalar quaternionic linear model is equivalent to constrained matrix-based color filtering, which highlights the filtering implicitly applied through this model. Based on this reformulation, the new color filtering model is introduced, using unconstrained filters. In this model, spatial morphologies of color images are encoded by atoms, and colors are encoded by color filters. Color variability is no longer captured in increasing the dictionary size, but with color filters, this gives an efficient color representation.
A sparse equivalent source method for near-field acoustic holography.

PubMed

Fernandez-Grande, Efren; Xenaki, Angeliki; Gerstoft, Peter

2017-01-01

This study examines a near-field acoustic holography method consisting of a sparse formulation of the equivalent source method, based on the compressive sensing (CS) framework. The method, denoted Compressive-Equivalent Source Method (C-ESM), encourages spatially sparse solutions (based on the superposition of few waves) that are accurate when the acoustic sources are spatially localized. The importance of obtaining a non-redundant representation, i.e., a sensing matrix with low column coherence, and the inherent ill-conditioning of near-field reconstruction problems is addressed. Numerical and experimental results on a classical guitar and on a highly reactive dipole-like source are presented. C-ESM is valid beyond the conventional sampling limits, making wide-band reconstruction possible. Spatially extended sources can also be addressed with C-ESM, although in this case the obtained solution does not recover the spatial extent of the source.
Sparse matrix beamforming and image reconstruction for real-time 2D HIFU monitoring using Harmonic Motion Imaging for Focused Ultrasound (HMIFU) with in vitro validation

PubMed Central

Hou, Gary Y.; Provost, Jean; Grondin, Julien; Wang, Shutao; Marquet, Fabrice; Bunting, Ethan; Konofagou, Elisa E.

2015-01-01

Harmonic Motion Imaging for Focused Ultrasound (HMIFU) is a recently developed High-Intensity Focused Ultrasound (HIFU) treatment monitoring method. HMIFU utilizes an Amplitude-Modulated (fAM = 25 Hz) HIFU beam to induce a localized focal oscillatory motion, which is simultaneously estimated and imaged by a confocally aligned imaging transducer. HMIFU feasibilities have been previously shown in silico, in vitro, and in vivo in 1-D or 2-D monitoring of HIFU treatment. The objective of this study is to develop and show the feasibility of a novel fast beamforming algorithm for image reconstruction using GPU-based sparse-matrix operation with real-time feedback. In this study, the algorithm was implemented onto a fully integrated, clinically relevant HMIFU system composed of a 93-element HIFU transducer (fcenter = 4.5 MHz) and coaxially aligned 64-element phased array (fcenter = 2.5 MHz) for displacement excitation and motion estimation, respectively. A single transmit beam with divergent beam transmit was used while fast beamforming was implemented using a GPU-based delay-and-sum method and a sparse-matrix operation. Axial HMI displacements were then estimated from the RF signals using a 1-D normalized cross-correlation method and streamed to a graphic user interface. The present work developed and implemented sparse matrix beamforming on a fully integrated, clinically relevant system, which can stream displacement images at up to 15 Hz using GPU-based processing, a 100-fold increase in the rate of streaming displacement images compared to conventional CPU-based beamforming and reconstruction processing. The achieved feedback rate is also currently the fastest, and the approach is the only one among the acoustic radiation force based HIFU imaging techniques that does not require interrupting the HIFU treatment. 
Results in phantom experiments showed reproducible displacement imaging, and monitoring of twenty-two in vitro HIFU treatments using the new 2D system showed a consistent average focal displacement decrease of 46.7±14.6% during lesion formation. Complementary focal temperature monitoring also indicated an average rate of displacement increase and decrease with focal temperature at 0.84±1.15%/°C and 2.03±0.93%/°C, respectively. These results reinforce the HMIFU capability of estimating and monitoring stiffness related changes in real time. Current ongoing studies include clinical translation of the presented system for monitoring of HIFU treatment for breast and pancreatic tumor applications. PMID:24960528
On the sparseness of 1-norm support vector machines.

PubMed

Zhang, Li; Zhou, Weida

2010-04-01

There is some empirical evidence available showing that 1-norm Support Vector Machines (1-norm SVMs) have good sparseness; however, both how good sparseness 1-norm SVMs can reach and whether they have a sparser representation than that of standard SVMs are not clear. In this paper we take into account the sparseness of 1-norm SVMs. Two upper bounds on the number of nonzero coefficients in the decision function of 1-norm SVMs are presented. First, the number of nonzero coefficients in 1-norm SVMs is at most equal to the number of only the exact support vectors lying on the +1 and -1 discriminating surfaces, while that in standard SVMs is equal to the number of support vectors, which implies that 1-norm SVMs have better sparseness than that of standard SVMs. Second, the number of nonzero coefficients is at most equal to the rank of the sample matrix. A brief review of the geometry of linear programming and the primal steepest edge pricing simplex method are given, which allows us to provide the proof of the two upper bounds and evaluate their tightness by experiments. Experimental results on toy data sets and the UCI data sets illustrate our analysis. Copyright 2009 Elsevier Ltd. All rights reserved.
Improving M-SBL for Joint Sparse Recovery Using a Subspace Penalty

NASA Astrophysics Data System (ADS)

Ye, Jong Chul; Kim, Jong Min; Bresler, Yoram

2015-12-01

The multiple measurement vector problem (MMV) is a generalization of the compressed sensing problem that addresses the recovery of a set of jointly sparse signal vectors. One of the important contributions of this paper is to reveal that the seemingly least related state-of-the-art MMV joint sparse recovery algorithms, M-SBL (multiple sparse Bayesian learning) and subspace-based hybrid greedy algorithms, have a very important link. More specifically, we show that replacing the $\log\det(\cdot)$ term in M-SBL by a rank proxy that exploits the spark reduction property discovered in subspace-based joint sparse recovery algorithms provides significant improvements. In particular, if we use the Schatten-$p$ quasi-norm as the corresponding rank proxy, the global minimiser of the proposed algorithm becomes identical to the true solution as $p \rightarrow 0$. Furthermore, under the same regularity conditions, we show that the convergence to a local minimiser is guaranteed using an alternating minimization algorithm that has closed form expressions for each of the minimization steps, which are convex. Numerical simulations under a variety of scenarios in terms of SNR and condition number of the signal amplitude matrix demonstrate that the proposed algorithm consistently outperforms M-SBL and other state-of-the-art algorithms.
A hierarchical wavefront reconstruction algorithm for gradient sensors

NASA Astrophysics Data System (ADS)

Bharmal, Nazim; Bitenc, Urban; Basden, Alastair; Myers, Richard

2013-12-01

ELT-scale extreme adaptive optics systems will require new approaches to compute the wavefront suitably quickly, when the computational burden of applying a MVM is no longer practical. An approach is demonstrated here which is hierarchical in transforming wavefront slopes from a WFS into a wavefront, and then to actuator values. First, simple integration in 1D is used to create 1D-wavefront estimates with unknown starting points at the edges of independent spatial domains. Second, these starting points are estimated globally. Because these starting points are a subset of the overall grid where wavefront values are to be estimated, sparse representations are produced and numerical complexity can be chosen by the spacing of the starting point grid relative to the overall grid. Using a combination of algebraic expressions, sparse representation, and a conjugate gradient solver, the number of non-parallelized operations for reconstruction on a 100x100 sub-aperture sized problem is ~600,000 or O(N^(3/2)), which is approximately the same as for each thread of a MVM solution parallelized over 100 threads. To reduce the effects of noise propagation within each domain, a noise reduction algorithm can be applied which ensures the continuity of the wavefront. Applying this additional step has a cost of ~1,200,000 operations. We conclude by briefly discussing how the final step of converting from wavefront to actuator values can be achieved.
Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies

DOE PAGES

Maddali, S.; Calvo-Almazan, I.; Almer, J.; ...

2018-03-21

Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. Lastly, we use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.
Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maddali, S.; Calvo-Almazan, I.; Almer, J.

Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. Lastly, we use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.
Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies.

PubMed

Maddali, S; Calvo-Almazan, I; Almer, J; Kenesei, P; Park, J-S; Harder, R; Nashed, Y; Hruszkewycz, S O

2018-03-21

Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. We use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.
Dynamic graph system for a semantic database

DOEpatents

Mizell, David

2016-04-12

A method and system in a computer system for dynamically providing a graphical representation of a data store of entries via a matrix interface is disclosed. A dynamic graph system provides a matrix interface that exposes to an application program a graphical representation of data stored in a data store such as a semantic database storing triples. To the application program, the matrix interface represents the graph as a sparse adjacency matrix that is stored in compressed form. Each entry of the data store is considered to represent a link between nodes of the graph. Each entry has a first field and a second field identifying the nodes connected by the link and a third field with a value for the link that connects the identified nodes. The first, second, and third fields represent the rows, column, and elements of the adjacency matrix.
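The mapping described in the abstract above, from three-field entries to a compressed sparse adjacency matrix, can be sketched as follows. This is an illustrative reading only, assuming compressed sparse row (CSR) storage and node ids assigned in order of first appearance; the function name and the sample entries are hypothetical and are not taken from the patent.

```python
def build_csr_adjacency(entries):
    """Build a CSR adjacency structure from (first, second, value) entries.

    The first field selects the row, the second the column, and the third
    is stored as the matrix element for that link.
    """
    # Assign each distinct node name an integer id in order of first appearance.
    node_id = {}
    for first, second, _value in entries:
        for name in (first, second):
            node_id.setdefault(name, len(node_id))
    n_nodes = len(node_id)

    # Count entries per row, then prefix-sum the counts into row pointers.
    indptr = [0] * (n_nodes + 1)
    for first, _second, _value in entries:
        indptr[node_id[first] + 1] += 1
    for row in range(n_nodes):
        indptr[row + 1] += indptr[row]

    # Scatter column ids and link values into their row segments.
    indices = [0] * len(entries)
    data = [None] * len(entries)
    next_slot = list(indptr[:-1])
    for first, second, value in entries:
        slot = next_slot[node_id[first]]
        indices[slot] = node_id[second]
        data[slot] = value
        next_slot[node_id[first]] += 1
    return node_id, indptr, indices, data


# Hypothetical entries: two node fields followed by a link value.
entries = [
    ("alice", "bob", "knows"),
    ("alice", "carol", "manages"),
    ("carol", "bob", "knows"),
]
node_id, indptr, indices, data = build_csr_adjacency(entries)
```

Storage grows with the number of links rather than with the square of the number of nodes, which is the point of keeping the adjacency matrix in compressed form.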
Dynamic graph system for a semantic database

DOEpatents

Mizell, David

2015-01-27

A method and system in a computer system for dynamically providing a graphical representation of a data store of entries via a matrix interface is disclosed. A dynamic graph system provides a matrix interface that exposes to an application program a graphical representation of data stored in a data store such as a semantic database storing triples. To the application program, the matrix interface represents the graph as a sparse adjacency matrix that is stored in compressed form. Each entry of the data store is considered to represent a link between nodes of the graph. Each entry has a first field and a second field identifying the nodes connected by the link and a third field with a value for the link that connects the identified nodes. The first, second, and third fields represent the rows, column, and elements of the adjacency matrix.
LiDAR point classification based on sparse representation

NASA Astrophysics Data System (ADS)

Li, Nan; Pfeifer, Norbert; Liu, Chun

2017-04-01

To combine the initial spatial structure and features of LiDAR data for accurate classification, the LiDAR data are represented as a 4th-order tensor, and the sparse representation for classification (SRC) method is used for LiDAR tensor classification. SRC needs only a few training samples from each class yet can achieve good classification results. Multiple features are extracted from raw LiDAR points to generate a high-dimensional vector at each point. Then the LiDAR tensor is built from the spatial distribution and feature vectors of the point neighborhood. The entries of the LiDAR tensor are accessed via four indices. Each index is called a mode: three spatial modes in directions X, Y, Z and one feature mode. An SRC method for LiDAR tensors is proposed in this paper. The sparsity algorithm finds the best representation of the test sample as a sparse linear combination of training samples from a dictionary. To explore the sparsity of the LiDAR tensor, the Tucker decomposition is used. It decomposes a tensor into a core tensor multiplied by a matrix along each mode. Those matrices can be considered the principal components in each mode. The entries of the core tensor show the level of interaction between the different components. Therefore, the LiDAR tensor can be approximately represented by a sparse tensor multiplied by a matrix selected from a dictionary along each mode. The matrices decomposed from training samples are arranged as initial elements in the dictionary. By dictionary learning, a reconstructive and discriminative structure dictionary along each mode is built. The overall structure dictionary is composed of class-specific sub-dictionaries. Then the sparse core tensor is calculated by the tensor OMP (Orthogonal Matching Pursuit) method based on the dictionaries along each mode. 
It is expected that the original tensor should be well recovered by the sub-dictionary associated with the relevant class, while entries in the sparse tensor associated with other classes should be nearly zero. Therefore, SRC uses the reconstruction error associated with each class to perform data classification. A section of airborne LiDAR points of Vienna city is used and classified into 6 classes: ground, roofs, vegetation, covered ground, walls and other points. Only 6 training samples from each class are taken. For the final classification result, ground and covered ground are merged into one class (ground). The classification accuracy for ground is 94.60%, roof is 95.47%, vegetation is 85.55%, wall is 76.17%, other object is 20.39%.
Bundle block adjustment of large-scale remote sensing data with Block-based Sparse Matrix Compression combined with Preconditioned Conjugate Gradient

NASA Astrophysics Data System (ADS)

Zheng, Maoteng; Zhang, Yongjun; Zhou, Shunping; Zhu, Junfeng; Xiong, Xiaodong

2016-07-01

In recent years, new platforms and sensors have become available in photogrammetry, remote sensing, and computer vision, such as unmanned aerial vehicles (UAVs), oblique camera systems, common digital cameras, and even mobile phone cameras. Images collected by all these kinds of sensors can be used as remote sensing data sources. These sensors can obtain large-scale remote sensing data consisting of a great number of images. Bundle block adjustment of large-scale data with the conventional algorithm is very time- and memory-consuming because of the very large normal matrix that arises from such data. In this paper, an efficient Block-based Sparse Matrix Compression (BSMC) method is combined with the Preconditioned Conjugate Gradient (PCG) algorithm to develop a stable and efficient bundle block adjustment system for large-scale remote sensing data. The main contribution of this work is the BSMC-based PCG algorithm, which is more efficient in time and memory than the traditional algorithm without compromising accuracy. A total of 8 real datasets are used to test the proposed method. Preliminary results show that the BSMC method can efficiently decrease the time and memory requirements of large-scale data.
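The general shape of a preconditioned conjugate gradient solve on a sparsely stored normal matrix can be sketched as follows. This is a minimal illustration, not the BSMC scheme of the paper: it assumes a simple dictionary-of-rows storage and a diagonal (Jacobi) preconditioner, and the matrix `A` is a made-up 5x5 symmetric positive definite example.

```python
def pcg(matvec, b, diag, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for an SPD system A x = b."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # preconditioned residual
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            return x, it
        Ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test matrix stored sparsely as {row: {col: value}}.
n = 5
A = {i: {i: 4.0 + i} for i in range(n)}
for i in range(n - 1):
    A[i][i + 1] = -1.0
    A[i + 1][i] = -1.0

def matvec(v):
    # Only the stored nonzeros are visited.
    return [sum(a * v[j] for j, a in A[i].items()) for i in range(n)]

b = matvec([1.0] * n)            # right-hand side whose solution is all ones
x, iterations = pcg(matvec, b, [A[i][i] for i in range(n)])
```

The solver touches the matrix only through `matvec`, which is why a compressed storage format can be swapped in without changing the iteration itself.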
Nonnegative matrix factorization and sparse representation for the automated detection of periodic limb movements in sleep.

PubMed

Shokrollahi, Mehrnaz; Krishnan, Sridhar; Dopsa, Dustin D; Muir, Ryan T; Black, Sandra E; Swartz, Richard H; Murray, Brian J; Boulos, Mark I

2016-11-01

Stroke is a leading cause of death and disability in adults, and incurs a significant economic burden to society. Periodic limb movements (PLMs) in sleep are repetitive movements involving the great toe, ankle, and hip. Evolving evidence suggests that PLMs may be associated with high blood pressure and stroke, but this relationship remains underexplored. Several issues limit the study of PLMs including the need to manually score them, which is time-consuming and costly. For this reason, we developed a novel automated method for nocturnal PLM detection, which was shown to be correlated with (a) the manually scored PLM index on polysomnography, and (b) white matter hyperintensities on brain imaging, which have been demonstrated to be associated with PLMs. Our proposed algorithm consists of three main stages: (1) representing the signal in the time-frequency plane using time-frequency matrices (TFM), (2) applying K-nonnegative matrix factorization technique to decompose the TFM matrix into its significant components, and (3) applying kernel sparse representation for classification (KSRC) to the decomposed signal. Our approach was applied to a dataset that consisted of 65 subjects who underwent polysomnography. An overall classification of 97 % was achieved for discrimination of the aforementioned signals, demonstrating the potential of the presented method.
Acoustic 3D modeling by the method of integral equations

NASA Astrophysics Data System (ADS)

Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

2018-02-01

This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. Tolerable memory consumption and numerical complexity were achieved by applying an iterative solver together with an efficient matrix-vector multiplication based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method can accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system of equations and to parallelize across multiple sources. Practical examples and efficiency tests are presented as well.
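Why an FFT helps with a dense system matrix can be shown on the simplest structured case. The sketch below assumes a circulant matrix, the kind of structure a translation-invariant kernel is commonly embedded in. This is an assumption made for illustration, not a description of the authors' code. The functions `fft`, `circulant_matvec`, and `dense_matvec` are hypothetical, and the length must be a power of two.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x in O(n log n)."""
    n = len(c)
    product = [p * q for p, q in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(product, invert=True)]

def dense_matvec(c, x):
    """Reference O(n^2) product using C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

c = [2.0, -1.0, 0.0, -1.0]
x = [1.0, 2.0, 3.0, 4.0]
y_fast = circulant_matvec(c, x)
y_ref = dense_matvec(c, x)
```

The dense matrix is never formed: only its first column is stored, so both memory and work drop well below the quadratic cost of the reference product.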
3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

NASA Astrophysics Data System (ADS)

Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

2016-01-01

Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Toward an optimal solver for time-spectral fluid-dynamic and aeroelastic solutions on unstructured meshes

NASA Astrophysics Data System (ADS)

Mundis, Nathan L.; Mavriplis, Dimitri J.

2017-09-01

The time-spectral method applied to the Euler and coupled aeroelastic equations theoretically offers significant computational savings for purely periodic problems when compared to standard time-implicit methods. However, attaining superior efficiency with time-spectral methods over traditional time-implicit methods hinges on the ability rapidly to solve the large non-linear system resulting from time-spectral discretizations which become larger and stiffer as more time instances are employed or the period of the flow becomes especially short (i.e. the maximum resolvable wave-number increases). In order to increase the efficiency of these solvers, and to improve robustness, particularly for large numbers of time instances, the Generalized Minimal Residual Method (GMRES) is used to solve the implicit linear system over all coupled time instances. The use of GMRES as the linear solver makes time-spectral methods more robust, allows them to be applied to a far greater subset of time-accurate problems, including those with a broad range of harmonic content, and vastly improves the efficiency of time-spectral methods. In previous work, a wave-number independent preconditioner that mitigates the increased stiffness of the time-spectral method when applied to problems with large resolvable wave numbers has been developed. This preconditioner, however, directly inverts a large matrix whose size increases in proportion to the number of time instances. As a result, the computational time of this method scales as the cube of the number of time instances. In the present work, this preconditioner has been reworked to take advantage of an approximate-factorization approach that effectively decouples the spatial and temporal systems. Once decoupled, the time-spectral matrix can be inverted in frequency space, where it has entries only on the main diagonal and therefore can be inverted quite efficiently. 
This new GMRES/preconditioner combination is shown to be over an order of magnitude more efficient than the previous wave-number independent preconditioner for problems with large numbers of time instances and/or large reduced frequencies.
Structural performance analysis and redesign

NASA Technical Reports Server (NTRS)

Whetstone, W. D.

1978-01-01

Program performs stress, buckling, and vibrational analysis of large, linear, finite-element systems in excess of 50,000 degrees of freedom. Cost, execution time, and storage requirements are kept reasonable through use of sparse matrix solution techniques, and other computational and data management procedures designed for problems of very large size.

SMOKE TOOL FOR MODELS-3 VERSION 4.1 STRUCTURE AND OPERATION DOCUMENTATION

EPA Science Inventory

The SMOKE Tool is a part of the Models-3 system, a flexible software system designed to simplify the development and use of air quality models and other environmental decision support tools. The SMOKE Tool is an input processor for SMOKE, (Sparse Matrix Operator Kernel Emissio...
Comparing implementations of penalized weighted least-squares sinogram restoration.

PubMed

Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick

2010-11-01

A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. 
For the closed-form approach, the authors subdivided the large matrix inversion into smaller coupled problems and exploited sparseness to minimize matrix operations. For the conjugate-gradient approach, the authors exploited sparseness and preconditioned the problem to speed up convergence. All methods produced qualitatively and quantitatively similar images as measured by resolution-variance tradeoffs and difference images. Despite the acceleration strategies, the direct matrix-inversion approach was found to be uncompetitive with iterative approaches, with a computational burden higher by an order of magnitude or more. The iterative conjugate-gradient approach, however, does appear promising, with computation times half that of the authors' previous penalized-likelihood implementation. Iterative conjugate-gradient based PWLS sinogram restoration with careful matrix optimizations has computational advantages over direct matrix PWLS inversion and over penalized-likelihood sinogram restoration and can be considered a good alternative in standard-dose regimes.
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement

PubMed Central

Hao, Yansong; Song, Liuyang; Tang, Gang; Yuan, Hongfang

2018-01-01

Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues, which are the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed in which a sparse optimization objective function is first designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients that contain only transient components can be obtained through iteration. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. The reconstruction step is therefore omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to show that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency. PMID:29597280
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement.

PubMed

Ren, Bangyue; Hao, Yansong; Wang, Huaqing; Song, Liuyang; Tang, Gang; Yuan, Hongfang

2018-03-28

Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues, which are the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed in which a sparse optimization objective function is first designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients that contain only transient components can be obtained through iteration. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. The reconstruction step is therefore omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to show that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency.
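The effect of fixing the sparse basis to a unit matrix can be illustrated with a simplified objective. The sketch below assumes a plain least-squares term plus an L1 penalty, 0.5*||y - x||^2 + lam*||x||_1, which omits the feature-preserving factor of the paper. Under that assumption the problem separates per sample, and the standard quadratic-majorizer MM update converges to the soft-thresholding solution. The function names are hypothetical.

```python
def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*(y - x)^2 + lam*|x| for each sample."""
    return [max(abs(v) - lam, 0.0) * (1 if v >= 0 else -1) for v in y]

def mm_l1_identity(y, lam, iters=200):
    """MM iteration for the same objective with an identity basis.

    Each step majorizes |x| by a quadratic around the current iterate,
    which gives the update x <- y * |x| / (|x| + lam).
    """
    x = list(y)
    for _ in range(iters):
        x = [yi * abs(xi) / (abs(xi) + lam) for yi, xi in zip(y, x)]
    return x

# Made-up signal: two large "impulses" and two small noise-like samples.
signal = [3.0, -0.5, 0.2, -4.0]
x_closed = soft_threshold(signal, 1.0)
x_mm = mm_l1_identity(signal, 1.0)
```

With an identity basis the coefficients live in the signal domain, so no synthesis step is needed to return to a time series.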
A geometric multigrid preconditioning strategy for DPG system matrices

DOE PAGES

Roberts, Nathan V.; Chan, Jesse

2017-08-23

Here, the discontinuous Petrov–Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan (2010, 2011) guarantees the optimality of the solution in an energy norm, and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general (though there are some results, e.g. for Poisson) is how best to precondition the DPG system matrix, so that iterative solvers may be used to allow solution of large-scale problems.
Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found (J. M. Finn, Phys. Plasmas 2, 12 (1995)). This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code (L. Chacon et al., "Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver," submitted to J. Comput. Phys. (2000)), including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a "physics-based" preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
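The Jacobian-free idea rests on the fact that a Krylov solver needs only products of the Jacobian with a vector, and such a product can be approximated by a finite difference of the residual function. The sketch below is a generic illustration, not the authors' MHD code: the residual `F`, the fixed step `eps`, and the function name `jfnk_matvec` are assumptions, and production codes usually scale the step with the norms of `u` and `v`.

```python
def jfnk_matvec(F, u, v, eps=1e-7):
    """Approximate J(u) v by (F(u + eps*v) - F(u)) / eps without forming J."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]

def F(u):
    # Made-up nonlinear residual with Jacobian [[2*u0, 1], [1, 2*u1]].
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]

# At u = (1, 2) the exact product J v for v = (1, 1) is (3, 5).
jv = jfnk_matvec(F, [1.0, 2.0], [1.0, 1.0])
```

Each product costs one extra residual evaluation, which is the trade made for never storing the Jacobian.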
Improve Problem Solving Skills through Adapting Programming Tools

NASA Technical Reports Server (NTRS)

Shaykhian, Linda H.; Shaykhian, Gholam Ali

2007-01-01

There are numerous ways for engineers and students to become better problem-solvers. The use of command line and visual programming tools can help to model a problem and formulate a solution through visualization. The analysis of problem attributes and constraints provides insight into the scope and complexity of the problem. The visualization aspect of the problem-solving approach tends to make students and engineers more systematic in their thought process and helps them catch errors before proceeding too far in the wrong direction. The problem-solver identifies and defines important terms, variables, rules, and procedures required for solving a problem. Every step required to construct the problem solution can be defined in program commands that produce intermediate output. This paper advocates improving problem-solving skills through the use of a programming tool. MatLab, created by MathWorks, is an interactive numerical computing environment and programming language. It is a matrix-based system that lends itself easily to matrix manipulation and to plotting of functions and data. MatLab can be used as an interactive command line or as a sequence of commands saved in a file as a script or as named functions. Prior programming experience is not required to use MatLab commands. GNU Octave, part of the GNU project and a free computer program for performing numerical computations, is comparable to MatLab. MatLab visual and command programming are presented here.
A Partitioning Algorithm for Block-Diagonal Matrices With Overlap

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina

2008-02-02

We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition Ω_i has at most two neighbors Ω_{i-1} and Ω_{i+1}. First, an ordering algorithm, such as the reverse Cuthill-McKee algorithm, that reduces the matrix profile is performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.
A path-oriented matrix-based knowledge representation system

NASA Technical Reports Server (NTRS)

Feyock, Stefan; Karamouzis, Stamos T.

1993-01-01

Experience has shown that designing a good representation is often the key to turning hard problems into simple ones. Most AI (Artificial Intelligence) search/representation techniques are oriented toward an infinite domain of objects and arbitrary relations among them. In reality much of what needs to be represented in AI can be expressed using a finite domain and unary or binary predicates. Well-known vector- and matrix-based representations can efficiently represent finite domains and unary/binary predicates, and allow effective extraction of path information by generalized transitive closure/path matrix computations. In order to avoid space limitations a set of abstract sparse matrix data types was developed along with a set of operations on them. This representation forms the basis of an intelligent information system for representing and manipulating relational data.
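Path extraction by transitive closure on a sparse boolean matrix can be sketched briefly. The example assumes each matrix row is stored as a set of successor nodes and uses Warshall's algorithm. The abstract does not describe the system's actual data types, so `transitive_closure` and the relation `adj` are hypothetical.

```python
def transitive_closure(adj):
    """Return reach[u] = set of nodes reachable from u by a nonempty path.

    adj maps each node to the set of its direct successors, i.e. one
    sparse row of the boolean relation matrix.
    """
    nodes = set(adj) | {v for succ in adj.values() for v in succ}
    reach = {u: set(adj.get(u, ())) for u in nodes}
    for k in nodes:                  # Warshall: allow k as an intermediate
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

# Made-up binary relation: d -> a -> b -> c.
adj = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}}
closure = transitive_closure(adj)
```

Storing only the true entries keeps the space proportional to the number of related pairs instead of the square of the domain size.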
Identification of Successive "Unobservable" Cyber Data Attacks in Power Systems Through Matrix Decomposition

NASA Astrophysics Data System (ADS)

Gao, Pengzhi; Wang, Meng; Chow, Joe H.; Ghiocel, Scott G.; Fardanesh, Bruce; Stefopoulos, George; Razanousky, Michael P.

2016-11-01

This paper presents a new framework of identifying a series of cyber data attacks on power system synchrophasor measurements. We focus on detecting "unobservable" cyber data attacks that cannot be detected by any existing method that purely relies on measurements received at one time instant. Leveraging the approximate low-rank property of phasor measurement unit (PMU) data, we formulate the identification problem of successive unobservable cyber attacks as a matrix decomposition problem of a low-rank matrix plus a transformed column-sparse matrix. We propose a convex-optimization-based method and provide its theoretical guarantee in the data identification. Numerical experiments on actual PMU data from the Central New York power system and synthetic data are conducted to verify the effectiveness of the proposed method.
Selection of Polynomial Chaos Bases via Bayesian Model Uncertainty Methods with Applications to Sparse Approximation of PDEs with Stochastic Inputs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karagiannis, Georgios; Lin, Guang

2014-02-15

Generalized polynomial chaos (gPC) expansions allow the representation of the solution of a stochastic system as a series of polynomial terms. The number of gPC terms increases dramatically with the dimension of the random input variables. When the number of gPC terms is larger than the number of available samples, a scenario that often occurs if the evaluations of the system are expensive, the evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solution, in both the spatial and random domains, by coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points via (1) the Bayesian model average or (2) the median probability model, and their construction as functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions, while the latter additionally provides a sparse representation of the solution by evaluating the expansion on a subset of dominating gPC bases. Moreover, the method quantifies the importance of the gPC bases through inclusion probabilities. We design an MCMC sampler that evaluates all the unknown quantities without the need for ad hoc techniques. The proposed method is suitable for, but not restricted to, problems whose stochastic solution is sparse at the stochastic level with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the good performance of the proposed method and make comparisons with others on elliptic stochastic partial differential equations with 1D, 14D, and 40D random spaces.
Progress on a generalized coordinates tensor product finite element 3DPNS algorithm for subsonic

NASA Technical Reports Server (NTRS)

Baker, A. J.; Orzechowski, J. A.

1983-01-01

A generalized coordinates form of the penalty finite element algorithm for the 3-dimensional parabolic Navier-Stokes equations for turbulent subsonic flows was derived. This algorithm formulation requires only three distinct hypermatrices and is applicable using any boundary fitted coordinate transformation procedure. The tensor matrix product approximation to the Jacobian of the Newton linear algebra matrix statement was also derived. The Newton algorithm was restructured to replace large sparse matrix solution procedures with grid sweeping using alpha-block tridiagonal matrices, where alpha equals the number of dependent variables. Numerical experiments were conducted, and the resulting data give guidance on potentially preferred tensor product constructions for the penalty finite element 3DPNS algorithm.
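The appeal of tridiagonal sweeps over a general sparse solve can be seen in the scalar case. The sketch below assumes alpha = 1, so each block is a single number, and implements the standard Thomas algorithm. In the alpha-block version the divisions become small alpha-by-alpha solves. The function name `thomas` and the test system are made up for illustration.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n).

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Made-up system with diagonals (-1, 2, -1) and exact solution (1, 2, 3, 4).
sol = thomas([0.0, -1.0, -1.0, -1.0],
             [2.0, 2.0, 2.0, 2.0],
             [-1.0, -1.0, -1.0, 0.0],
             [0.0, 0.0, 0.0, 5.0])
```

No pivoting is performed, so the sketch is only appropriate for systems such as diagonally dominant ones where elimination without row exchanges is stable.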
Iris recognition based on robust principal component analysis

NASA Astrophysics Data System (ADS)

Karn, Pradeep; He, Xiao Hai; Yang, Shuai; Wu, Xiao Hong

2014-11-01

Iris images acquired under different conditions often suffer from blur, occlusion due to eyelids and eyelashes, specular reflection, and other artifacts. Existing iris recognition systems do not perform well on these types of images. To overcome these problems, we propose an iris recognition method based on robust principal component analysis. The proposed method decomposes all training images into a low-rank matrix and a sparse error matrix, where the low-rank matrix is used for feature extraction. The sparsity concentration index approach is then applied to validate the recognition result. Experimental results using the CASIA V4 and IIT Delhi V1 iris image databases showed that the proposed method achieved competitive performance in both recognition accuracy and computational efficiency.
Sparse Covariance Matrix Estimation With Eigenvalue Constraints

PubMed Central

LIU, Han; WANG, Lie; ZHAO, Tuo

2014-01-01

We propose a new approach for estimating high-dimensional, positive-definite covariance matrices. Our method extends the generalized thresholding operator by adding an explicit eigenvalue constraint. The estimated covariance matrix simultaneously achieves sparsity and positive definiteness. The estimator is rate optimal in the minimax sense and we develop an efficient iterative soft-thresholding and projection algorithm based on the alternating direction method of multipliers. Empirically, we conduct thorough numerical experiments on simulated datasets as well as real data examples to illustrate the usefulness of our method. Supplementary materials for the article are available online. PMID:25620866
Preliminary results in implementing a model of the world economy on the CYBER 205: A case of large sparse nonsymmetric linear equations

NASA Technical Reports Server (NTRS)

Szyld, D. B.

1984-01-01

A brief description of the Model of the World Economy implemented at the Institute for Economic Analysis is presented, together with our experience in converting the software to vector code. For each time period, the model is reduced to a linear system of over 2000 variables. The matrix of coefficients has a bordered block diagonal structure, and we show how some of the matrix operations can be carried out on all diagonal blocks at once.
A monolithic homotopy continuation algorithm with application to computational fluid dynamics

NASA Astrophysics Data System (ADS)

Brown, David A.; Zingg, David W.

2016-09-01

A new class of homotopy continuation methods is developed suitable for globalizing quasi-Newton methods for large sparse nonlinear systems of equations. The new continuation methods, described as monolithic homotopy continuation, differ from the classical predictor-corrector algorithm in that the predictor and corrector phases are replaced with a single phase which includes both a predictor and corrector component. Conditional convergence and stability are proved analytically. Using a Laplacian-like operator to construct the homotopy, the new algorithm is shown to be more efficient than the predictor-corrector homotopy continuation algorithm as well as an implementation of the widely-used pseudo-transient continuation algorithm for some inviscid and turbulent, subsonic and transonic external aerodynamic flows over the ONERA M6 wing and the NACA 0012 airfoil using a parallel implicit Newton-Krylov finite-difference flow solver.
Numerical simulations of microwave heating of liquids: enhancements using Krylov subspace methods

NASA Astrophysics Data System (ADS)

Lollchund, M. R.; Dookhitram, K.; Sunhaloo, M. S.; Boojhawon, R.

2013-04-01

In this paper, we compare the performances of three iterative solvers for large sparse linear systems arising in the numerical computations of incompressible Navier-Stokes (NS) equations. These equations are employed mainly in the simulation of microwave heating of liquids. The emphasis of this work is on the application of Krylov projection techniques such as Generalized Minimal Residual (GMRES) to solve the Pressure Poisson Equations that result from discretisation of the NS equations. The performance of the GMRES method is compared with the traditional Gauss-Seidel (GS) and point successive over relaxation (PSOR) techniques through their application to simulate the dynamics of water housed inside a vertical cylindrical vessel which is subjected to microwave radiation. It is found that as the mesh size increases, GMRES gives the fastest convergence rate in terms of computational times and number of iterations.
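The two stationary baselines in this comparison differ only in a relaxation factor, which a small model problem makes concrete. The sketch below assumes a 1D Poisson equation -u'' = f with zero boundary values, not the pressure Poisson equation of the paper, and the function `sor_poisson_1d` is hypothetical. Setting `omega=1.0` gives Gauss-Seidel, and the value `omega_opt` is the classical optimum for this model problem.

```python
import math

def sor_poisson_1d(f, h, omega=1.0, tol=1e-8, max_iter=100000):
    """Point SOR sweeps for -u'' = f on a uniform grid with u = 0 at both ends."""
    n = len(f)
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            gs = 0.5 * (left + right + h * h * f[i])    # Gauss-Seidel value
            new = (1.0 - omega) * u[i] + omega * gs     # relaxed update
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, it
    return u, max_iter

n = 31
h = 1.0 / (n + 1)
f = [1.0] * n
u_gs, it_gs = sor_poisson_1d(f, h, omega=1.0)
omega_opt = 2.0 / (1.0 + math.sin(math.pi * h))
u_sor, it_sor = sor_poisson_1d(f, h, omega=omega_opt)
```

Both variants slow down as the grid is refined, which is the behavior that motivates Krylov methods such as GMRES on larger meshes.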
Application of a sparseness constraint in multivariate curve resolution - Alternating least squares.

PubMed

Hugelier, Siewert; Piqueras, Sara; Bedia, Carmen; de Juan, Anna; Ruckebusch, Cyril

2018-02-13

The use of sparseness in chemometrics is a concept that has increased in popularity. The advantage is, above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a constraint in multivariate curve resolution - alternating least squares (MCR-ALS), which aims at reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as is the case in some instrumental responses, such as mass spectra, or in concentration profiles linked to scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the constrained profiles, one-dimensional and/or two-dimensional numerical arrays can be fitted using a basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression framework with an L0-norm penalty is applied. This L0-norm penalty constrains the number of non-null coefficients in the fit of the constrained array without any a priori assumption about their number or positions. It has been shown that the sparseness constraint induces the suppression of values linked to uninformative channels and noise in MS spectra and improves the location of scattered compounds in distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of the sparseness constraint is a lower ambiguity in the bilinear model, since the predominance of null coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart matrix of the MCR bilinear model. Copyright © 2017 Elsevier B.V. All rights reserved.
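The effect of an L0-norm penalty is easiest to see in the simplest possible setting. The sketch below assumes an identity basis in place of the Gaussian basis of the abstract, so the penalized least-squares problem `min_c sum((y - c)^2) + lam * (number of nonzero c)` separates per coefficient and is solved exactly by hard thresholding. The function names and the sample profile are hypothetical.

```python
def l0_objective(y, c, lam):
    """Squared residual plus lam times the number of non-null coefficients."""
    residual = sum((yi - ci) ** 2 for yi, ci in zip(y, c))
    nonzeros = sum(1 for ci in c if ci != 0.0)
    return residual + lam * nonzeros

def l0_fit_identity_basis(y, lam):
    """Exact minimizer of l0_objective when the basis is the identity.

    Keeping coefficient i costs lam; dropping it costs y[i]**2,
    so a value survives only if y[i]**2 > lam (hard thresholding).
    """
    return [yi if yi * yi > lam else 0.0 for yi in y]

if __name__ == "__main__":
    # Hypothetical noisy profile: two real peaks plus small noise channels.
    profile = [0.05, -0.1, 3.0, 0.02, -2.5, 0.08]
    sparse_profile = l0_fit_identity_basis(profile, lam=0.25)
    print(sparse_profile)
    print(l0_objective(profile, sparse_profile, 0.25))
```

With a non-orthogonal basis such as overlapping Gaussians the problem no longer separates, and a dedicated L0 solver is needed; the thresholding rule above is only the identity-basis special case.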
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pinski, Peter; Riplinger, Christoph; Valeev, Edward F.; Neese, Frank

2015-07-21

In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Möller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals.
While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
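The elementary operations named in the abstract (inversion, chaining, intersection) and the link to compressed sparse row can be sketched with plain dictionaries of sets. This is only an assumed toy representation, not the authors' code library; the function names and the orbital/atom/basis-function index sets in the usage part are hypothetical.

```python
def invert(smap):
    """Turn a map i -> {j} into the map j -> {i}."""
    inverse = {}
    for i, targets in smap.items():
        for j in targets:
            inverse.setdefault(j, set()).add(i)
    return inverse

def chain(a_to_b, b_to_c):
    """Compose two sparse maps: a -> union of b_to_c[b] over all b in a_to_b[a]."""
    result = {}
    for a, bs in a_to_b.items():
        cs = set()
        for b in bs:
            cs |= b_to_c.get(b, set())
        if cs:
            result[a] = cs
    return result

def intersect(map1, map2):
    """Keep only the (key, target) pairs present in both maps."""
    result = {}
    for key in map1.keys() & map2.keys():
        common = map1[key] & map2[key]
        if common:
            result[key] = common
    return result

def to_csr_pattern(smap, n_rows):
    """Store a sparse map as CSR-style (row_ptr, col_idx) arrays, without values."""
    row_ptr, col_idx = [0], []
    for i in range(n_rows):
        col_idx.extend(sorted(smap.get(i, ())))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx

if __name__ == "__main__":
    # Hypothetical index sets: localized orbitals -> nearby atoms -> basis functions.
    orbital_to_atom = {0: {0, 1}, 1: {1, 2}}
    atom_to_basis = {0: {0, 1}, 1: {2}, 2: {3, 4}}
    orbital_to_basis = chain(orbital_to_atom, atom_to_basis)
    print(orbital_to_basis)
    print(invert(atom_to_basis))
    print(to_csr_pattern(orbital_to_basis, n_rows=2))
```

Chaining short-range maps in this way keeps every intermediate index list local, which is the property that reduced-scaling methods rely on.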
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals.

PubMed

Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank

2015-07-21

In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Möller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. 
While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
