parallel structurally-symmetric sparse: Topics by Science.gov

Sample records for parallel structurally-symmetric sparse

Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

2005-01-01

The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design and implement computer software for solving large-scale acoustic problems arising from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective interprocessor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving a series of structural and acoustic (symmetrical and unsymmetrical) problems on different computing platforms. Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

NASA Astrophysics Data System (ADS)

Ma, Sangback

In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wavefronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets, and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large mesh sizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering and ILU(0) in the Wavefront ordering outperform the other methods, but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics

NASA Technical Reports Server (NTRS)

Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.

2001-01-01

An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward/backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring (domain decomposition) formulation to achieve parallel computation, where different substructures are handled by different parallel processors.
A new scheduling algorithm for parallel sparse LU factorization with static pivoting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigori, Laura; Li, Xiaoye S.

2002-08-20

In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm on real world application matrices on an IBM SP RS/6000 distributed memory machine.
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

NASA Astrophysics Data System (ADS)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%) and, in order to get back converged results, we need only a small part of the matrix spectrum.
Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

DOE PAGES

Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; ...

2015-07-14

Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high end multi-core architectures and we use a message passing interface (MPI) + OpenMP hybrid programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare with previously reported implementations that do not exploit underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
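The idea of exploiting symmetry in SpMVM can be sketched in serial form as follows. This is only an illustrative sketch, not the distributed implementation analyzed in the record above; the function name `symmetric_spmv`, the coordinate-triple storage, and the 3x3 example matrix are all assumptions made for the example.

```python
def symmetric_spmv(n, upper_entries, x):
    """Compute y = A x for a symmetric A, given only entries with row <= col.

    upper_entries is a list of (row, col, value) triples (hypothetical layout).
    """
    y = [0.0] * n
    for i, j, a_ij in upper_entries:
        y[i] += a_ij * x[j]
        if i != j:
            # The mirrored entry a_ji equals a_ij and is never stored,
            # so each off-diagonal value is read once and used twice.
            y[j] += a_ij * x[i]
    return y


# Hypothetical example: A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]]
upper = [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 3.0), (1, 2, 2.0), (2, 2, 5.0)]
y = symmetric_spmv(3, upper, [1.0, 2.0, 3.0])
```

Storing only one triangle roughly halves the matrix data that has to be read, at the cost of scattered updates to `y`, which is one reason a parallel version needs care.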
An M-step preconditioned conjugate gradient method for parallel computation

NASA Technical Reports Server (NTRS)

Adams, L.

1983-01-01

This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
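One common form of static dependence analysis for a sparse triangular solve is level scheduling, sketched below. The record does not say which analysis the authors used, so this is an assumed illustration; the function name `triangular_solve_levels` and the input layout are hypothetical.

```python
def triangular_solve_levels(lower_cols):
    """Group the rows of a sparse lower-triangular solve into parallel levels.

    lower_cols[i] lists the columns j < i where L[i][j] is nonzero, i.e. the
    unknowns that row i must wait for. Rows in one level are independent.
    """
    level = [0] * len(lower_cols)
    for i, deps in enumerate(lower_cols):
        # A row with no dependencies gets level 0; otherwise it sits one
        # level past its deepest dependency.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    schedule = {}
    for i, lv in enumerate(level):
        schedule.setdefault(lv, []).append(i)
    return [schedule[lv] for lv in sorted(schedule)]


# Hypothetical sparsity pattern: rows 0 and 1 are independent, row 2 needs
# row 0, row 3 needs row 1, and row 4 needs rows 2 and 3.
levels = triangular_solve_levels([[], [], [0], [1], [2, 3]])
```

Each inner list can be processed concurrently, with a synchronization point between consecutive lists.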
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE PAGES

Li, Ruipeng; Saad, Yousef

2017-08-01

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman-Morrison-Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
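For reference, the Sherman-Morrison-Woodbury formula named in the record above can be stated in its general textbook form. The symbols A, U, C and V are generic here (an assumption for illustration) and are not the specific matrices used by the authors.

```latex
% A is n x n and invertible, C is k x k and invertible,
% U is n x k and V is k x n, with k typically much smaller than n.
\[
  (A + U C V)^{-1}
  = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]
```

When k is small, the only new inverse needed is the k x k matrix in parentheses, which is what makes a low-rank correction to an already inverted (decoupled) matrix cheap to apply.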
An Efficient Scheme for Updating Sparse Cholesky Factors

NASA Technical Reports Server (NTRS)

Raghavan, Padma

2002-01-01

Raghavan had earlier developed the software package DSCPACK, which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks of workstations with message passing using MPI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive definite matrix, A = LL^T. The code can also compute the factorization A = LDL^T. The complexity of the software arises from several factors relating to the sparsity of the matrix A. A sparse N x N matrix A typically has fewer than cN nonzeroes, where c is a small constant. If the matrix were dense, it would have O(N^2) nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor L. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of L, (3) numeric factorization to compute L, and (4) triangular solution to solve Ly = b and L^T x = y. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor.
The latter are aimed at better utilization of the cache-memory hierarchy on modern processors to prevent cache misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, DSCPACK is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the LL^T or LDL^T factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.
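Steps (3) and (4) of the process described above can be illustrated with a small dense sketch. It is not DSCPACK's multifrontal algorithm and it skips the ordering and symbolic steps entirely; the function names and the 3x3 test matrix are assumptions made for the example.

```python
import math


def cholesky(a):
    """Numeric factorization: return lower-triangular L with A = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(diag)  # fails if A is not positive definite
        for i in range(j + 1, n):
            off = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = off / l[j][j]
    return l


def solve_cholesky(l, b):
    """Triangular solution: forward solve L y = b, then back solve L^T x = y."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


# Hypothetical tridiagonal SPD matrix; its factor has no fill-in.
a = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
l = cholesky(a)
x = solve_cholesky(l, [6.0, 9.0, 7.0])
```

A sparse code would store only the nonzeroes of L and use the symbolic step to know in advance where they fall.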
Modeling an in-register, parallel "Iowa" Aβ fibril structure using solid-state NMR data from labeled samples with Rosetta.

PubMed

Sgourakis, Nikolaos G; Yau, Wai-Ming; Qiang, Wei

2015-01-06

Determining the structures of amyloid fibrils is an important first step toward understanding the molecular basis of neurodegenerative diseases. For β-amyloid (Aβ) fibrils, conventional solid-state NMR structure determination using uniform labeling is limited by extensive peak overlap. We describe the characterization of a distinct structural polymorph of Aβ using solid-state NMR, transmission electron microscopy (TEM), and Rosetta model building. First, the overall fibril arrangement is established using mass-per-length measurements from TEM. Then, the fibril backbone arrangement, stacking registry, and "steric zipper" core interactions are determined using a number of solid-state NMR techniques on sparsely ¹³C-labeled samples. Finally, we perform Rosetta structure calculations with an explicitly symmetric representation of the system. We demonstrate the power of the hybrid Rosetta/NMR approach by modeling the in-register, parallel "Iowa" mutant (D23N) at high resolution (1.2 Å backbone rmsd). The final models are validated using an independent set of NMR experiments that confirm key features.
Highly parallel sparse Cholesky factorization

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Schreiber, Robert

1990-01-01

Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
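A minimal serial sketch of CG makes the role of the matrix-vector product visible: it is the only place the matrix is touched. This is a textbook, unpreconditioned version, not the parallel codes studied in the record above; the function names, tolerance, and 2x2 test system are assumptions for the example.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)  # the one matrix-vector product per iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def dense_matvec(a):
    """Wrap a list-of-lists matrix as a matvec callable (illustration only)."""
    return lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


# Hypothetical SPD system with exact solution (1/11, 7/11).
solution = conjugate_gradient(dense_matvec([[4.0, 1.0], [1.0, 3.0]]), [1.0, 2.0])
```

Because `matvec` is passed in as a callable, the same loop works unchanged with any sparse storage format or ordering.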
Brief announcement: Hypergraph partitioning for parallel sparse matrix-matrix multiplication

DOE PAGES

Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...

2015-01-01

The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.
Storage of sparse files using parallel log-structured file system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bent, John M.; Faibish, Sorin; Grider, Gary

A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a single patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.
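The index-entry idea in the record above can be sketched with a toy in-memory model. This is an assumed illustration only, not the described system: the class name `SparseLogFile` and its methods are hypothetical, and pattern detection and the shared directory are left out.

```python
class SparseLogFile:
    """Toy model: data portions are appended to a log, holes are never stored."""

    def __init__(self):
        self.log = bytearray()  # physical storage, written sequentially
        self.index = []         # (logical_offset, physical_offset, length)

    def write(self, logical_offset, data):
        # Record where the portion belongs logically and where it lands
        # physically, then append it to the end of the log.
        self.index.append((logical_offset, len(self.log), len(data)))
        self.log += data

    def read(self, size):
        # Start from zeros so that unwritten ranges (holes) are restored.
        out = bytearray(size)
        for logical, physical, length in self.index:
            n = max(0, min(length, size - logical))
            out[logical:logical + n] = self.log[physical:physical + n]
        return bytes(out)


f = SparseLogFile()
f.write(0, b"head")
f.write(1000, b"tail")
restored = f.read(1004)
```

Here a file with a logical size of 1004 bytes occupies only 8 bytes in the log, and later index entries overwrite earlier ones on read.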
Iterative algorithms for large sparse linear systems on parallel computers

NASA Technical Reports Server (NTRS)

Adams, L. M.

1982-01-01

Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering, are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Solving large sparse eigenvalue problems on supercomputers

NASA Technical Reports Server (NTRS)

Philippe, Bernard; Saad, Youcef

1988-01-01

An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
Tensor Sparse Coding for Positive Definite Matrices.

PubMed

Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikos

2013-08-02

In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (e.g., image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.

Tensor sparse coding for positive definite matrices.

PubMed

Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos

2014-03-01

In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for example, image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

B. Hendrickson; T.G. Kolda

1998-09-01

A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
Community Detection in Sparse Random Networks

DTIC Science & Technology

2013-08-13

... if (i, j) ∈ E, meaning there is an edge between nodes i, j ∈ V. Note that W is symmetric, and we assume that Wii = 0 for all i. Under the null... Wii = 0.) Our arguments are parallel to those we used under P0, the only difficulty being that Wi is not binomial anymore. Indeed, WSi ∼ Bin(n − 1, p1... Berlin: Springer. Alon, N. and S. Gutner (2010). Balanced families of perfect hash functions and their applications. ACM Trans. Algorithms 6 (3), Art
A performance study of sparse Cholesky factorization on INTEL iPSC/860

NASA Technical Reports Server (NTRS)

Zubair, M.; Ghose, M.

1992-01-01

The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
Object-Oriented Design for Sparse Direct Solvers

NASA Technical Reports Server (NTRS)

Dobrian, Florin; Kumfert, Gary; Pothen, Alex

1999-01-01

We discuss the object-oriented design of a software package for solving sparse, symmetric systems of equations (positive definite and indefinite) by direct methods. At the highest layers, we decouple data structure classes from algorithmic classes for flexibility. We describe the important structural and algorithmic classes in our design, and discuss the trade-offs we made for high performance. The kernels at the lower layers were optimized by hand. Our results show no performance loss from our object-oriented design, while providing flexibility, ease of use, and extensibility over solvers using procedural design.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

2005-01-01

A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategies and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate the steps in the formulation and large-scale (1,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors and a cluster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environments and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
FPGA implementation of sparse matrix algorithm for information retrieval

NASA Astrophysics Data System (ADS)

Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio

2005-06-01

Information text data retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper a solution to this problem is proposed via hardware-supported information retrieval algorithms. Reconfigurable computing may adopt frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of parallelism can be tuned for the data. In this work we implemented a standard BLAS (basic linear algebra subprogram) sparse matrix algorithm named Compressed Sparse Row (CSR), which is shown to be more efficient in terms of storage space requirement and query-processing time than the other sparse matrix algorithms for information retrieval applications. Although the inverted index algorithm has been treated as the de facto standard for information retrieval for years, an alternative approach that stores the index of a text collection in a sparse matrix structure is gaining attention. This approach performs query processing using sparse matrix-vector multiplication and, due to parallelization, achieves a substantial efficiency gain over the sequential inverted index. The parallel implementations of the information retrieval kernel presented in this work target the Virtex II Field Programmable Gate Array (FPGA) board from Xilinx. A recent development in scientific applications is the use of FPGAs to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within the processing unit.
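To make the CSR-based query processing concrete, here is a small software sketch in Python; it says nothing about the FPGA design itself. The helper names `dense_to_csr` and `csr_matvec`, and the toy document-term matrix (rows as documents, columns as terms), are assumptions made for illustration only.

```python
def dense_to_csr(rows):
    """Convert a dense row-major matrix into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # end of this row's nonzeros
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical document-term weights: 3 documents, 3 terms.
doc_term = [[1, 0, 2],
            [0, 0, 3],
            [4, 5, 0]]
values, col_idx, row_ptr = dense_to_csr(doc_term)
query = [1, 1, 0]                     # the query contains terms 0 and 1
scores = csr_matvec(values, col_idx, row_ptr, query)
print(scores)                         # prints [1, 0, 9]
```

Each output entry depends on one row only, so the rows can be scored independently, which is the parallelism a hardware implementation would exploit.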
Sparse matrix-vector multiplication on network-on-chip

NASA Astrophysics Data System (ADS)

Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.

2010-12-01

In this paper, we present an idea for performing matrix-vector multiplication by using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with an arbitrary structure of the data transfers, i.e. with the irregular structure of the sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture with sizes 4×4 and 5×5 in IEEE 754 single-precision floating point using an FPGA.
Using Chebyshev polynomials and approximate inverse triangular factorizations for preconditioning the conjugate gradient method

NASA Astrophysics Data System (ADS)

Kaporin, I. E.

2012-02-01

In order to precondition a sparse symmetric positive definite matrix, its approximate inverse is examined, which is represented as the product of two sparse mutually adjoint triangular matrices. In this way, the solution of the corresponding system of linear algebraic equations (SLAE) by applying the preconditioned conjugate gradient method (CGM) is reduced to performing only elementary vector operations and calculating sparse matrix-vector products. A method for constructing the above preconditioner is described and analyzed. The triangular factor has a fixed sparsity pattern and is optimal in the sense that the preconditioned matrix has a minimum K-condition number. The use of polynomial preconditioning based on Chebyshev polynomials makes it possible to considerably reduce the amount of scalar product operations (at the cost of an insignificant increase in the total number of arithmetic operations). The possibility of an efficient massively parallel implementation of the resulting method for solving SLAEs is discussed. For a sequential version of this method, the results obtained by solving 56 test problems from the Florida sparse matrix collection (which are large-scale and ill-conditioned) are presented. These results show that the method is highly reliable and has low computational costs.
Investigation of wall-bounded turbulence over sparsely distributed roughness

NASA Astrophysics Data System (ADS)

Placidi, Marco; Ganapathisubramani, Bharath

2011-11-01

The effects of sparsely distributed roughness elements on the structure of a turbulent boundary layer are examined by performing a series of Particle Image Velocimetry (PIV) experiments in a wind tunnel. From the literature, the best way to characterise a rough wall, especially one where the density of roughness elements is sparse, is unclear. In this study, rough surfaces consisting of sparsely and uniformly distributed LEGO® blocks are used. Five different patterns are adopted in order to examine the effects of frontal solidity (λf, frontal area of the roughness elements per unit wall-parallel area), plan solidity (λp, plan area of roughness elements per unit wall-parallel area) and the geometry of the roughness element (square and cylindrical elements), on the turbulence structure. The Karman number, Reτ , has been matched, at the value of approximately 2300, in order to compare across the different cases. In the talk, we will present detailed analysis of mean and rms velocity profiles, Reynolds stresses and quadrant decomposition.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
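The idea of a two phase sparse matrix-matrix multiplication with an accumulator can be sketched as follows. This is a serial, row-wise (Gustavson-style) illustration in Python with a hash-map accumulator, not kkSpGEMM or any library API; the function names `spgemm_symbolic` and `spgemm_numeric` and the CSR argument layout are assumptions.

```python
def spgemm_symbolic(a_ptr, a_col, b_ptr, b_col):
    """Phase 1: compute the row pointer of C = A B from the sparsity patterns alone."""
    c_ptr = [0]
    for i in range(len(a_ptr) - 1):
        cols = set()                              # column pattern of row i of C
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            cols.update(b_col[b_ptr[j]:b_ptr[j + 1]])
        c_ptr.append(c_ptr[-1] + len(cols))
    return c_ptr


def spgemm_numeric(a_ptr, a_col, a_val, b_ptr, b_col, b_val, c_ptr):
    """Phase 2: fill column indices and values of C into storage sized by phase 1."""
    c_col = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    for i in range(len(a_ptr) - 1):
        acc = {}                                  # hash accumulator for row i
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            for t in range(b_ptr[j], b_ptr[j + 1]):
                acc[b_col[t]] = acc.get(b_col[t], 0.0) + a_val[k] * b_val[t]
        for offset, col in enumerate(sorted(acc)):
            c_col[c_ptr[i] + offset] = col
            c_val[c_ptr[i] + offset] = acc[col]
    return c_col, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form.
a_ptr, a_col, a_val = [0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0]
b_ptr, b_col, b_val = [0, 1, 2], [1, 0], [4.0, 5.0]
c_ptr = spgemm_symbolic(a_ptr, a_col, b_ptr, b_col)
c_col, c_val = spgemm_numeric(a_ptr, a_col, a_val, b_ptr, b_col, b_val, c_ptr)
```

Splitting the work this way means that when the same sparsity patterns are multiplied repeatedly with new values, the symbolic result `c_ptr` can be reused and only the numeric phase rerun.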
Laplace-domain waveform modeling and inversion for the 3D acoustic-elastic coupled media

NASA Astrophysics Data System (ADS)

Shin, Jungkyun; Shin, Changsoo; Calandra, Henri

2016-06-01

Laplace-domain waveform inversion reconstructs long-wavelength subsurface models by using the zero-frequency component of damped seismic signals. Despite the computational advantages of Laplace-domain waveform inversion over conventional frequency-domain waveform inversion, an acoustic assumption and an iterative matrix solver have been used to invert 3D marine datasets to mitigate the intensive computing cost. In this study, we develop a Laplace-domain waveform modeling and inversion algorithm for 3D acoustic-elastic coupled media by using a parallel sparse direct solver library (MUltifrontal Massively Parallel Solver, MUMPS). We precisely simulate a real marine environment by coupling the 3D acoustic and elastic wave equations with the proper boundary condition at the fluid-solid interface. In addition, we can extract the elastic properties of the Earth below the sea bottom from the recorded acoustic pressure datasets. As a matrix solver, the parallel sparse direct solver is used to factorize the non-symmetric impedance matrix in a distributed memory architecture and rapidly solve the wave field for a number of shots by using the lower and upper matrix factors. Using both synthetic datasets and real datasets obtained by a 3D wide azimuth survey, the long-wavelength component of the P-wave and S-wave velocity models is reconstructed and the proposed modeling and inversion algorithm are verified. A cluster of 80 CPU cores is used for this study.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.

PubMed

Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B

2016-01-01

Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10-15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains

PubMed Central

Jha, Ashwani; Flurchick, K. M.; Bikdash, Marwan

2016-01-01

Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors. PMID:27747230
Two-stage bulk electron heating in the diffusion region of anti-parallel symmetric reconnection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Le, Ari Yitzchak; Egedal, Jan; Daughton, William Scott

2016-10-13

Electron bulk energization in the diffusion region during anti-parallel symmetric reconnection entails two stages. First, the inflowing electrons are adiabatically trapped and energized by an ambipolar parallel electric field. Next, the electrons gain energy from the reconnection electric field as they undergo meandering motion. These collisionless mechanisms have been described previously, and they lead to highly structured electron velocity distributions. Furthermore, a simplified control-volume analysis gives estimates for how the net effective heating scales with the upstream plasma conditions in agreement with fully kinetic simulations and spacecraft observations.
Communication requirements of sparse Cholesky factorization with nested dissection ordering

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an $n \times n$ sparse symmetric positive definite matrix representing an $n$-vertex regular two-dimensional grid graph using $n^{\alpha}$, $\alpha \le 1$, processors is shown to be $O(n^{1+\alpha/2})$. It is $O(n)$ when $n^{\alpha}$, $\alpha \ge 1$, processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Bézier $\bar{B}$ projection

NASA Astrophysics Data System (ADS)

Miao, Di; Borden, Michael J.; Scott, Michael A.; Thomas, Derek C.

2018-06-01

In this paper we demonstrate the use of Bézier projection to alleviate locking phenomena in structural mechanics applications of isogeometric analysis. Interpreting the well-known $\bar{B}$ projection in two different ways we develop two formulations for locking problems in beams and nearly incompressible elastic solids. One formulation leads to a sparse symmetric system and the other leads to a sparse non-symmetric system. To demonstrate the utility of Bézier projection for both geometry and material locking phenomena we focus on transverse shear locking in Timoshenko beams and volumetric locking in nearly incompressible linear elasticity, although the approach can be applied generally to other types of locking phenomena as well. Bézier projection is a local projection technique with optimal approximation properties, which in many cases produces solutions that are comparable to global $L^2$ projection. In the context of $\bar{B}$ methods, the use of Bézier projection produces sparse stiffness matrices with only a slight increase in bandwidth when compared to standard displacement-based methods. Of particular importance is that the approach is applicable to any spline representation that can be written in Bézier form, like NURBS, T-splines, LR-splines, etc. We discuss in detail how to integrate this approach into an existing finite element framework with minimal disruption through the use of Bézier extraction operators and a newly introduced dual basis for the Bézier projection operator. We then demonstrate the behavior of the two proposed formulations through several challenging benchmark problems.

Weakly Interacting Symmetric and Anti-Symmetric States in the Bilayer Systems

NASA Astrophysics Data System (ADS)

Marchewka, M.; Sheregii, E. M.; Tralle, I.; Tomaka, G.; Ploch, D.

We have studied the parallel magneto-transport in DQW-structures of two different potential shapes: quasi-rectangular and quasi-triangular. The quantum beats effect was observed in Shubnikov-de Haas (SdH) oscillations for both types of the DQW structures in a perpendicular magnetic field arrangement. We developed a special scheme for calculating the Landau level energies, by means of which we carried out the necessary simulations of the beating effect. In order to obtain agreement between our experimental data and the results of simulations, we introduced two different quasi-Fermi levels which characterize symmetric and anti-symmetric states in DQWs. The existence of two different quasi-Fermi levels simply means that one can treat the two sub-systems (charge carriers characterized by symmetric and anti-symmetric wave functions) as weakly interacting and having their own rate of establishing the equilibrium state.
Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

PubMed

Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

2017-01-01

This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.
Disentangling giant component and finite cluster contributions in sparse random matrix spectra.

PubMed

Kühn, Reimer

2016-04-01

We describe a method for disentangling giant component and finite cluster contributions to sparse random matrix spectra, using sparse symmetric random matrices defined on Erdős-Rényi graphs as an example and test bed. Our methods apply to sparse matrices defined in terms of arbitrary graphs in the configuration model class, as long as they have finite mean degree.
Inequality across consonantal contrasts in speech perception: evidence from mismatch negativity.

PubMed

Cornell, Sonia A; Lahiri, Aditi; Eulitz, Carsten

2013-06-01

The precise structure of speech sound representations is still a matter of debate. In the present neurobiological study, we compared predictions about differential sensitivity to speech contrasts between models that assume full specification of all phonological information in the mental lexicon with those assuming sparse representations (only contrastive or otherwise not predictable information is stored). In a passive oddball paradigm, we studied the contrast sensitivity as reflected in the mismatch negativity (MMN) response to changes in the manner of articulation, as well as place of articulation of consonants in intervocalic positions of nonwords (manner of articulation: [edi ~ eni], [ezi ~ eni]; place of articulation: [edi ~ egi]). Models that assume full specification of all phonological information in the mental lexicon posit equal MMNs within each contrast (symmetric MMNs), that is, changes from standard [edi] to deviant [eni] elicit a similar MMN response as changes from standard [eni] to deviant [edi]. In contrast, models that assume sparse representations predict that only the [ezi] ~ [eni] reversals will evoke symmetric MMNs because of their conflicting fully specified manner features. Asymmetric MMNs are predicted, however, for the reversals of [edi] ~ [eni] and [edi] ~ [egi] because either a manner or place property in each pair is not fully specified in the mental lexicon. Our results show a pattern of symmetric and asymmetric MMNs that is in line with predictions of the featurally underspecified lexicon model that assumes sparse phonological representations. We conclude that the brain refers to underspecified phonological representations during speech perception.
Cucheb: A GPU implementation of the filtered Lanczos procedure

NASA Astrophysics Data System (ADS)

Aurentz, Jared L.; Kalantzis, Vassilis; Saad, Yousef

2017-11-01

This paper describes the software package Cucheb, a GPU implementation of the filtered Lanczos procedure for the solution of large sparse symmetric eigenvalue problems. The filtered Lanczos procedure uses a carefully chosen polynomial spectral transformation to accelerate convergence of the Lanczos method when computing eigenvalues within a desired interval. This method has proven particularly effective for eigenvalue problems that arise in electronic structure calculations and density functional theory. We compare our implementation against an equivalent CPU implementation and show that using the GPU can reduce the computation time by more than a factor of 10. Program Summary Program title: Cucheb Program Files doi:http://dx.doi.org/10.17632/rjr9tzchmh.1 Licensing provisions: MIT Programming language: CUDA C/C++ Nature of problem: Electronic structure calculations require the computation of all eigenvalue-eigenvector pairs of a symmetric matrix that lie inside a user-defined real interval. Solution method: To compute all the eigenvalues within a given interval a polynomial spectral transformation is constructed that maps the desired eigenvalues of the original matrix to the exterior of the spectrum of the transformed matrix. The Lanczos method is then used to compute the desired eigenvectors of the transformed matrix, which are then used to recover the desired eigenvalues of the original matrix. The bulk of the operations are executed in parallel using a graphics processing unit (GPU). Runtime: Variable, depending on the number of eigenvalues sought and the size and sparsity of the matrix. Additional comments: Cucheb is compatible with CUDA Toolkit v7.0 or greater.
Fast sparsely synchronized brain rhythms in a scale-free neural network

NASA Astrophysics Data System (ADS)

Kim, Sang-Yoon; Lim, Woochang

2015-08-01

We consider a directed version of the Barabási-Albert scale-free network model with symmetric preferential attachment with the same in- and out-degrees and study the emergence of sparsely synchronized rhythms for a fixed attachment degree in an inhibitory population of fast-spiking Izhikevich interneurons. Fast sparsely synchronized rhythms with stochastic and intermittent neuronal discharges are found to appear for large values of J (synaptic inhibition strength) and D (noise intensity). For an intensive study we fix J at a sufficiently large value and investigate the population states by increasing D. For small D, full synchronization with the same population-rhythm frequency fp and mean firing rate (MFR) fi of individual neurons occurs, while for large D partial synchronization with fp>〈fi〉 (〈fi〉: ensemble-averaged MFR) appears due to intermittent discharge of individual neurons; in particular, the case of fp>4〈fi〉 is referred to as sparse synchronization. For the case of partial and sparse synchronization, MFRs of individual neurons vary depending on their degrees. As D passes a critical value D* (which is determined by employing an order parameter), a transition to unsynchronization occurs due to the destructive role of noise to spoil the pacing between sparse spikes. For D
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
MMS Observations of Large Guide Field Symmetric Reconnection Between Colliding Reconnection Jets at the Center of a Magnetic Flux Rope at the Magnetopause

NASA Technical Reports Server (NTRS)

Oieroset, M.; Phan, T. D.; Haggerty, C.; Shay, M. A.; Eastwood, J. P.; Gershman, D. J.; Drake, J. F.; Fujimoto, M.; Ergun, R. E.; Mozer, F. S.;

2016-01-01

We report evidence for reconnection between colliding reconnection jets in a compressed current sheet at the center of a magnetic flux rope at Earth's magnetopause. The reconnection involved nearly symmetric inflow boundary conditions with a strong guide field of two. The thin (2.5 ion-skin depth (d_i) width) current sheet (at approximately 12 d_i downstream of the X line) was well resolved by MMS, which revealed large asymmetries in plasma and field structures in the exhaust. Ion perpendicular heating, electron parallel heating, and density compression occurred on one side of the exhaust, while ion parallel heating and density depression were shifted to the other side. The normal electric field and double out-of-plane (bifurcated) currents spanned almost the entire exhaust. These observations are in good agreement with a kinetic simulation for similar boundary conditions, demonstrating in new detail that the structure of large guide field symmetric reconnection is distinctly different from antiparallel reconnection.

MMS observations of large guide field symmetric reconnection between colliding reconnection jets at the center of a magnetic flux rope at the magnetopause

NASA Astrophysics Data System (ADS)

Øieroset, M.; Phan, T. D.; Haggerty, C.; Shay, M. A.; Eastwood, J. P.; Gershman, D. J.; Drake, J. F.; Fujimoto, M.; Ergun, R. E.; Mozer, F. S.; Oka, M.; Torbert, R. B.; Burch, J. L.; Wang, S.; Chen, L. J.; Swisdak, M.; Pollock, C.; Dorelli, J. C.; Fuselier, S. A.; Lavraud, B.; Giles, B. L.; Moore, T. E.; Saito, Y.; Avanov, L. A.; Paterson, W.; Strangeway, R. J.; Russell, C. T.; Khotyaintsev, Y.; Lindqvist, P. A.; Malakit, K.

2016-06-01

We report evidence for reconnection between colliding reconnection jets in a compressed current sheet at the center of a magnetic flux rope at Earth's magnetopause. The reconnection involved nearly symmetric inflow boundary conditions with a strong guide field of two. The thin (2.5 ion-skin depth (di) width) current sheet (at ~12 di downstream of the X line) was well resolved by MMS, which revealed large asymmetries in plasma and field structures in the exhaust. Ion perpendicular heating, electron parallel heating, and density compression occurred on one side of the exhaust, while ion parallel heating and density depression were shifted to the other side. The normal electric field and double out-of-plane (bifurcated) currents spanned almost the entire exhaust. These observations are in good agreement with a kinetic simulation for similar boundary conditions, demonstrating in new detail that the structure of large guide field symmetric reconnection is distinctly different from antiparallel reconnection.
MMS observations of large guide field symmetric reconnection between colliding reconnection jets at the center of a magnetic flux rope at the magnetopause

NASA Astrophysics Data System (ADS)

Oieroset, M.; Phan, T.; Haggerty, C. C.; Shay, M. A.; Eastwood, J. P.; Gershman, D. J.; Drake, J. F.; Fujimoto, M.; Ergun, R.; Mozer, F.; Oka, M.; Torbert, R. B.; Burch, J. L.; Wang, S.; Chen, L. J.; Swisdak, M.; Pollock, C.; Dorelli, J.; Fuselier, S. A.; Lavraud, B.; Giles, B. L.; Moore, T. E.; Saito, Y.; Avanov, L. A.; Paterson, W. R.; Strangeway, R. J.; Russell, C. T.; Khotyaintsev, Y. V.; Lindqvist, P. A.; Malakit, K.

2016-12-01

We report evidence for reconnection between colliding reconnection jets in a compressed current sheet at the center of a magnetic flux rope at Earth's magnetopause. The reconnection involved nearly symmetric inflow boundary conditions with a strong guide field of two. The thin (2.5 ion-skin depth (di) width) current sheet (at ~12 di downstream of the X line) was well resolved by Magnetospheric Multiscale, which revealed large asymmetries in plasma and field structures in the exhaust. Ion perpendicular heating, electron parallel heating, and density compression occurred on one side of the exhaust, while ion parallel heating and density depression were shifted to the other side. The normal electric field and double out-of-plane (bifurcated) currents spanned almost the entire exhaust. These observations are in good agreement with a kinetic simulation for similar boundary conditions, demonstrating in new detail that the structure of large guide field symmetric reconnection is distinctly different from antiparallel reconnection.
Magnetospectroscopy of symmetric and anti-symmetric states in double quantum wells

NASA Astrophysics Data System (ADS)

Marchewka, M.; Sheregii, E. M.; Tralle, I.; Ploch, D.; Tomaka, G.; Furdak, M.; Kolek, A.; Stadler, A.; Mleczko, K.; Zak, D.; Strupinski, W.; Jasik, A.; Jakiela, R.

2008-02-01

The experimental results obtained for magnetotransport in the InGaAs/InAlAs double quantum well (DQW) structures of two different shapes of wells are reported. A beating effect occurring in the Shubnikov-de Haas (SdH) oscillations was observed for both types of structures at low temperatures in the parallel transport when the magnetic field was perpendicular to the layers. An approach for the calculation of the Landau level energies for DQW structures was developed and then applied to the analysis and interpretation of the experimental data related to the beating effect. We also argue that in order to account for the observed magnetotransport phenomena (SdH and integer quantum Hall effect), one should introduce two different quasi-Fermi levels characterizing two electron subsystems regarding the symmetry properties of their states, symmetric and anti-symmetric ones, which are not mixed by electron-electron interaction.
Defect classification in sparsity-based structural health monitoring

NASA Astrophysics Data System (ADS)

Golato, Andrew; Ahmad, Fauzia; Santhanam, Sridhar; Amin, Moeness G.

2017-05-01

Guided waves have gained popularity in structural health monitoring (SHM) due to their ability to inspect large areas with little attenuation, while providing rich interactions with defects. For thin-walled structures, the propagating waves are Lamb waves, which are a complex but well understood type of guided waves. Recent works have cast the defect localization problem of Lamb wave based SHM within the sparse reconstruction framework. These methods make use of a linear model relating the measurements with the scene reflectivity under the assumption of point-like defects. However, most structural defects are not perfect points but tend to assume specific forms, such as surface cracks or internal cracks. Knowledge of the "type" of defects is useful in the assessment phase of SHM. In this paper, we present a dual purpose sparsity-based imaging scheme which, in addition to accurately localizing defects, properly classifies the defects present simultaneously. The proposed approach takes advantage of the bias exhibited by certain types of defects toward a specific Lamb wave mode. For example, some defects strongly interact with the anti-symmetric modes, while others strongly interact with the symmetric modes. We build model based dictionaries for the fundamental symmetric and anti-symmetric wave modes, which are then utilized in unison to properly localize and classify the defects present. Simulated data of surface and internal defects in a thin Aluminum plate are used to validate the proposed scheme.
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
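As a rough illustration of the factored-inverse idea, the sketch below uses the simplest possible representation: one elementary factor per column, so that the inverse of L is the product M_n ... M_1 and no factor contains a nonzero that L lacks. With one factor per column this reduces to column-oriented forward substitution; merging factors into fewer partitions, which is the point of the method described above, is not attempted here. The function name and the small dense test matrix are illustrative assumptions, not taken from the report.

```python
def solve_with_inverse_factors(L, b):
    """Solve L x = b by applying one elementary inverse factor per column.

    L is a lower triangular matrix given as a list of rows (assumed layout).
    Factor j scales entry j by 1 / L[j][j] and then updates only the rows i
    where L[i][j] is nonzero, so it has the same sparsity as column j of L.
    """
    n = len(L)
    x = [float(v) for v in b]
    for j in range(n):
        x[j] /= L[j][j]
        for i in range(j + 1, n):
            if L[i][j] != 0:
                x[i] -= L[i][j] * x[j]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 4.0, 5.0]]
b = [2.0, 7.0, 23.0]          # equals L times [1, 2, 3]
x = solve_with_inverse_factors(L, b)
```

Each pass of the outer loop is one sequential step, so reducing the number of factors directly reduces the number of steps needed per right-hand side.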
Fast sparsely synchronized brain rhythms in a scale-free neural network.

PubMed

Kim, Sang-Yoon; Lim, Woochang

2015-08-01

We consider a directed version of the Barabási-Albert scale-free network model with symmetric preferential attachment with the same in- and out-degrees and study the emergence of sparsely synchronized rhythms for a fixed attachment degree in an inhibitory population of fast-spiking Izhikevich interneurons. Fast sparsely synchronized rhythms with stochastic and intermittent neuronal discharges are found to appear for large values of J (synaptic inhibition strength) and D (noise intensity). For an intensive study we fix J at a sufficiently large value and investigate the population states by increasing D. For small D, full synchronization with the same population-rhythm frequency fp and mean firing rate (MFR) fi of individual neurons occurs, while for large D partial synchronization with fp>〈fi〉 (〈fi〉: ensemble-averaged MFR) appears due to intermittent discharge of individual neurons; in particular, the case of fp>4〈fi〉 is referred to as sparse synchronization. For the case of partial and sparse synchronization, MFRs of individual neurons vary depending on their degrees. As D passes a critical value D* (which is determined by employing an order parameter), a transition to unsynchronization occurs due to the destructive role of noise to spoil the pacing between sparse spikes. For D
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems

NASA Astrophysics Data System (ADS)

Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel

Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may yield a considerable gain, though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easily parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
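For readers unfamiliar with Newton-Raphson DC analysis, a minimal single-node sketch follows. It assumes a hypothetical circuit that is not from the paper: a voltage source vs in series with a resistor r feeding an ideal exponential diode, with illustrative parameter values. The decomposition and parallel block solution discussed above are not shown; a large circuit would replace the scalar division with a sparse linear solve at each iteration.

```python
import math


def kcl_residual(v, vs=5.0, r=1000.0, i_s=1e-12, vt=0.025):
    """Kirchhoff current law at the diode node: resistor current plus diode current."""
    return (v - vs) / r + i_s * (math.exp(v / vt) - 1.0)


def diode_dc_operating_point(vs=5.0, r=1000.0, i_s=1e-12, vt=0.025,
                             v0=0.6, tol=1e-12, max_iter=100):
    """Newton-Raphson iteration on the node voltage v, starting from v0."""
    v = v0
    for _ in range(max_iter):
        f = kcl_residual(v, vs, r, i_s, vt)
        # Derivative of the residual with respect to v (the 1x1 Jacobian).
        df = 1.0 / r + (i_s / vt) * math.exp(v / vt)
        dv = -f / df
        v += dv
        if abs(dv) < tol:
            break
    return v


v_node = diode_dc_operating_point()
```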
Parallel pivoting combined with parallel reduction

NASA Technical Reports Server (NTRS)

Alaghband, Gita

1987-01-01

Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over the generation of fill-in, and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
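The Markowitz number of a candidate pivot at position (i, j) is conventionally (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of row i and column j; it bounds the fill-in that eliminating with that pivot can create. The sketch below computes these numbers for a sparsity pattern. The set-of-pairs representation and function name are assumptions for illustration, and the compatibility test and stability check from the report are not modelled.

```python
def markowitz_counts(pattern):
    """Return {(i, j): (r_i - 1) * (c_j - 1)} for every nonzero position.

    pattern is an iterable of (row, column) index pairs of the nonzeros.
    """
    pattern = set(pattern)
    row_nnz, col_nnz = {}, {}
    for i, j in pattern:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return {(i, j): (row_nnz[i] - 1) * (col_nnz[j] - 1) for i, j in pattern}


# Arrow-shaped 3x3 pattern: dense first row and column plus the diagonal.
pattern = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 2)}
counts = markowitz_counts(pattern)
# Pick the lowest-count pivot, breaking ties by smallest index pair.
best_pivot = min(sorted(counts), key=lambda ij: counts[ij])
```

Pivoting on the dense corner (0, 0) would fill the whole matrix, which is why its count is the largest.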
Discrete integration of continuous Kalman filtering equations for time invariant second-order structural systems

NASA Technical Reports Server (NTRS)

Park, K. C.; Belvin, W. Keith

1990-01-01

A general form for the first-order representation of the continuous second-order linear structural-dynamics equations is introduced to derive a corresponding form of first-order continuous Kalman filtering equations. Time integration of the resulting equations is carried out via a set of linear multistep integration formulas. It is shown that a judicious combined selection of computational paths and the undetermined matrices introduced in the general form of the first-order linear structural systems leads to a class of second-order discrete Kalman filtering equations involving only symmetric sparse N x N solution matrices.
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solutions.
User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.

NASA Technical Reports Server (NTRS)

Reddy, C. J.

2000-01-01

PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of the existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and uses real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is reconverted to complex numbers. Though this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to make the user acquainted with the installation and operation of the code. Driver routines are given to aid the users to integrate PCSMS routines in their own codes.
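One standard way to convert a complex system into a real one is the block form [[Re A, -Im A], [Im A, Re A]] applied to the stacked vector [Re x; Im x]. The abstract does not say which formulation PCSMS uses, so the sketch below is an assumption that only illustrates the general idea. It uses a small dense Gaussian elimination as a stand-in for the real sparse direct solver, and all function names are hypothetical.

```python
def solve_real(M, rhs):
    """Dense Gaussian elimination with partial pivoting (stand-in real solver)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def solve_complex_via_real(A, b):
    """Solve A x = b for complex A, b using only real arithmetic."""
    n = len(A)
    # Top block rows: [Re A, -Im A]; bottom block rows: [Im A, Re A].
    M = [[A[i][j].real for j in range(n)] + [-A[i][j].imag for j in range(n)]
         for i in range(n)]
    M += [[A[i][j].imag for j in range(n)] + [A[i][j].real for j in range(n)]
          for i in range(n)]
    rhs = [v.real for v in b] + [v.imag for v in b]
    y = solve_real(M, rhs)
    # Reassemble the complex solution from its real and imaginary halves.
    return [complex(y[i], y[n + i]) for i in range(n)]


A = [[2 + 1j, 1 + 0j],
     [0j, 1 - 1j]]
x_true = [1 + 1j, 2 - 1j]
b = [sum(A[i][j] * x_true[j] for j in range(2)) for i in range(2)]
x = solve_complex_via_real(A, b)
```

The real system has twice the dimension of the complex one, which is the price paid for reusing an existing real solver.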
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers demonstrate the accuracy and speed of the method.
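A serial sketch of how column heights cut work in a Choleski factorization A = U^T U is given below. It assumes a dense list-of-rows input for readability, whereas a real variable-band code would store only the entries under each column's skyline. The parallel outer loop and the loop unrolling described in the abstract are not reproduced, and the names are illustrative.

```python
import math


def skyline_cholesky(A):
    """Factor a symmetric positive definite A as U^T U, skipping zeros above
    each column's first nonzero row (its column height)."""
    n = len(A)
    # first[j]: row index of the first nonzero in column j of the upper triangle.
    first = [next(i for i in range(j + 1) if A[i][j] != 0.0) for j in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(first[j], j + 1):
            # Rows above either column's height contribute only zeros.
            k0 = max(first[i], first[j])
            s = A[i][j] - sum(U[k][i] * U[k][j] for k in range(k0, i))
            if i == j:
                U[j][j] = math.sqrt(s)
            else:
                U[i][j] = s / U[i][i]
    return U, first


A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 2.0, 5.0]]
U, first = skyline_cholesky(A)
```

Fill-in can occur only below each column's first nonzero, so the factor fits in the same variable-band storage as the original matrix.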

A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE PAGES

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

2017-06-01

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
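To make the SpMM kernel concrete, here is a minimal serial sketch of the CSR baseline that the abstract compares against: a sparse matrix in compressed sparse row form multiplied by a tall-skinny block of vectors. The CSB format and the tuned kernels of the paper are not shown; the array names follow common CSR conventions and are assumptions.

```python
def csr_spmm(indptr, indices, data, X):
    """Compute Y = A X, with A in CSR form and X a list of rows (n x k)."""
    n_rows = len(indptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        yrow = Y[i]
        for p in range(indptr[i], indptr[i + 1]):
            a, xrow = data[p], X[indices[p]]
            # Each stored nonzero is reused across all k vectors.
            for c in range(k):
                yrow[c] += a * xrow[c]
    return Y


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = csr_spmm(indptr, indices, data, X)
```

Reusing each nonzero for every column of X is what raises the arithmetic intensity of SpMM above that of a single sparse matrix-vector product.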
Computing row and column counts for sparse QR and LU factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilbert, John R.; Li, Xiaoye S.; Ng, Esmond G.

2001-01-01

We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
LANZ: Software solving the large sparse symmetric generalized eigenproblem

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.

1990-01-01

A package, LANZ, for solving the large symmetric generalized eigenproblem is described. The package was tested on four different architectures: Convex 200, CRAY Y-MP, Sun-3, and Sun-4. The package uses a Lanczos method and is based on recent research into solving the generalized eigenproblem.
A Shifted Block Lanczos Algorithm 1: The Block Recurrence

NASA Technical Reports Server (NTRS)

Grimes, Roger G.; Lewis, John G.; Simon, Horst D.

1990-01-01

In this paper we describe a block Lanczos algorithm that is used as the key building block of a software package for the extraction of eigenvalues and eigenvectors of large sparse symmetric generalized eigenproblems. The software package comprises: a version of the block Lanczos algorithm specialized for spectrally transformed eigenproblems; an adaptive strategy for choosing shifts, and efficient codes for factoring large sparse symmetric indefinite matrices. This paper describes the algorithmic details of our block Lanczos recurrence. This uses a novel combination of block generalizations of several features that have only been investigated independently in the past. In particular new forms of partial reorthogonalization, selective reorthogonalization and local reorthogonalization are used, as is a new algorithm for obtaining the M-orthogonal factorization of a matrix. The heuristic shifting strategy, the integration with sparse linear equation solvers and numerical experience with the code are described in a companion paper.
Discrete Kalman filtering equations of second-order form for control-structure interaction simulations

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, K. F.; Belvin, W. Keith

1991-01-01

A second-order form of discrete Kalman filtering equations is proposed as a candidate state estimator for efficient simulations of control-structure interactions in coupled physical coordinate configurations as opposed to decoupled modal coordinates. The resulting matrix equation of the present state estimator consists of the same symmetric, sparse N x N coupled matrices of the governing structural dynamics equations as opposed to unsymmetric 2N x 2N state space-based estimators. Thus, in addition to substantial computational efficiency improvement, the present estimator can be applied to control-structure design optimization for which the physical coordinates associated with the mass, damping and stiffness matrices of the structure are needed instead of modal coordinates.
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

DOE PAGES

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

2016-10-27

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

DOE PAGES

Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...

2016-06-30

In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

PubMed

Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

2014-06-10

We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
A generalized geologic map of Mars.

NASA Technical Reports Server (NTRS)

Carr, M. H.; Masursky, H.; Saunders, R. S.

1973-01-01

A geologic map of Mars has been constructed largely on the basis of photographic evidence. Four classes of units are recognized: (1) primitive cratered terrain, (2) sparsely cratered volcanic eolian plains, (3) circular radially symmetric volcanic constructs such as shield volcanoes, domes, and craters, and (4) tectonic erosional units such as chaotic and channel deposits. Grabens are the main structural features; compressional and strike slip features are almost completely absent. Most grabens are part of a set radial to the main volcanic area, Tharsis.
MUTILS - a set of efficient modeling tools for multi-core CPUs implemented in MEX

NASA Astrophysics Data System (ADS)

Krotkiewski, Marcin; Dabrowski, Marcin

2013-04-01

The need for computational performance is common in scientific applications, and in particular in numerical simulations, where high resolution models require efficient processing of large amounts of data. Especially in the context of geological problems the need to increase the model resolution to resolve physical and geometrical complexities seems to have no limits. Alas, the performance of new generations of CPUs does not improve any longer by simply increasing clock speeds. Current industrial trends are to increase the number of computational cores. As a result, parallel implementations are required in order to fully utilize the potential of new processors, and to study more complex models. We target simulations on small to medium scale shared memory computers: laptops and desktop PCs with ~8 CPU cores and up to tens of GB of memory to high-end servers with ~50 CPU cores and hundreds of GB of memory. In this setting MATLAB is often the environment of choice for scientists who want to implement their own models with little effort. It is a useful general purpose mathematical software package, but due to its versatility some of its functionality is not as efficient as it could be. In particular, the challenges of modern multi-core architectures are not fully addressed. We have developed MILAMIN 2 - an efficient FEM modeling environment written in native MATLAB. Amongst others, MILAMIN provides functions to define model geometry, generate and convert structured and unstructured meshes (also through interfaces to external mesh generators), compute element and system matrices, apply boundary conditions, solve the system of linear equations, address non-linear and transient problems, and perform post-processing. MILAMIN strives to combine the ease of code development and the computational efficiency. Where possible, the code is optimized and/or parallelized within the MATLAB framework.
Native MATLAB is augmented with the MUTILS library - a set of MEX functions that implement the computationally intensive, performance critical parts of the code, which we have identified to be bottlenecks. Here, we discuss the functionality and performance of the MUTILS library. Currently, it includes: 1. time and memory efficient assembly of sparse matrices for FEM simulations; 2. parallel sparse matrix - vector product with optimizations specific to symmetric matrices and multiple degrees of freedom per node; 3. parallel point in triangle location and point in tetrahedron location for unstructured, adaptive 2D and 3D meshes (useful for 'marker in cell' type of methods); 4. parallel FEM interpolation for 2D and 3D meshes of elements of different types and orders, and for different number of degrees of freedom per node; 5. a stand-alone, MEX implementation of the Conjugate Gradients iterative solver; 6. interface to METIS graph partitioning and a fast implementation of RCM reordering.
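One common optimization for symmetric matrices in a sparse matrix-vector product is to store only the lower triangle and use each off-diagonal entry twice. The serial sketch below illustrates that idea under assumed CSR-style array names; it is not the MUTILS implementation, whose interface is not described in the abstract.

```python
def sym_spmv_lower(indptr, indices, data, x):
    """Compute y = A x for symmetric A given only its lower triangle in CSR form."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j, a = indices[p], data[p]
            y[i] += a * x[j]            # contribution of A[i][j]
            if j != i:
                y[j] += a * x[i]        # mirrored contribution of A[j][i]
    return y


# Lower triangle of A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]].
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
y = sym_spmv_lower(indptr, indices, data, [1.0, 2.0, 3.0])
```

Halving the stored entries reduces memory traffic, but the mirrored update writes to y[j] outside the current row, so a parallel version has to guard against concurrent writes, for example with per-thread buffers.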
Data traffic reduction schemes for sparse Cholesky factorizations

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1988-01-01

Load distribution schemes are presented which minimize the total data traffic in the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems with local and shared memory. The total data traffic in factoring an n x n sparse, symmetric, positive definite matrix representing an n-vertex regular 2-D grid graph using n^α processors, α ≤ 1, is shown to be O(n^(1+α/2)). It is O(n^(3/2)) when n^α processors, α ≥ 1, are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal. The schemes allow efficient use of up to O(n) processors before the total data traffic reaches the maximum value of O(n^(3/2)). The partitioning employed within the scheme allows a better utilization of the data accessed from shared memory than those of previously published methods.
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK, a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Redundantly piezo-actuated XYθ z compliant mechanism for nano-positioning featuring simple kinematics, bi-directional motion and enlarged workspace

NASA Astrophysics Data System (ADS)

Zhu, Wu-Le; Zhu, Zhiwei; To, Suet; Liu, Qiang; Ju, Bing-Feng; Zhou, Xiaoqin

2016-12-01

This paper presents a novel redundantly piezo-actuated three-degree-of-freedom XYθ z compliant mechanism for nano-positioning, driven by four mirror-symmetrically configured piezoelectric actuators (PEAs). By means of differential motion principle, linearized kinematics and physically bi-directional motions in all the three directions are achieved. Meanwhile, the decoupled delivering of three-directional independent motions at the output end is accessible, and the essential parallel and mirror symmetric configuration guarantees large output stiffness, high natural frequencies, high accuracy as well as high structural compactness of the mechanism. Accurate kinematics analysis with consideration of input coupling indicates that the proposed redundantly actuated compliant mechanism can generate three-dimensional (3D) symmetric polyhedral workspace envelope with enlarged reachable workspace, as compared with the most common parallel XYθ z mechanism driven by three PEAs. Keeping a high consistence with both analytical and numerical models, the experimental results show the working ranges of ±6.21 μm and ±12.41 μm in X- and Y-directions, and that of ±873.2 μrad in θ z-direction with nano-positioning capability can be realized. The superior performances and easily achievable structure well facilitate practical applications of the proposed XYθ z compliant mechanism in nano-positioning systems.
Partitioning sparse matrices with eigenvectors of graphs

NASA Technical Reports Server (NTRS)

Pothen, Alex; Simon, Horst D.; Liou, Kang-Pu

1990-01-01

The problem of computing a small vertex separator in a graph arises in the context of computing a good ordering for the parallel factorization of sparse, symmetric matrices. An algebraic approach for computing vertex separators is considered in this paper. It is shown that lower bounds on separator sizes can be obtained in terms of the eigenvalues of the Laplacian matrix associated with a graph. The Laplacian eigenvectors of grid graphs can be computed from Kronecker products involving the eigenvectors of path graphs, and these eigenvectors can be used to compute good separators in grid graphs. A heuristic algorithm is designed to compute a vertex separator in a general graph by first computing an edge separator in the graph from an eigenvector of the Laplacian matrix, and then using a maximum matching in a subgraph to compute the vertex separator. Results on the quality of the separators computed by the spectral algorithm are presented, and these are compared with separators obtained from other algorithms for computing separators. Finally, the time required to compute the Laplacian eigenvector is reported, and the accuracy with which the eigenvector must be computed to obtain good separators is considered. The spectral algorithm has the advantage that it can be implemented on a medium-size multiprocessor in a straightforward manner.
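The spectral idea in this abstract can be sketched on a grid graph. The sketch below assumes a dense eigensolver and a median split of the Fiedler vector (the Laplacian eigenvector for the second-smallest eigenvalue); it returns an edge separator only and omits the maximum-matching step that the paper uses to turn it into a vertex separator. All function names are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian L = D - A of the path graph on n vertices."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def grid_laplacian(rows, cols):
    """Grid-graph Laplacian as a Kronecker sum of two path Laplacians."""
    return (np.kron(path_laplacian(rows), np.eye(cols))
            + np.kron(np.eye(rows), path_laplacian(cols)))

def spectral_bisection(L):
    """Split vertices at the median of the Fiedler vector.

    Returns the boolean part labels, the cut edges, and the
    second-smallest Laplacian eigenvalue.
    """
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    order = np.argsort(fiedler)
    half = len(order) // 2
    part = np.zeros(L.shape[0], dtype=bool)
    part[order[half:]] = True
    # Edges (i, j), i < j, are the negative off-diagonal entries of L
    i, j = np.nonzero(np.triu(L < 0, k=1))
    cut = [(a, b) for a, b in zip(i, j) if part[a] != part[b]]
    return part, cut, vals[1]
```

For a 4 x 8 grid, `spectral_bisection(grid_laplacian(4, 8))` splits the grid across its long dimension, so the edge separator has one edge per row.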

An intercalation-locked parallel-stranded DNA tetraplex

DOE PAGES

Tripathi, S.; Zhang, D.; Paukstelis, P. J.

2015-01-27

DNA has proved to be an excellent material for nanoscale construction because complementary DNA duplexes are programmable and structurally predictable. However, in the absence of Watson–Crick pairings, DNA can be structurally more diverse. Here, we describe the crystal structures of d(ACTCGGATGAT) and the brominated derivative, d(AC BrUCGGA BrUGAT). These oligonucleotides form parallel-stranded duplexes with a crystallographically equivalent strand, resulting in the first examples of DNA crystal structures that contain four different symmetric homo base pairs. Two of the parallel-stranded duplexes are coaxially stacked in opposite directions and locked together to form a tetraplex through intercalation of the 5'-most A–A base pairs between adjacent G–G pairs in the partner duplex. The intercalation region is a new type of DNA tertiary structural motif with similarities to the i-motif. 1H–1H nuclear magnetic resonance and native gel electrophoresis confirmed the formation of a parallel-stranded duplex in solution. Finally, we modified specific nucleotide positions and added d(GAY) motifs to oligonucleotides and were readily able to obtain similar crystals. This suggests that this parallel-stranded DNA structure may be useful in the rational design of DNA crystals and nanostructures.
Incomplete Sparse Approximate Inverses for Parallel Preconditioning

DOE PAGES

Anzt, Hartwig; Huckle, Thomas K.; Bräckle, Jürgen; ...

2017-10-28

In this study, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods, or as a simplification of the sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverses” (ISAI) preconditioner is particularly efficient in the solution of sparse triangular linear systems of equations. Those arise, for example, in the context of incomplete factorization preconditioning. ISAI preconditioners can be generated via an algorithm providing fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. Finally, in a study covering a large number of matrices, we identify the ISAI preconditioner as an attractive alternative to exact triangular solves in the context of incomplete factorization preconditioning.
How pattern is selected in drift wave turbulence: Role of parallel flow shear

NASA Astrophysics Data System (ADS)

Kosuga, Y.

2017-12-01

The role of parallel shear flow in the pattern selection problem in drift wave turbulence is discussed. Patterns of interest here are E × B convective cells, which include poloidally symmetric zonal flows and radially elongated streamers. The competition between zonal flow formation and streamer formation is analyzed in the context of modulational instability analysis, with the parallel flow shear as a parameter. For drift wave turbulence with k⊥ρs ≲ O(1) and without parallel flow coupling, zonal flows are the preferred structures. As the magnitude of the parallel flow shear increases, streamer growth overcomes zonal flow growth. This is because the self-focusing effect of the modulational instability becomes more effective for streamers through density and parallel velocity modulation. As a consequence, a bursty release of free energy may result as the parallel flow shear increases.
Mode structure symmetry breaking of energetic particle driven beta-induced Alfvén eigenmode

NASA Astrophysics Data System (ADS)

Lu, Z. X.; Wang, X.; Lauber, Ph.; Zonca, F.

2018-01-01

The mode structure symmetry breaking of energetic particle driven Beta-induced Alfvén Eigenmode (BAE) is studied based on global theory and simulation. The weak coupling formula gives a reasonable estimate of the local eigenvalue compared with global hybrid simulation using XHMGC. The non-perturbative effect of energetic particles on global mode structure symmetry breaking in radial and parallel (along B) directions is demonstrated. With the contribution from energetic particles, two dimensional (radial and poloidal) BAE mode structures with symmetric/asymmetric tails are produced using an analytical model. It is demonstrated that the symmetry breaking in radial and parallel directions is intimately connected. The effects of mode structure symmetry breaking on nonlinear physics, energetic particle transport, and the possible insight for experimental studies are discussed.
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX

NASA Astrophysics Data System (ADS)

Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; Wilcox, R. S.; Anderson, D. T.

2018-05-01

The radial electric field and the ion mean parallel flow are obtained in the helically symmetric experiment stellarator from toroidal flow measurements of C+6 ion at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory

NASA Astrophysics Data System (ADS)

Challacombe, Matt

2000-06-01

A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
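The load-balancing step described in this abstract (bin packing of variable-size blocks) can be illustrated with a minimal sketch. It assumes a greedy longest-processing-time heuristic and a scalar cost per block; the abstract does not state which bin packing method the paper uses, so this is a swapped-in illustration. The function name `balance_blocks` is hypothetical.

```python
import heapq

def balance_blocks(block_costs, n_procs):
    """Assign blocks to processors, largest cost first, to the least-loaded one.

    Greedy heuristic used for illustration (an assumption, not the
    method of the paper). Returns the assignment and per-processor loads.
    """
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    # Visit block indices in order of decreasing cost
    for blk in sorted(range(len(block_costs)), key=lambda b: -block_costs[b]):
        load, p = heapq.heappop(heap)          # least-loaded processor
        assignment[p].append(blk)
        heapq.heappush(heap, (load + block_costs[blk], p))
    loads = [sum(block_costs[b] for b in assignment[p]) for p in range(n_procs)]
    return assignment, loads
```

With only a few large blocks per processor the loads cannot be evened out well, which is consistent with the abstract's remark that load imbalance grows as granularity decreases.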
Line and point defects in nonlinear anisotropic solids

NASA Astrophysics Data System (ADS)

Golgoon, Ashkan; Yavari, Arash

2018-06-01

In this paper, we present some analytical solutions for the stress fields of nonlinear anisotropic solids with distributed line and point defects. In particular, we determine the stress fields of (i) a parallel cylindrically symmetric distribution of screw dislocations in infinite orthotropic and monoclinic media, (ii) a cylindrically symmetric distribution of parallel wedge disclinations in an infinite orthotropic medium, (iii) a distribution of edge dislocations in an orthotropic medium, and (iv) a spherically symmetric distribution of point defects in a transversely isotropic spherical ball.
The effect of asymmetrical electrode form after negative bias illuminated stress in amorphous IGZO thin film transistors

NASA Astrophysics Data System (ADS)

Su, Wan-Ching; Chang, Ting-Chang; Liao, Po-Yung; Chen, Yu-Jia; Chen, Bo-Wei; Hsieh, Tien-Yu; Yang, Chung-I.; Huang, Yen-Yu; Chang, Hsi-Ming; Chiang, Shin-Chuan; Chang, Kuan-Chang; Tsai, Tsung-Ming

2017-03-01

This paper investigates the degradation behavior of InGaZnO thin film transistors (TFTs) under negative bias illumination stress (NBIS). TFT devices with two different source and drain layouts were examined: one with a parallel format electrode and the other with a UI format electrode, in which the source/drain electrodes form a fork-shaped structure. The I-V curve of the parallel electrode exhibited a symmetric degradation under forward and reverse sweeping in the saturation region after 1000 s NBIS. In contrast, the I-V curve of the UI electrode structure under similar conditions was asymmetric. The UI electrode structure also shows a stretch-out phenomenon in its C-V measurement. Finally, this work uses ISE-Technology Computer Aided Design (ISE-TCAD) simulations of the electron field and I-V curves to analyze the mechanisms dominating the parallel and UI device degradation behaviors.
Array signal recovery algorithm for a single-RF-channel DBF array

NASA Astrophysics Data System (ADS)

Zhang, Duo; Wu, Wen; Fang, Da Gang

2016-12-01

An array signal recovery algorithm based on sparse signal reconstruction theory is proposed for a single-RF-channel digital beamforming (DBF) array. A single-RF-channel antenna array is a low-cost antenna array in which signals are obtained from all antenna elements by only one microwave digital receiver. The spatially parallel array signals are converted into time-sequence signals, which are then sampled by the system. The proposed algorithm uses these time-sequence samples to recover the original parallel array signals by exploiting the second-order sparse structure of the array signals. Additionally, an optimization method based on the artificial bee colony (ABC) algorithm is proposed to improve the reconstruction performance. Using the proposed algorithm, the motion compensation problem for the single-RF-channel DBF array can be solved effectively, and the angle and Doppler information for the target can be simultaneously estimated. The effectiveness of the proposed algorithms is demonstrated by the results of numerical simulations.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensional problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.
Sparse-view photoacoustic tomography using virtual parallel-projections and spatially adaptive filtering

NASA Astrophysics Data System (ADS)

Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng

2018-02-01

To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate the data acquisition. However, since reconstruction from sparse-view sampling data is challenging, both an effective measurement scheme and an appropriate reconstruction method must be considered. In this study, we present an iterative sparse-view PAT reconstruction scheme in which a virtual parallel-projection concept matched to the proposed measurement condition is introduced to help achieve the "compressive sensing" procedure of the reconstruction, while spatially adaptive filtering, which fully exploits the a priori information of mutually similar blocks in natural images, is introduced to effectively recover the partially unknown coefficients in the transformed domain. As a result, the sparse-view PAT images can be reconstructed with higher quality than the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments, which show desirable performance in image fidelity even with a small number of measuring positions.
Algorithms and software for solving finite element equations on serial and parallel architectures

NASA Technical Reports Server (NTRS)

Chu, Eleanor; George, Alan

1988-01-01

The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the Computational Structural Mechanics (CSM) Testbed. One of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A brief overview of the CSM Testbed software and its usage is presented. An overview of the sparse matrix methods currently employed in the CSM Testbed is given. An interface which was designed and implemented as a research tool for installing and appraising new matrix processors in the CSM Testbed is described. The results of numerical experiments performed in solving a set of testbed demonstration problems using the processor SPK and other experimental processors are reported.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Linear-scaling density-functional simulations of charged point defects in Al2O3 using hierarchical sparse matrix algebra.

PubMed

Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C

2010-09-21

We present calculations of formation energies of defects in an ionic solid (Al2O3) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Crooked fingers and sparse hair: an interesting case of trichorhinophalangeal syndrome type 1.

PubMed

Narayanan, Ramakrishna; Chennareddy, Srinivasa

2015-01-27

Trichorhinophalangeal syndrome type 1 is a rare skeletal dysplasia of autosomal-dominant inheritance due to defects in the TRPS-1 gene. The syndrome is characterised by sparse slow-growing hair, a bulbous pear-shaped nose, cone-shaped epiphyses and deformities of the interphalangeal joints resembling those in rheumatoid arthritis. We present a case of trichorhinophalangeal syndrome in a 23-year-old man who presented with symmetrical painless progressive deformity of the fingers in both hands.
In-Situ Subsurface Coating of Corroded Steel Sheet Pile Structures: Final Report on Project F08-AR06

DTIC Science & Technology

2017-09-01

scraped the sheet pile wall with an excavator. After scraping the out-pans with a flat edge bucket, the contractor welded a blade on the bucket...unusual striations were parallel grooves running at 30 – 45 degrees from the vertical. Some patterns cross each other symmetrically. The striations
Relational Discrimination by Pigeons in a Go/No-Go Procedure with Compound Stimuli: A Methodological Note

ERIC Educational Resources Information Center

Campos, Heloisa Cursi; Debert, Paula; Barros, Romariz da Silva; McIlvane, William J.

2011-01-01

A go/no-go procedure with compound stimuli typically establishes emergent behavior that parallels in structure and typical outcome that of conventional tests for symmetric, transitive, and equivalence relations in normally capable adults. The present study employed a go/no-go compound stimulus procedure with pigeons. During training, pecks to…
Research on Synthesis of Concurrent Computing Systems.

DTIC Science & Technology

1982-09-01

1.5.1 An Informal Description of the Techniques ... 1.5.2 Formal Definitions of Aggregation and Virtualisation ... sparsely interconnected networks. We have also developed techniques to create Kung’s systolic array parallel structure from a specification of matrix...results of the computation of that element. For example, if A,j is computed using a single enumeration, then virtualisation would produce a three
Matching Pursuit with Asymmetric Functions for Signal Decomposition and Parameterization

PubMed Central

Spustek, Tomasz; Jedrzejczak, Wiesław Wiktor; Blinowska, Katarzyna Joanna

2015-01-01

The method of adaptive approximations by Matching Pursuit makes it possible to decompose signals into basic components (called atoms). The approach relies on fitting, in an iterative way, functions from a large predefined set (called a dictionary) to an analyzed signal. Usually, symmetric functions coming from the Gabor family (sine modulated Gaussian) are used. However, Gabor functions may not be optimal in describing waveforms present in physiological and medical signals. Many biomedical signals contain asymmetric components, usually with a steep rise and slower decay. For the decomposition of this kind of signal we introduce a dictionary of functions of various degrees of asymmetry, from symmetric Gabor atoms to highly asymmetric waveforms. The application of this enriched dictionary to Otoacoustic Emissions and Steady-State Visually Evoked Potentials demonstrated the advantages of the proposed method. The approach provides a sparser representation, allows for correct determination of the latencies of the components and removes the "energy leakage" effect generated by symmetric waveforms that do not sufficiently match the structures of the analyzed signal. Additionally, we introduced a time-frequency-amplitude distribution that is more adequate for representation of asymmetric atoms than the conventional time-frequency-energy distribution. PMID:26115480
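The iterative fitting described in this abstract can be sketched as a basic Matching Pursuit loop. The sketch assumes a small dictionary of symmetric Gabor atoms only; the asymmetric atoms introduced by the paper are not reproduced because their functional form is not given here. The names `gabor_atom`, `build_dictionary` and `matching_pursuit` and all parameter values are hypothetical.

```python
import numpy as np

def gabor_atom(n, center, width, freq):
    """Unit-norm cosine-modulated Gaussian (symmetric Gabor atom) of length n."""
    t = np.arange(n)
    g = (np.exp(-0.5 * ((t - center) / width) ** 2)
         * np.cos(2 * np.pi * freq * (t - center)))
    return g / np.linalg.norm(g)

def build_dictionary(n=64, centers=(16, 32, 48), freqs=(0.05, 0.1, 0.2), width=6.0):
    """Stack atoms row-wise; row index = center index * len(freqs) + freq index."""
    return np.array([gabor_atom(n, c, width, f) for c in centers for f in freqs])

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP: repeatedly subtract the best-correlated atom from the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        corr = dictionary @ residual            # inner product with every atom
        k = int(np.argmax(np.abs(corr)))        # best-matching atom
        picks.append((k, float(corr[k])))
        residual -= corr[k] * dictionary[k]
    return picks, residual
```

A signal that is an exact multiple of one dictionary atom is recovered in a single iteration; real signals need more iterations, and the abstract's argument is that a dictionary with better-matched atoms needs fewer of them.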

Solving very large, sparse linear systems on mesh-connected parallel computers

NASA Technical Reports Server (NTRS)

Opsahl, Torstein; Reif, John

1987-01-01

The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX

DOE PAGES

Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; ...

2018-03-07

In this paper, the radial electric field and the ion mean parallel flow are obtained in the helically symmetric experiment stellarator from toroidal flow measurements of C+6 ion at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Finally, indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
An overview of NSPCG: A nonsymmetric preconditioned conjugate gradient package

NASA Astrophysics Data System (ADS)

Oppe, Thomas C.; Joubert, Wayne D.; Kincaid, David R.

1989-05-01

The most recent research-oriented software package developed as part of the ITPACK Project is called "NSPCG" since it contains many nonsymmetric preconditioned conjugate gradient procedures. It is designed to solve large sparse systems of linear algebraic equations by a variety of different iterative methods. One of the main purposes for the development of the package is to provide a common modular structure for research on iterative methods for nonsymmetric matrices. Another purpose for the development of the package is to investigate the suitability of several iterative methods for vector computers. Since the vectorizability of an iterative method depends greatly on the matrix structure, NSPCG allows great flexibility in the operator representation. The coefficient matrix can be passed in one of several different matrix data storage schemes. These sparse data formats allow matrices with a wide range of structures from highly structured ones such as those with all nonzeros along a relatively small number of diagonals to completely unstructured sparse matrices. Alternatively, the package allows the user to call the accelerators directly with user-supplied routines for performing certain matrix operations. In this case, one can use the data format from an application program and not be required to copy the matrix into one of the package formats. This is particularly advantageous when memory space is limited. Some of the basic preconditioners that are available are point methods such as Jacobi, Incomplete LU Decomposition and Symmetric Successive Overrelaxation as well as block and multicolor preconditioners. The user can select from a large collection of accelerators such as Conjugate Gradient (CG), Chebyshev (SI, for semi-iterative), Generalized Minimal Residual (GMRES), Biconjugate Gradient Squared (BCGS) and many others. The package is modular so that almost any accelerator can be used with almost any preconditioner.
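The pairing of an accelerator with a preconditioner described in this abstract can be illustrated with a minimal sketch: conjugate gradient with a Jacobi (diagonal) preconditioner. This is a generic dense NumPy illustration, not NSPCG's interface or storage formats, and it assumes a symmetric positive definite matrix, whereas NSPCG also targets nonsymmetric systems. The function name `jacobi_pcg` is hypothetical.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=500):
    """Conjugate gradient with a Jacobi preconditioner for SPD A.

    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                   # initial residual
    m_inv = 1.0 / np.diag(A)        # Jacobi preconditioner: inverse diagonal
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = m_inv * r               # apply the preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new search direction
        rz = rz_new
    return x, max_iter
```

Because the preconditioner enters only through the line that computes `z`, swapping in a different preconditioner leaves the accelerator unchanged, which mirrors the modular structure the abstract describes.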
Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Sparse Partial Equilibrium Tables in Chemically Resolved Reactive Flow

NASA Astrophysics Data System (ADS)

Vitello, Peter; Fried, Laurence E.; Pudliner, Brian; McAbee, Tom

2004-07-01

The detonation of an energetic material is the result of a complex interaction between kinetic chemical reactions and hydrodynamics. Unfortunately, little is known concerning the detailed chemical kinetics of detonations in energetic materials. CHEETAH uses rate laws to treat species with the slowest chemical reactions, while assuming other chemical species are in equilibrium. CHEETAH supports a wide range of elements and condensed detonation products and can also be applied to gas detonations. A sparse hash table of equation of state values is used in CHEETAH to enhance the efficiency of kinetic reaction calculations. For large-scale parallel hydrodynamic calculations, CHEETAH uses parallel communication to propagate updates to the cache. We present here details of the sparse caching model used in CHEETAH coupled to an ALE hydrocode. To demonstrate the efficiency of modeling using a sparse cache model we consider detonations in energetic materials.
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, i.e., Kokkos. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.
Symmetric and asymmetric capillary bridges between a rough surface and a parallel surface.

PubMed

Wang, Yongxin; Michielsen, Stephen; Lee, Hoon Joo

2013-09-03

Although the formation of a capillary bridge between two parallel surfaces has been extensively studied, the majority of research has described only symmetric capillary bridges between two smooth surfaces. In this work, an instrument was built to form a capillary bridge by squeezing a liquid drop on one surface with another surface. An analytical solution that describes the shape of symmetric capillary bridges joining two smooth surfaces has been extended to bridges that are asymmetric about the midplane and to rough surfaces. The solution, given by elliptical integrals of the first and second kind, is consistent with a constant Laplace pressure over the entire surface and has been verified for water, Kaydol, and dodecane drops forming symmetric and asymmetric bridges between parallel smooth surfaces. This solution has been applied to asymmetric capillary bridges between a smooth surface and a rough fabric surface as well as symmetric bridges between two rough surfaces. These solutions have been experimentally verified, and good agreement has been found between predicted and experimental profiles for small drops where the effect of gravity is negligible. Finally, a protocol for determining the profile from the volume and height of the capillary bridge has been developed and experimentally verified.
Sparse distributed memory overview

NASA Technical Reports Server (NTRS)

Raugh, Mike

1990-01-01

The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
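The associative recall described above, retrieving a stored pattern from a cue that only partially matches it, can be sketched in a few lines. This is a minimal serial illustration in the style of Kanerva's sparse distributed memory, not the project's software; the word length, number of hard locations, activation radius, and all names are assumptions chosen for the example.

```python
import random


class SparseDistributedMemory:
    """Minimal sparse distributed memory: random hard locations with bit counters."""

    def __init__(self, n_bits=256, n_locations=2000, radius=112, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Hard-location addresses are random n_bits-wide words.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        # A location is activated when its Hamming distance to the address is small.
        return [i for i, a in enumerate(self.addresses)
                if bin(a ^ address).count("1") <= self.radius]

    def write(self, address, data):
        # Each active location increments or decrements one counter per data bit.
        for i in self._active(address):
            row = self.counters[i]
            for b in range(self.n_bits):
                row[b] += 1 if (data >> b) & 1 else -1

    def read(self, address):
        # Sum the counters of all active locations and threshold at zero.
        sums = [0] * self.n_bits
        for i in self._active(address):
            row = self.counters[i]
            for b in range(self.n_bits):
                sums[b] += row[b]
        word = 0
        for b, s in enumerate(sums):
            if s > 0:
                word |= 1 << b
        return word


rng = random.Random(1)
pattern = rng.getrandbits(256)
memory = SparseDistributedMemory()
memory.write(pattern, pattern)           # autoassociative storage

noisy_cue = pattern
for bit in rng.sample(range(256), 16):   # corrupt 16 of the 256 bits
    noisy_cue ^= 1 << bit
recalled = memory.read(noisy_cue)
```

Every hard location works independently in both `write` and `read`, which is what makes the architecture a natural fit for massively parallel hardware; the loops over locations here are serial only for simplicity.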
Atomic resolution characterization of a SrTiO3 grain boundary in the STEM

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGibbon, M.M.; Browning, N.D.; Chisholm, M.F.

This paper uses the complementary techniques of high resolution Z-contrast imaging and PEELS (parallel detection electron energy loss spectroscopy) to investigate the atomic structure and chemistry of a 25 degree symmetric tilt boundary in a bicrystal of the electroceramic SrTiO3. The grain boundary is composed of two different boundary structural units which occur in about equal numbers: one which contains Ti-O columns and the other without.
A Short-Circuit Method for Networks.

ERIC Educational Resources Information Center

Ong, P. P.

1983-01-01

Describes a method of network analysis that allows avoidance of Kirchhoff's Laws (provided the network is symmetrical) by reduction to simple series/parallel resistances. The method can be extended to symmetrical alternating current, capacitance or inductance if corresponding theorems are used. A symmetric cubic network serves as an example. (JM)
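The record gives no numbers, so the following is an illustrative sketch rather than the article's own worked example. It assumes a cube whose twelve edges are equal resistors R, measured across a body diagonal. By symmetry, the three nodes adjacent to the input are at one potential and the three nodes adjacent to the output are at another, so each group can be short-circuited together. The network then reduces to 3 resistors in parallel, in series with 6 in parallel, in series with 3 in parallel. A brute-force nodal solve is included only as a cross-check; the helper names are hypothetical.

```python
from fractions import Fraction


def parallel(*rs):
    """Equivalent resistance of resistors in parallel."""
    return 1 / sum(1 / Fraction(r) for r in rs)


def series(*rs):
    """Equivalent resistance of resistors in series."""
    return sum(Fraction(r) for r in rs)


R = Fraction(1)
# Short-circuit reduction: 3 edges leave the input corner, 6 edges join the two
# equipotential groups, and 3 edges arrive at the output corner.
shortcut = series(parallel(R, R, R), parallel(*[R] * 6), parallel(R, R, R))


def cube_resistance_nodal(r=Fraction(1)):
    """Cross-check: solve the node equations for 1 A injected at corner 0."""
    n, g = 8, 1 / r
    G = [[Fraction(0)] * n for _ in range(n)]
    for u in range(n):
        for bit in range(3):          # cube corners differ in exactly one bit
            v = u ^ (1 << bit)
            G[u][u] += g
            G[u][v] -= g
    # Ground corner 7 (drop its row and column) and append the current vector.
    A = [row[:7] + [Fraction(1 if i == 0 else 0)] for i, row in enumerate(G[:7])]
    for col in range(7):              # Gauss-Jordan elimination, exact arithmetic
        pivot = next(k for k in range(col, 7) if A[k][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for k in range(7):
            if k != col and A[k][col] != 0:
                f = A[k][col]
                A[k] = [x - f * y for x, y in zip(A[k], A[col])]
    return A[0][7]                    # voltage at corner 0 for 1 A = resistance


nodal = cube_resistance_nodal()       # both approaches give 5/6 of R
```

The reduction needs three lines of arithmetic, whereas the nodal solve needs a 7-by-7 linear system, which is the saving the symmetry argument is meant to deliver.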
Accessing sparse arrays in parallel memories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, U.; Gajski, D.; Kuck, D.

The concept of dense and sparse execution of arrays is introduced. Arrays themselves can be stored in a dense or sparse manner in a parallel memory with m memory modules. The paper proposes hardware for speeding up the execution of array operations of the form $c(c_0 + c\,i) = a(a_0 + a\,i)\ \mathrm{op}\ b(b_0 + b\,i)$, where $a_0$, $a$, $b_0$, $b$, $c_0$, $c$ are integer constants and $i$ is an index variable. The hardware handles 'sparse execution', in which the operation op is not executed for every value of $i$. The hardware also makes provision for 'sparse storage', in which memory space is not provided for every array element. It is shown how to access array elements of the above form without conflict in an efficient way. The efficiency is obtained by using some specialised units which are basically smart memories with priority detection, one's counting or associative searching. Generalisation to multidimensional arrays is shown possible under restrictions defined in the paper. 12 references.
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
A GPU-paralleled implementation of an enhanced face recognition algorithm

NASA Astrophysics Data System (ADS)

Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo

2013-03-01

Face recognition based on compressed sensing and sparse representation has been hotly debated in recent years. This scheme increases the recognition rate as well as the anti-noise capability. However, its computational cost is high and has become a main restricting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of the face recognition algorithm, named parallel face recognition algorithm (pFRA). We describe how to carry out a parallel optimization design that takes full advantage of the many-core structure of a GPU. The pFRA is tested and compared with several other implementations under different data sample sizes. Finally, our pFRA, implemented with an NVIDIA GPU and the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over the traditional CPU implementations.
SPLASH: structural pattern localization analysis by sequential histograms.

PubMed

Califano, A

2000-04-01

The discovery of sparse amino acid patterns that match repeatedly in a set of protein sequences is an important problem in computational biology. Statistically significant patterns, that is, patterns that occur more frequently than expected, may identify regions that have been preserved by evolution and which may therefore play a key functional or structural role. Sparseness can be important because a handful of non-contiguous residues may play a key role, while others, in between, may be changed without significant loss of function or structure. Similar arguments may be applied to conserved DNA patterns. Available sparse pattern discovery algorithms are either inefficient or impose limitations on the type of patterns that can be discovered. This paper introduces a deterministic pattern discovery algorithm, called Splash, which can find sparse amino or nucleic acid patterns matching identically or similarly in a set of protein or DNA sequences. Sparse patterns of any length, up to the size of the input sequence, can be discovered without significant loss in performance. Splash is extremely efficient and embarrassingly parallel by nature. Large databases, such as a complete genome or the non-redundant SWISS-PROT database, can be processed in a few hours on a typical workstation. Alternatively, a protein family or superfamily, with low overall homology, can be analyzed to discover common functional or structural signatures. Some examples of biologically interesting motifs discovered by Splash are reported for the histone I and for the G-Protein Coupled Receptor families. Due to its efficiency, Splash can be used to systematically and exhaustively identify conserved regions in protein family sets. These can then be used to build accurate and sensitive PSSM or HMM models for sequence analysis. Splash is available to non-commercial research centers upon request, conditional on the signing of a test field agreement.
acal@us.ibm.com, Splash main page http://www.research.ibm.com/splash
The Use of Sparse Direct Solver in Vector Finite Element Modeling for Calculating Two Dimensional (2-D) Magnetotelluric Responses in Transverse Electric (TE) Mode

NASA Astrophysics Data System (ADS)

Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.

2018-04-01

In previous research, the linear systems arising from vector finite element modeling of two-dimensional (2-D) magnetotelluric (MT) responses in TE mode were solved with a non-sparse direct solver. That approach has weaknesses that still need to be addressed, in particular insufficient accuracy at low frequencies ($10^{-3}$ Hz to $10^{-5}$ Hz) and high computational cost on dense meshes. In this work, a sparse direct solver is used instead of the non-sparse direct solver to overcome these weaknesses. A sparse direct solver is advantageous for the linear systems of the vector finite element method because the matrix is symmetric and sparse. The sparse direct solver has been validated against analytical solutions for a homogeneous half-space model and a vertical contact model. The validation results show that the sparse direct solver is more stable than the non-sparse direct solver for the linear problems of the vector finite element method, especially at low frequencies. As a result, accurate 2-D MT response modeling at low frequencies ($10^{-3}$ Hz to $10^{-5}$ Hz) has been achieved with efficient array memory allocation and less computation time.
Traveling wave linear accelerator with RF power flow outside of accelerating cavities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dolgashev, Valery A.

A high power RF traveling wave accelerator structure includes a symmetric RF feed, an input matching cell coupled to the symmetric RF feed, a sequence of regular accelerating cavities coupled to the input matching cell at an input beam pipe end of the sequence, one or more waveguides parallel to and coupled to the sequence of regular accelerating cavities, an output matching cell coupled to the sequence of regular accelerating cavities at an output beam pipe end of the sequence, and output waveguide circuit or RF loads coupled to the output matching cell. Each of the regular accelerating cavities has a nose cone that cuts off field propagating into the beam pipe and therefore all power flows in a traveling wave along the structure in the waveguide.
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Design of tryptophan-containing mutants of the symmetrical Pizza protein for biophysical studies.

PubMed

Noguchi, Hiroki; Mylemans, Bram; De Zitter, Elke; Van Meervelt, Luc; Tame, Jeremy R H; Voet, Arnout

2018-03-18

β-propeller proteins are highly symmetrical, being composed of a repeated motif with four anti-parallel β-sheets arranged around a central axis. Recently we designed the first completely symmetrical β-propeller protein, Pizza6, consisting of six identical tandem repeats. Pizza6 is expected to prove a useful building block for bionanotechnology, and also a tool to investigate the folding and evolution of β-propeller proteins. Folding studies are made difficult by the high stability and the lack of buried Trp residues to act as monitor fluorophores, so we have designed and characterized several Trp-containing Pizza6 derivatives. In total four proteins were designed, of which three could be purified and characterized. Crystal structures confirm these mutant proteins maintain the expected structure, and a clear redshift of Trp fluorescence emission could be observed upon denaturation. Among the derivative proteins, Pizza6-AYW appears to be the most suitable model protein for future folding/unfolding kinetics studies, as its stability is comparable to that of natural β-propeller proteins. Copyright © 2018 Elsevier Inc. All rights reserved.
Aerial Observations of Symmetric Instability at the North Wall of the Gulf Stream

NASA Astrophysics Data System (ADS)

Savelyev, I.; Thomas, L. N.; Smith, G. B.; Wang, Q.; Shearman, R. K.; Haack, T.; Christman, A. J.; Blomquist, B.; Sletten, M.; Miller, W. D.; Fernando, H. J. S.

2018-01-01

An unusual spatial pattern on the ocean surface was captured by thermal airborne swaths taken across a strong sea surface temperature front at the North Wall of the Gulf Stream. The thermal pattern on the cold side of the front resembles a staircase consisting of tens of steps, each up to ~200 m wide and up to ~0.3°C warm. The steps are well organized, clearly separated by sharp temperature gradients, mostly parallel and aligned with the primary front. The interpretation of the airborne imagery is aided by oceanographic measurements from two research vessels. Analysis of the in situ observations indicates that the front was unstable to symmetric instability, a type of overturning instability that can generate coherent structures with similar dimensions to the temperature steps seen in the airborne imagery. It is concluded that the images capture, for the first time, the surface temperature field of symmetric instability turbulence.

Parallel algorithm of VLBI software correlator under multiprocessor environment

NASA Astrophysics Data System (ADS)

Zheng, Weimin; Zhang, Dong

2007-11-01

The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data-intensive and computation-intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiple Processor) servers. Another high-speed prototype correlator, using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm, is realized on a small Beowulf cluster platform. Both correlators have a flexible structure, are scalable, and can correlate data from 10 stations.
Applications and accuracy of the parallel diagonal dominant algorithm

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1993-01-01

The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
Microscale assembly directed by liquid-based template.

PubMed

Chen, Pu; Luo, Zhengyuan; Güven, Sinan; Tasoglu, Savas; Ganesan, Adarsh Venkataraman; Weng, Andrew; Demirci, Utkan

2014-09-10

A liquid surface established by standing waves is used as a dynamically reconfigurable template to assemble microscale materials into ordered, symmetric structures in a scalable and parallel manner. The broad applicability of this technology is illustrated by assembling diverse materials from soft matter, rigid bodies, individual cells, cell spheroids and cell-seeded microcarrier beads. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Conjugate gradient type methods for linear systems with complex symmetric coefficient matrices

NASA Technical Reports Server (NTRS)

Freund, Roland

1989-01-01

We consider conjugate gradient type methods for the solution of large sparse linear systems $Ax = b$ with complex symmetric coefficient matrices $A = A^T$. Such linear systems arise in important applications, such as the numerical solution of the complex Helmholtz equation. Furthermore, most complex non-Hermitian linear systems which occur in practice are actually complex symmetric. We investigate conjugate gradient type iterations which are based on a variant of the nonsymmetric Lanczos algorithm for complex symmetric matrices. We propose a new approach with iterates defined by a quasi-minimal residual property. The resulting algorithm presents several advantages over the standard biconjugate gradient method. We also include some remarks on the obvious approach to general complex linear systems by solving equivalent real linear systems for the real and imaginary parts of x. Finally, numerical experiments for linear systems arising from the complex Helmholtz equation are reported.
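To make the role of complex symmetry concrete, here is a minimal sketch of a conjugate gradient type iteration that exploits $A = A^T$. It is not the quasi-minimal residual method proposed in the report. It is the simpler conjugate orthogonal CG recurrence (commonly called COCG), which replaces the Hermitian inner product by the unconjugated bilinear form $u^T v$. The dense storage, the test matrix (a shifted 1-D Laplacian with an imaginary shift), and all function names are assumptions made for the illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; a sparse format would be used in practice."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot_unconj(u, v):
    """Unconjugated bilinear form u^T v (note: no complex conjugation)."""
    return sum(a * b for a, b in zip(u, v))


def norm2(v):
    return sum(abs(z) ** 2 for z in v) ** 0.5


def cocg(A, b, tol=1e-10, max_iter=200):
    """Conjugate orthogonal CG for complex symmetric A (A equal to its transpose)."""
    x = [0j] * len(b)
    r = list(b)
    p = list(r)
    rho = dot_unconj(r, r)
    b_norm = norm2(b)
    for k in range(max_iter):
        if norm2(r) <= tol * b_norm:
            return x, k
        q = matvec(A, p)
        alpha = rho / dot_unconj(p, q)      # can break down if p^T A p == 0
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot_unconj(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x, max_iter


# Assumed test problem: tridiagonal, complex symmetric, but not Hermitian.
n = 20
A = [[0j] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0 + 1.0j
    if i > 0:
        A[i][i - 1] = A[i - 1][i] = -1.0 + 0j
b = [1.0 + 0j] * n

x, iterations = cocg(A, b)
residual_norm = norm2([bi - yi for bi, yi in zip(b, matvec(A, x))])
```

Because the bilinear form is not an inner product, the recurrence can break down or converge erratically for general complex symmetric matrices; smoothing that behaviour is the kind of issue a quasi-minimal residual formulation is designed to address.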
Feature Clustering for Accelerating Parallel Coordinate Descent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh

2012-12-06

We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Direct Observation of Parallel Folding Pathways Revealed Using a Symmetric Repeat Protein System

PubMed Central

Aksel, Tural; Barrick, Doug

2014-01-01

Although progress has been made to determine the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion is not dependent on extensive curve-fitting or structural perturbation of protein structure. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat-proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule. PMID:24988356
Fabrication of GaAs symmetric pyramidal mesas prepared by wet-chemical etching using AlAs interlayer

NASA Astrophysics Data System (ADS)

Kicin, S.; Cambel, V.; Kuliffayová, M.; Gregušová, D.; Kováčová, E.; Novák, J.; Kostič, I.; Förster, A.

2002-01-01

We present a wet-chemical-etching method developed for the preparation of GaAs four-sided pyramid-shaped mesas. The method uses fast lateral etching of an AlAs interlayer that influences the cross-sectional profiles of the etched structures. We have tested the method using an H3PO4:H2O2:H2O etchant for (100) GaAs patterning. The sidewalls of the prepared pyramidal structures, together with the (100) bottom facet, formed cross-sectional angles of 25° and 42° for mask edges parallel and perpendicular, respectively, to the {011} cleavage planes. For mask edges rotated by 45° with respect to the cleavage planes, 42° cross-sectional angles were obtained. Using the method, symmetric and more than 10-μm-high GaAs "Egyptian" pyramids with smooth tilted facets were prepared.
Evolution method and "differential hierarchy" of colored knot polynomials

NASA Astrophysics Data System (ADS)

Mironov, A.; Morozov, A.; Morozov, And.

2013-10-01

We consider braids with repeating patterns inside arbitrary knots, which provide a multi-parametric family of knots depending on the "evolution" parameter that controls the number of repetitions. The dependence of knot (super)polynomials on such evolution parameters is very easy to find. We apply this evolution method to the study of families of knots and links which include the cases with just two parallel and anti-parallel strands in the braid, like the ordinary twist and 2-strand torus knots/links and counter-oriented 2-strand links. When the answers were available before, they are immediately reproduced, and an essentially new example is added of the "double braid", which is a combination of parallel and anti-parallel 2-strand braids. This study helps us to reveal with full clarity, and partly investigate, a mysterious hierarchical structure of the colored HOMFLY polynomials, at least in (anti)symmetric representations, which extends the original observation for the figure-eight knot to many (presumably all) knots. We demonstrate that this structure is typically respected by the t-deformation to the superpolynomials.
Numerical Aspects of Nonhydrostatic Implementations Applied to a Parallel Finite Element Tsunami Model

NASA Astrophysics Data System (ADS)

Fuchs, A.; Androsov, A.; Harig, S.; Hiller, W.; Rakowsky, N.

2012-04-01

Given the threat of devastating tsunamis and the unpredictability of such events, tsunami modelling as part of warning systems remains a topical subject. The tsunami group of the Alfred Wegener Institute developed the simulation tool TsunAWI as a contribution to the Early Warning System in Indonesia. Although the precomputed scenarios are satisfactory for this purpose, work on further improvements continues. While TsunAWI is governed by the Shallow Water Equations, an extension of the model is based on a nonhydrostatic approach. When a tsunami wave arrives in coastal regions with rough bathymetry, the term containing the nonhydrostatic part of the pressure, which is neglected in the original hydrostatic model, gains in importance. Taking this term into account is expected to give a better approximation of the wave. Hydrostatic and nonhydrostatic model results are compared using the standard benchmark problem of a solitary wave runup on a plane beach. The observation data provided by Titov and Synolakis (1995) serve as reference. The nonhydrostatic approach implies a set of equations that are similar to the Shallow Water Equations, so the variation of the code can be implemented on top. However, these additional routines raise a number of issues that have to be addressed. So far the computations of the model were purely explicit. In the nonhydrostatic version, the determination of an additional unknown and the solution of a large sparse system of linear equations are necessary. The latter accounts for the lion's share of computing time and memory requirement. Since the corresponding matrix is only symmetric in structure and not in values, an iterative Krylov Subspace Method is used, in particular the restarted Generalized Minimal Residual Algorithm GMRES(m).
With regard to optimization, we present a comparison of several combinations of sequential and parallel preconditioning techniques with respect to the number of iterations and setup/application time. Since the software package used, pARMS 3.2, which provides solving and preconditioning techniques, works via MPI parallelism, we adapted TsunAWI in an auxiliary branch and switched from OpenMP to MPI, paying particular attention to internal partition management.
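The distinction between a matrix that is symmetric in structure and one that is symmetric in values is what rules out symmetric solvers such as conjugate gradients and motivates GMRES(m) in the abstract above. It can be checked mechanically. The sketch below is an illustration only: the dictionary-of-keys storage, the function names, and the example coefficients are assumptions, not taken from TsunAWI or pARMS.

```python
def is_structurally_symmetric(entries):
    """True if every stored entry (i, j) has a stored partner (j, i)."""
    return all((j, i) in entries for (i, j) in entries)


def is_numerically_symmetric(entries, tol=0.0):
    """True if, in addition, the paired values agree to within tol."""
    return all(abs(v - entries.get((j, i), 0.0)) <= tol
               for (i, j), v in entries.items())


def tridiagonal(n, lower, diag, upper):
    """Build a tridiagonal matrix as a {(row, col): value} dictionary."""
    entries = {}
    for i in range(n):
        entries[(i, i)] = diag
        if i > 0:
            entries[(i, i - 1)] = lower
        if i < n - 1:
            entries[(i, i + 1)] = upper
    return entries


# Unequal off-diagonal coefficients (as an upwind-type discretization might
# produce): the same sparsity pattern on both sides of the diagonal, but
# different values.
skewed = tridiagonal(5, lower=-1.5, diag=2.0, upper=-0.5)
laplacian = tridiagonal(5, lower=-1.0, diag=2.0, upper=-1.0)

skewed_structure = is_structurally_symmetric(skewed)
skewed_values = is_numerically_symmetric(skewed)
laplacian_values = is_numerically_symmetric(laplacian)
```

Structural symmetry is still useful even when the values differ: the index arrays for the lower and upper triangles can be shared, and graph-based orderings and partitionings can treat the pattern as an undirected graph.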
Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avron, Haim; Ng, Esmond G.; Toledo, Sivan

2008-03-21

We propose and analyze a new tool to help solve sparse linear least-squares problems $\min_x \|Ax-b\|_2$. Our method is based on a sparse QR factorization of a low-rank perturbation $\hat{A}$ of $A$. More precisely, we show that the R factor of $\hat{A}$ is an effective preconditioner for the least-squares problem $\min_x \|Ax-b\|_2$, when solved using LSQR. We propose applications for the new technique. When $A$ is rank deficient we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When $A$ is sparse except for a few dense rows we can drop these dense rows from $A$ to obtain $\hat{A}$. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem $A$, it is a good preconditioner for the updated/downdated problem $\hat{A}$. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil $(A^*A, R^*R)$ and analyze the applications.
Beyond union of subspaces: Subspace pursuit on Grassmann manifold for data representation

DOE PAGES

Shen, Xinyue; Krim, Hamid; Gu, Yuantao

2016-03-01

Discovering the underlying structure of a high-dimensional signal or big data has always been a challenging topic, and has become harder to tackle especially when the observations are exposed to arbitrary sparse perturbations. Here in this paper, built on the model of a union of subspaces (UoS) with sparse outliers and inspired by a basis pursuit strategy, we exploit the fundamental structure of a Grassmann manifold, and propose a new technique of pursuing the subspaces systematically by solving a non-convex optimization problem using the alternating direction method of multipliers. This problem as noted is further complicated by non-convex constraints on the Grassmann manifold, as well as the bilinearity in the penalty caused by the subspace bases and coefficients. Nevertheless, numerical experiments verify that the proposed algorithm, which provides elegant solutions to the sub-problems in each step, is able to de-couple the subspaces and pursue each of them under time-efficient parallel computation.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chow, Edmond

Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the desired factorization.
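The idea of treating a factorization as a set of bilinear equations can be sketched as follows. For an incomplete LU factorization with unit lower-triangular L, each position (i, j) in the sparsity pattern contributes one constraint, $\sum_{k \le \min(i,j)} l_{ik} u_{kj} = a_{ij}$, and each constraint can be rearranged to update one unknown from the current values of the others. The code below runs synchronous sweeps in serial as a stand-in for the asynchronous parallel iteration; the storage scheme, initial guess, sweep count, and names are assumptions made for illustration and are not taken from the report.

```python
def ilu_fixed_point(a, sweeps):
    """Fixed-point sweeps for an ILU(0)-type factorization on the pattern of a.

    a is a {(i, j): value} dictionary that includes every diagonal entry.
    Returns l (strictly lower part, unit diagonal implied) and u (upper part).
    """
    # Initial guess: scaled lower triangle and the upper triangle of a.
    l = {(i, j): v / a[(j, j)] for (i, j), v in a.items() if i > j}
    u = {(i, j): v for (i, j), v in a.items() if i <= j}
    for _ in range(sweeps):
        new_l, new_u = {}, {}
        # Every update reads only the previous iterate, so all of them could
        # run concurrently, one per nonzero.
        for (i, j), a_ij in a.items():
            s = sum(l.get((i, k), 0.0) * u.get((k, j), 0.0)
                    for k in range(min(i, j)))
            if i > j:
                new_l[(i, j)] = (a_ij - s) / u[(j, j)]
            else:
                new_u[(i, j)] = a_ij - s
        l, u = new_l, new_u
    return l, u


def max_residual_on_pattern(a, l, u):
    """Largest violation of the bilinear constraints (L U)_ij = a_ij."""
    worst = 0.0
    for (i, j), a_ij in a.items():
        total = sum(l.get((i, k), 0.0) * u.get((k, j), 0.0)
                    for k in range(min(i, j)))
        total += l[(i, j)] * u[(j, j)] if i > j else u[(i, j)]
        worst = max(worst, abs(total - a_ij))
    return worst


# Assumed example: a 6 x 6 tridiagonal matrix, symmetric in structure only.
n = 6
a = {}
for i in range(n):
    a[(i, i)] = 4.0
    if i > 0:
        a[(i, i - 1)] = -1.0
        a[(i - 1, i)] = -2.0

l, u = ilu_fixed_point(a, sweeps=20)
residual = max_residual_on_pattern(a, l, u)
```

For this tridiagonal example the pattern admits no fill-in, so the incomplete factorization coincides with the exact LU factorization and the sweeps reach it after finitely many steps. On general patterns the iteration only approximates the incomplete factors, which is usually acceptable because they serve as a preconditioner.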
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de

2016-11-15

We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the $10^2$ innermost eigenpairs of a topological insulator matrix with dimension $10^9$ derived from quantum physics applications.
Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

2013-01-01

Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count, into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rajbhandari, Samyam; NIkam, Akshay; Lai, Pai-Wei

Tensor contractions represent the most compute-intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce a novel approach that avoids data redistribution in contracting symmetric tensors while also avoiding redundant storage and maintaining load balance. We present experimental results on two parallel supercomputers for several symmetric contractions that appear in the CCSD quantum chemistry method. We also present a novel approach to tensor redistribution that can take advantage of parallel hyperplanes when the initial distribution has replicated dimensions, and use collective broadcast when the final distribution has replicated dimensions, making the algorithm very efficient.
SU-G-TeP1-15: Toward a Novel GPU Accelerated Deterministic Solution to the Linear Boltzmann Transport Equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, R; Fallone, B; Cross Cancer Institute, Edmonton, AB

Purpose: To develop a Graphic Processor Unit (GPU) accelerated deterministic solution to the Linear Boltzmann Transport Equation (LBTE) for accurate dose calculations in radiotherapy (RT). A deterministic solution yields the potential for major speed improvements due to the sparse matrix-vector and vector-vector multiplications and would thus be of benefit to RT. Methods: In order to leverage the massively parallel architecture of GPUs, the first order LBTE was reformulated as a second order self-adjoint equation using the Least Squares Finite Element Method (LSFEM). This produces a symmetric positive-definite matrix which is efficiently solved using a parallelized conjugate gradient (CG) solver. The LSFEM formalism is applied in space, discrete ordinates is applied in angle, and the Multigroup method is applied in energy. The final linear system of equations produced is tightly coupled in space and angle. Our code written in CUDA-C was benchmarked on an Nvidia GeForce TITAN-X GPU against an Intel i7-6700K CPU. A spatial mesh of 30,950 tetrahedral elements was used with an S4 angular approximation. Results: To avoid repeating a full computationally intensive finite element matrix assembly at each Multigroup energy, a novel mapping algorithm was developed which minimized the operations required at each energy. Additionally, a parallelized memory mapping for the Kronecker product between the sparse spatial and angular matrices, including Dirichlet boundary conditions, was created. Atomicity is preserved by graph-coloring overlapping nodes into separate kernel launches. The one-time mapping calculations for matrix assembly, Kronecker product, and boundary condition application took 452±1ms on GPU. Matrix assembly for 16 energy groups took 556±3s on CPU, and 358±2ms on GPU using the mappings developed. The CG solver took 93±1s on CPU, and 468±2ms on GPU.
Conclusion: Three computationally intensive subroutines in deterministically solving the LBTE have been formulated on GPU, resulting in two orders of magnitude speedup. Funding support from Natural Sciences and Engineering Research Council and Alberta Innovates Health Solutions. Dr. Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization).
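The CG solver mentioned in this abstract relies only on sparse matrix-vector products and vector-vector operations, which is what makes it a good fit for GPUs. The serial sketch below shows the textbook unpreconditioned CG iteration with a CSR matrix-vector product; it is a plain-Python illustration under that assumption, not the authors' CUDA-C code, and the function names are hypothetical.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A x for a matrix stored in compressed sparse row (CSR) form."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only matvec."""
    x = [0.0] * len(b)
    r = list(b)                             # residual b - A x with x = 0
    p = list(r)                             # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Every row of `csr_matvec` and every element of the vector updates can be computed independently, so each maps naturally onto one GPU thread; only the dot products need a reduction.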
An O(log^2 N) parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1989-01-01

An O(log^2 N) parallel algorithm is presented for computing the eigenvalues of a symmetric tridiagonal matrix using a parallel algorithm for computing the zeros of the characteristic polynomial. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The exact behavior of the polynomials at the interval endpoints is used to eliminate the usual problems induced by finite precision arithmetic.
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

PubMed

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which can cause difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By converting the Jacobian matrix into a sparse format, the zero-valued elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. Sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction in reconstruction results.
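The two steps described in this abstract (thresholding the Jacobian into a sparse format, then solving the least-squares problem with a CG-type iteration) can be sketched as follows. This is a minimal serial illustration using the standard CGLS recurrence; the row-list storage, the function names, and the omission of regularization and of the block partitioning are simplifying assumptions, not the authors' implementation.

```python
def sparsify(jacobian, threshold):
    """Drop entries with |value| < threshold; each row becomes a list of
    (column, value) pairs, so zeros are no longer stored."""
    return [[(j, v) for j, v in enumerate(row) if abs(v) >= threshold]
            for row in jacobian]

def cgls(rows, n_cols, b, n_iter=50, tol=1e-24):
    """Minimise ||J x - b||_2 with CGLS, using only products with J and J^T."""
    def forward(x):                         # J x
        return [sum(v * x[j] for j, v in row) for row in rows]

    def adjoint(r):                         # J^T r
        out = [0.0] * n_cols
        for row, ri in zip(rows, r):
            for j, v in row:
                out[j] += v * ri
        return out

    x = [0.0] * n_cols
    r = list(b)
    s = adjoint(r)
    p = list(s)
    gamma = sum(si * si for si in s)
    for _ in range(n_iter):
        if gamma < tol:                     # squared norm of J^T r is negligible
            break
        q = forward(p)
        alpha = gamma / sum(qi * qi for qi in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint(r)
        gamma_new = sum(si * si for si in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Because `forward` and `adjoint` are sums over rows, the rows can be split into blocks that are processed independently and then combined, which is one plausible reading of the block-wise parallel scheme.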
High-Resolution DCE-MRI of the Pituitary Gland Using Radial k-Space Acquisition with Compressed Sensing Reconstruction.

PubMed

Rossi Espagnet, M C; Bangiyev, L; Haber, M; Block, K T; Babb, J; Ruggiero, V; Boada, F; Gonen, O; Fatterpekar, G M

2015-08-01

The pituitary gland is located outside of the blood-brain barrier. Dynamic T1 weighted contrast enhanced sequence is considered to be the gold standard to evaluate this region. However, it does not allow assessment of intrinsic permeability properties of the gland. Our aim was to demonstrate the utility of radial volumetric interpolated brain examination with the golden-angle radial sparse parallel technique to evaluate permeability characteristics of the individual components (anterior and posterior gland and the median eminence) of the pituitary gland and areas of differential enhancement and to optimize the study acquisition time. A retrospective study was performed in 52 patients (group 1, 25 patients with normal pituitary glands; and group 2, 27 patients with a known diagnosis of microadenoma). Radial volumetric interpolated brain examination sequences with golden-angle radial sparse parallel technique were evaluated with an ROI-based method to obtain signal-time curves and permeability measures of individual normal structures within the pituitary gland and areas of differential enhancement. Statistical analyses were performed to assess differences in the permeability parameters of these individual regions and optimize the study acquisition time. Signal-time curves from the posterior pituitary gland and median eminence demonstrated a faster wash-in and time of maximum enhancement with a lower peak of enhancement compared with the anterior pituitary gland (P < .005). Time-optimization analysis demonstrated that 120 seconds is ideal for dynamic pituitary gland evaluation. In the absence of a clinical history, differences in the signal-time curves allow easy distinction between a simple cyst and a microadenoma. 
This retrospective study confirms the ability of the golden-angle radial sparse parallel technique to evaluate the permeability characteristics of the pituitary gland and establishes 120 seconds as the ideal acquisition time for dynamic pituitary gland imaging. © 2015 by American Journal of Neuroradiology.
High-Resolution DCE-MRI of the Pituitary Gland Using Radial k-Space Acquisition with Compressed Sensing Reconstruction

PubMed Central

Rossi Espagnet, M.C.; Bangiyev, L.; Haber, M.; Block, K.T.; Babb, J.; Ruggiero, V.; Boada, F.; Gonen, O.; Fatterpekar, G.M.

2015-01-01

BACKGROUND AND PURPOSE: The pituitary gland is located outside of the blood-brain barrier. Dynamic T1 weighted contrast enhanced sequence is considered to be the gold standard to evaluate this region. However, it does not allow assessment of intrinsic permeability properties of the gland. Our aim was to demonstrate the utility of radial volumetric interpolated brain examination with the golden-angle radial sparse parallel technique to evaluate permeability characteristics of the individual components (anterior and posterior gland and the median eminence) of the pituitary gland and areas of differential enhancement and to optimize the study acquisition time. MATERIALS AND METHODS: A retrospective study was performed in 52 patients (group 1, 25 patients with normal pituitary glands; and group 2, 27 patients with a known diagnosis of microadenoma). Radial volumetric interpolated brain examination sequences with golden-angle radial sparse parallel technique were evaluated with an ROI-based method to obtain signal-time curves and permeability measures of individual normal structures within the pituitary gland and areas of differential enhancement. Statistical analyses were performed to assess differences in the permeability parameters of these individual regions and optimize the study acquisition time. RESULTS: Signal-time curves from the posterior pituitary gland and median eminence demonstrated a faster wash-in and time of maximum enhancement with a lower peak of enhancement compared with the anterior pituitary gland (P < .005). Time-optimization analysis demonstrated that 120 seconds is ideal for dynamic pituitary gland evaluation. In the absence of a clinical history, differences in the signal-time curves allow easy distinction between a simple cyst and a microadenoma.
CONCLUSIONS: This retrospective study confirms the ability of the golden-angle radial sparse parallel technique to evaluate the permeability characteristics of the pituitary gland and establishes 120 seconds as the ideal acquisition time for dynamic pituitary gland imaging. PMID:25953760

GPU-accelerated algorithms for compressed signals recovery with application to astronomical imagery deblurring

NASA Astrophysics Data System (ADS)

Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico

2018-04-01

Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting GPUs' parallel computation capabilities to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
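One property of circulant matrices that such algorithms can exploit is that they are diagonalized by the discrete Fourier transform, so a matrix-vector product needs only the first column of the matrix and three FFTs rather than the full n-by-n matrix. The sketch below illustrates this well-known identity with a small recursive radix-2 FFT (length must be a power of two); it is an assumption-laden illustration of the principle, not the GPU algorithm from the paper.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT via conjugation: ifft(x) = conj(fft(conj(x))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]

def circulant_matvec(first_column, x):
    """y = C x, where C[i][j] = first_column[(i - j) % n].
    Only the first column is stored: O(n) memory and O(n log n) work."""
    fc, fx = fft(first_column), fft(x)
    return [v.real for v in ifft([a * b for a, b in zip(fc, fx)])]
```

The memory saving is the point relevant to the abstract: the full matrix is never formed, which is what allows large signals to fit within limited device memory.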
A Data Type for Efficient Representation of Other Data Types

NASA Technical Reports Server (NTRS)

James, Mark

2008-01-01

A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns that arise in programming parallel computers. A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. The present sequence data type overcomes both the implementation and parallel structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture. Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmers' advice and execution use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lies in the fact that it can relieve programmers of the details of implementations. The creation of a sequence enables decoupling of a conceptual representation from an implementation. The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors.
As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Analog system for computing sparse codes

DOEpatents

Rozell, Christopher John; Johnson, Don Herrick; Baraniuk, Richard Gordon; Olshausen, Bruno A.; Ortman, Robert Lowell

2010-08-24

A parallel dynamical system for computing sparse representations of data, i.e., where the data can be fully represented in terms of a small number of non-zero code elements, and for reconstructing compressively sensed images. The system is based on the principles of thresholding and local competition and solves a family of sparse approximation problems corresponding to various sparsity metrics. The system utilizes Locally Competitive Algorithms (LCAs), in which nodes in a population continually compete with neighboring units using (usually one-way) lateral inhibition to calculate coefficients representing an input in an overcomplete dictionary.
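A discrete-time simulation can convey how thresholding and lateral inhibition interact in an LCA. The sketch below integrates the commonly cited LCA dynamics du/dt = b - u - (G - I) a with forward Euler steps, where b holds the correlations of the dictionary atoms with the input, G is the Gram matrix of the atoms, and a is the soft-thresholded state. The soft-threshold choice (which corresponds to an L1 sparsity penalty), the step size, and all names are assumptions for illustration; the patent describes an analog system, not this code.

```python
def soft_threshold(u, lam):
    """Shrink u toward zero by lam; values within [-lam, lam] become 0."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0

def lca(dictionary, signal, lam=0.1, dt=0.05, n_steps=4000):
    """Simulate LCA dynamics with Euler steps (time constant absorbed in dt).
    dictionary: list of unit-norm atoms, each a list of floats."""
    m = len(dictionary)

    def dot(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    b = [dot(atom, signal) for atom in dictionary]          # feed-forward drive
    gram = [[dot(dictionary[i], dictionary[j]) for j in range(m)]
            for i in range(m)]
    u = [0.0] * m                                           # internal states
    for _ in range(n_steps):
        a = [soft_threshold(ui, lam) for ui in u]           # active coefficients
        # Each node is inhibited by active nodes whose atoms overlap with its own.
        du = [b[i] - u[i] - sum(gram[i][j] * a[j] for j in range(m) if j != i)
              for i in range(m)]
        u = [ui + dt * dui for ui, dui in zip(u, du)]
    return [soft_threshold(ui, lam) for ui in u]
```

Only nodes above threshold inhibit others, so inhibition is effectively one-way from active to inactive units, and every node update within a step is independent, which is what makes the scheme parallel.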
Type synthesis for 4-DOF parallel press mechanism using GF set theory

NASA Astrophysics Data System (ADS)

He, Jun; Gao, Feng; Meng, Xiangdun; Guo, Weizhong

2015-07-01

Parallel mechanisms are used in large-capacity servo presses to avoid the over-constraint of traditional redundant actuation. Current research mainly focuses on the performance analysis of specific parallel press mechanisms. However, the type synthesis and evaluation of parallel press mechanisms are seldom studied, especially for four-degree-of-freedom (DOF) press mechanisms. The type synthesis of 4-DOF parallel press mechanisms is carried out based on generalized function (GF) set theory. Five design criteria of 4-DOF parallel press mechanisms are first proposed. The general procedure of type synthesis of parallel press mechanisms is obtained, which includes number synthesis, symmetrical synthesis of constraint GF sets, decomposition of motion GF sets and design of limbs. Nine combinations of constraint GF sets of 4-DOF parallel press mechanisms, ten combinations of GF sets of active limbs, and eleven combinations of GF sets of passive limbs are synthesized. Thirty-eight kinds of press mechanisms are presented and then different structures of kinematic limbs are designed. Finally, the geometrical constraint complexity (GCC), kinematic pair complexity (KPC), and type complexity (TC) are proposed to evaluate the press types, and the optimal press type is obtained. General methodologies of type synthesis and evaluation for parallel press mechanisms are suggested.
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
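For orientation, the deterministic starting point named in this abstract, a preconditioned Richardson iteration derived from a splitting A = M - N, can be written as x_{k+1} = x_k + M^{-1}(b - A x_k). The sketch below uses the Jacobi splitting M = diag(A) as an assumed example of a convergent splitting (it converges, for instance, for strictly diagonally dominant A); the Monte Carlo acceleration analyzed in the paper is not shown.

```python
def richardson_jacobi(a, b, tol=1e-12, max_iter=10000):
    """Preconditioned Richardson iteration x <- x + M^{-1} (b - A x)
    with the Jacobi choice M = diag(A). `a` is a dense list of rows."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Residual of the current iterate.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(ri) for ri in r) < tol:
            break
        # Apply the preconditioner: divide by the diagonal entries.
        x = [x[i] + r[i] / a[i][i] for i in range(n)]
    return x
```

The iteration matrix here is I - M^{-1} A; the splitting is convergent exactly when its spectral radius is below one, which is the condition the hybrid schemes build on.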
A Programming Language Supporting First-Class Parallel Environments

DTIC Science & Technology

1989-01-01

Symmetric Lisp later in the thesis. 1.5.1.2 Procedures as Data - Comparison with Lisp Classical Lisp[48, 54] has been altered and extended in many ways... management problems. A resource manager controls access to one or more resources shared by concurrently executing processes. Database transaction systems...symmetric languages are related to languages based on more classical models? 3. What are the kinds of uniformity that the symmetric model supports and what
Parallel solution of the symmetric tridiagonal eigenproblem. Research report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-10-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an iPSC hypercube multiprocessor reveal that Cuppen's method is the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.
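The bisection approach mentioned in this abstract rests on a Sturm-sequence count: for a symmetric tridiagonal matrix with diagonal a_i and off-diagonal b_i, the number of negative terms in the recurrence q_1 = a_1 - x, q_i = (a_i - x) - b_{i-1}^2 / q_{i-1} equals the number of eigenvalues below x. The sketch below is a textbook serial version; the function names, the Gershgorin starting interval, and the small-pivot safeguard are assumptions for illustration, not details from the thesis.

```python
def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix
    (diagonal `diag`, off-diagonal `off`) that are less than x."""
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        q = (a - x) - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30                       # safeguard against a zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection."""
    n = len(diag)
    # Gershgorin discs give an interval that contains every eigenvalue.
    radius = [(abs(off[i - 1]) if i > 0 else 0.0)
              + (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d - r for d, r in zip(diag, radius))
    hi = max(d + r for d, r in zip(diag, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid                        # eigenvalue k lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Each eigenvalue is found independently of the others, so different processors can be assigned different values of k with no communication, which is consistent with bisection being described as the most parallel option.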
Parallel solution of the symmetric tridiagonal eigenproblem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-01-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.
Multimodal sparse reconstruction in guided wave imaging of defects in plates

NASA Astrophysics Data System (ADS)

Golato, Andrew; Santhanam, Sridhar; Ahmad, Fauzia; Amin, Moeness G.

2016-07-01

A multimodal sparse reconstruction approach is proposed for localizing defects in thin plates in Lamb wave-based structural health monitoring. The proposed approach exploits both the sparsity of the defects and the multimodal nature of Lamb wave propagation in plates. It takes into account the variation of the defects' aspect angles across the various transducer pairs. At low operating frequencies, only the fundamental symmetric and antisymmetric Lamb modes emanate from a transmitting transducer. Asymmetric defects scatter these modes and spawn additional converted fundamental modes. Propagation models are developed for each of these scattered and spawned modes arriving at the various receiving transducers. This enables the construction of modal dictionary matrices spanning a two-dimensional array of pixels representing potential defect locations in the region of interest. Reconstruction of the region of interest is achieved by inverting the resulting linear model using the group sparsity constraint, where the groups extend across the various transducer pairs and the different modes. The effectiveness of the proposed approach is established with finite-element scattering simulations of the fundamental Lamb wave modes by crack-like defects in a plate. The approach is subsequently validated with experimental results obtained from an aluminum plate with asymmetric defects.
Parallel Processing of Adaptive Meshes with Load Balancing

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.
Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

PubMed

Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

2017-10-10

We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
Determining building interior structures using compressive sensing

NASA Astrophysics Data System (ADS)

Lagunas, Eva; Amin, Moeness G.; Ahmad, Fauzia; Nájar, Montse

2013-04-01

We consider imaging of the building interior structures using compressive sensing (CS) with applications to through-the-wall imaging and urban sensing. We consider a monostatic synthetic aperture radar imaging system employing stepped frequency waveform. The proposed approach exploits prior information of building construction practices to form an appropriate sparse representation of the building interior layout. We devise a dictionary of possible wall locations, which is consistent with the fact that interior walls are typically parallel or perpendicular to the front wall. The dictionary accounts for the dominant normal angle reflections from exterior and interior walls for the monostatic imaging system. CS is applied to a reduced set of observations to recover the true positions of the walls. Additional information about interior walls can be obtained using a dictionary of possible corner reflectors, which is the response of the junction of two walls. Supporting results based on simulation and laboratory experiments are provided. It is shown that the proposed sparsifying basis outperforms the conventional through-the-wall CS model, the wavelet sparsifying basis, and the block sparse model for building interior layout detection.
Local structure preserving sparse coding for infrared target recognition

PubMed Central

Han, Jing; Yue, Jiang; Zhang, Yi; Bai, Lianfa

2017-01-01

Sparse coding performs well in image classification. However, robust target recognition requires a lot of comprehensive template images and the sparse learning process is complex. We incorporate sparsity into a template matching concept to construct a local sparse structure matching (LSSM) model for general infrared target recognition. A local structure preserving sparse coding (LSPSc) formulation is proposed to simultaneously preserve the local sparse and structural information of objects. By adding a spatial local structure constraint into the classical sparse coding algorithm, LSPSc can improve the stability of sparse representation for targets and inhibit background interference in infrared images. Furthermore, a kernel LSPSc (K-LSPSc) formulation is proposed, which extends LSPSc to the kernel space to weaken the influence of the linear structure constraint in nonlinear natural data. Because of the anti-interference and fault-tolerant capabilities, both LSPSc- and K-LSPSc-based LSSM can implement target identification based on a simple template set, which just needs several images containing enough local sparse structures to learn a sufficient sparse structure dictionary of a target class. Specifically, this LSSM approach has stable performance in the target detection with scene, shape and occlusions variations. High performance is demonstrated on several datasets, indicating robust infrared target recognition in diverse environments and imaging conditions. PMID:28323824
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

DOE PAGES

Azad, Ariful; Ballard, Grey; Buluc, Aydin; ...

2016-11-08

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
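For readers unfamiliar with SpGEMM itself, the sketch below shows a row-by-row (Gustavson-style) sequential kernel with each sparse row stored as a column-to-value dictionary. It only illustrates what is being computed; the storage format and function name are assumptions for this example, and the 3D distributed algorithm of the paper, which partitions this work across processes and threads, is not reproduced.

```python
def spgemm(a_rows, b_rows):
    """C = A B for sparse matrices stored as lists of {column: value} rows.
    Row i of C is a sparse combination of the rows of B selected by the
    nonzeros of row i of A."""
    c_rows = []
    for a_row in a_rows:
        acc = {}                            # sparse accumulator for one output row
        for k, a_val in a_row.items():
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows
```

The work is proportional to the number of nonzero products rather than to the matrix dimensions, and output rows are independent of one another, so the difficulty in parallel settings lies mainly in moving the needed rows of B, in line with the abstract's remark that scaling is bound by communication.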
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

NASA Astrophysics Data System (ADS)

Backes, Werner; Wetzel, Susanne

In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schnorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread version and about 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
Density-matrix-based algorithm for solving eigenvalue problems

NASA Astrophysics Data System (ADS)

Polizzi, Eric

2009-03-01

A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.
Symmetrical metallic and magnetic edge states of nanoribbon from semiconductive monolayer PtS2

NASA Astrophysics Data System (ADS)

Liu, Shan; Zhu, Heyu; Liu, Ziran; Zhou, Guanghui

2018-03-01

Transition metal dichalcogenide (TMD) MoS2 or graphene can be fashioned into metallic nanoribbons, which always have only one edge showing metallic properties due to symmetry protection. In the present work, a nanoribbon with two parallel metallic and magnetic edges was designed from a noble TMD, PtS2, by employing first-principles calculations based on density functional theory (DFT). Edge energy, bonding charge density, band structure, density of states (DOS) and simulated scanning tunneling microscopy (STM) of four possible edge states of monolayer semiconductive PtS2 were systematically studied. Detailed calculations show that only the Pt-terminated edge state among the four edge states was relatively stable, metallic and magnetic. Those metallic and magnetic properties are mainly contributed by the 5d orbitals of Pt atoms located at the edges. Moreover, two of those centrally symmetric edges coexist in one zigzag nanoribbon, providing two atomic metallic wires that may have promising applications for the realization of quantum effects, such as the Aharonov-Bohm effect, and atomic power transmission lines in a single nanoribbon.
Artificial dielectric stepped-refractive-index lens for the terahertz region.

PubMed

Hernandez-Serrano, A I; Mendis, Rajind; Reichel, Kimberly S; Zhang, Wei; Castro-Camus, E; Mittleman, Daniel M

2018-02-05

In this paper we theoretically and experimentally demonstrate a stepped-refractive-index convergent lens made of a parallel stack of metallic plates for terahertz frequencies based on artificial dielectrics. The lens consists of a non-uniformly spaced stack of metallic plates, forming a mirror-symmetric array of parallel-plate waveguides (PPWGs). The operation of the device is based on the TE1 mode of the PPWG. The effective refractive index of the TE1 mode is a function of the frequency of operation and the spacing between the plates of the PPWG. By varying the spacing between the plates, we can modify the local refractive index of the structure in every individual PPWG that constitutes the lens, producing a stepped refractive index profile across the multi-stack structure. The theoretical and experimental results show that this structure is capable of focusing a 1 cm diameter beam to a line focus of less than 4 mm for the design frequency of 0.18 THz. This structure shows that this artificial-dielectric concept is an important technology for the fabrication of next generation terahertz devices.

Symmetric Absorber-Coupled Far-Infrared Microwave Kinetic Inductance Detector

NASA Technical Reports Server (NTRS)

U-yen, Kongpop (Inventor); Wollack, Edward J. (Inventor); Brown, Ari D. (Inventor); Stevenson, Thomas R. (Inventor); Patel, Amil A. (Inventor)

2016-01-01

The present invention relates to a symmetric absorber-coupled far-infrared microwave kinetic inductance detector including: a membrane having an absorber disposed thereon in a symmetric cross bar pattern; and a microstrip including a plurality of conductor microstrip lines disposed along all edges of the membrane, and separated from a ground plane by the membrane. The conducting microstrip lines are made from niobium, and the pattern is made from a superconducting material with a transition temperature below niobium, including one of aluminum, titanium nitride, or molybdenum nitride. The pattern is disposed on both a top and a bottom of the membrane, and creates a parallel-plate coupled transmission line on the membrane that acts as a half-wavelength resonator at readout frequencies. The parallel-plate coupled transmission line and the conductor microstrip lines form a stepped impedance resonator. The pattern provides identical power absorption for both horizontal and vertical polarization signals.
Automatic Camera Orientation and Structure Recovery with Samantha

NASA Astrophysics Data System (ADS)

Gherardi, R.; Toldo, R.; Garro, V.; Fusiello, A.

2011-09-01

SAMANTHA is software capable of computing camera orientation and structure recovery from a sparse block of casual images without human intervention. It can process either calibrated or uncalibrated images; in the latter case an autocalibration routine is run. Pictures are organized into a hierarchical tree which has single images as leaves and partial reconstructions as internal nodes. The method proceeds bottom up until it reaches the root node, corresponding to the final result. This framework is one order of magnitude faster than sequential approaches, inherently parallel, and less sensitive to the error accumulation that causes drift. We have verified the quality of our reconstructions both qualitatively, producing compelling point clouds, and quantitatively, comparing them with laser scans serving as ground truth.
Interaction of non-radially symmetric camphor particles

NASA Astrophysics Data System (ADS)

Ei, Shin-Ichiro; Kitahata, Hiroyuki; Koyano, Yuki; Nagayama, Masaharu

2018-03-01

In this study, the interaction between two non-radially symmetric camphor particles is theoretically investigated and the equation describing the motion is derived as an ordinary differential system for the locations and the rotations. In particular, slightly modified non-radially symmetric cases from radial symmetry are extensively investigated and explicit motions are obtained. For example, it is theoretically shown that elliptically deformed camphor particles interact so as to be parallel with major axes. Such predicted motions are also checked by real experiments and numerical simulations.
A symmetrical subtraction combined with interpolated values for eliminating scattering from fluorescence EEM data

NASA Astrophysics Data System (ADS)

Xu, Jing; Liu, Xiaofei; Wang, Yutian

2016-08-01

Parallel factor analysis is a widely used method to extract qualitative and quantitative information about the analyte of interest from a fluorescence emission-excitation matrix containing unknown components. Large-amplitude scattering influences the results of parallel factor analysis. Many methods of eliminating scattering have been proposed. Each of these methods has its advantages and disadvantages. The combination of symmetrical subtraction and interpolated values has been discussed. The combination refers to both the combination of results and the combination of methods. Nine methods were used for comparison. The results show the combination of results can make a better concentration prediction for all the components.
M-step preconditioned conjugate gradient methods

NASA Technical Reports Server (NTRS)

Adams, L.

1983-01-01

Preconditioned conjugate gradient methods for solving sparse symmetric and positive definite systems of linear equations are described. Necessary and sufficient conditions are given for when these preconditioners can be used and an analysis of their effectiveness is given. Efficient computer implementations of these methods are discussed and results on the CYBER 203 and the Finite Element Machine under construction at NASA Langley Research Center are included.
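As a rough illustration of the idea, the sketch below runs conjugate gradients with a preconditioner built from m sweeps of a stationary iteration. The abstract does not say which stationary method is used, so Jacobi is an assumption here, and the dense list-of-lists storage and all function names are hypothetical choices made for brevity rather than the report's implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def jacobi_m_step(a, r, m):
    """Approximate A^{-1} r with m Jacobi sweeps started from zero."""
    z = [0.0] * len(r)
    for _ in range(m):
        az = matvec(a, z)
        z = [z[i] + (r[i] - az[i]) / a[i][i] for i in range(len(r))]
    return z


def pcg(a, b, m=2, tol=1e-10, max_iter=200):
    """Conjugate gradients preconditioned by m Jacobi sweeps (assumed choice)."""
    x = [0.0] * len(b)
    r = b[:]
    z = jacobi_m_step(a, r, m)
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = jacobi_m_step(a, r, m)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# 1D Laplacian test problem whose exact solution is the all-ones vector.
n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = matvec(A, [1.0] * n)
x = pcg(A, b, m=2)
```

Each preconditioner application costs m matrix-vector products, so the choice of m trades extra work per iteration against fewer iterations.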
Tay's syndrome: MRI.

PubMed

Porto, L; Weis, R; Schulz, C; Reichel, P; Lanfermann, H; Zanella, F E

2000-11-01

Tay's syndrome is a trichothiodystrophy associated with congenital ichthyosis. We report the findings on MRI and spectroscopy in a young girl with sparse, short, ruffled hair, dry skin and delayed milestones. T2-weighted images showed prominent diffuse confluent increase in signal symmetrically in all the supratentorial white matter. These findings are similar to those in a previously described case, and consistent with dysmyelination. Spectroscopy showed increased myoinositol and decreased choline.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*

PubMed Central

Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

2014-01-01

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min_{x ∈ ℝ^n} ‖Ax − b‖_2, where A ∈ ℝ^{m×n} with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
The immunity-related GTPase Irga6 dimerizes in a parallel head-to-head fashion.

PubMed

Schulte, Kathrin; Pawlowski, Nikolaus; Faelber, Katja; Fröhlich, Chris; Howard, Jonathan; Daumke, Oliver

2016-03-02

The immunity-related GTPases (IRGs) constitute a powerful cell-autonomous resistance system against several intracellular pathogens. Irga6 is a dynamin-like protein that oligomerizes at the parasitophorous vacuolar membrane (PVM) of Toxoplasma gondii leading to its vesiculation. Based on a previous biochemical analysis, it has been proposed that the GTPase domains of Irga6 dimerize in an antiparallel fashion during oligomerization. We determined the crystal structure of an oligomerization-impaired Irga6 mutant bound to a non-hydrolyzable GTP analog. Contrary to the previous model, the structure shows that the GTPase domains dimerize in a parallel fashion. The nucleotides in the center of the interface participate in dimerization by forming symmetric contacts with each other and with the switch I region of the opposing Irga6 molecule. The latter contact appears to activate GTP hydrolysis by stabilizing the position of the catalytic glutamate 106 in switch I close to the active site. Further dimerization contacts involve switch II, the G4 helix and the trans stabilizing loop. The Irga6 structure features a parallel GTPase domain dimer, which appears to be a unifying feature of all dynamin and septin superfamily members. This study contributes important insights into the assembly and catalytic mechanisms of IRG proteins as prerequisite to understand their anti-microbial action.
Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Alaghband, Gita; Jordan, Harry F.

1989-01-01

It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
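The notions of Markowitz number and pivot compatibility can be sketched as follows. This is a plain greedy selection, not the authors' heuristic with a limited binary tree search, and it assumes that two diagonal pivots i and j are compatible when both off-diagonal entries (i, j) and (j, i) are zero; the function names and the set-of-pairs pattern format are hypothetical.

```python
def markowitz_counts(pattern, n):
    """Markowitz number (r_k - 1) * (c_k - 1) for each diagonal pivot k.

    `pattern` is a set of (row, column) positions of nonzero entries.
    """
    rows = [0] * n
    cols = [0] * n
    for i, j in pattern:
        rows[i] += 1
        cols[j] += 1
    return [(rows[k] - 1) * (cols[k] - 1) for k in range(n)]


def greedy_compatible_pivots(pattern, n):
    """Pick diagonal pivots in order of increasing Markowitz number,
    keeping only those compatible with every pivot chosen so far."""
    counts = markowitz_counts(pattern, n)
    order = sorted(range(n), key=lambda k: (counts[k], k))
    chosen = []
    for k in order:
        if (k, k) not in pattern:
            continue  # a structurally zero diagonal cannot be a pivot
        if all((k, c) not in pattern and (c, k) not in pattern for c in chosen):
            chosen.append(k)
    return chosen


n = 5
pattern = {(k, k) for k in range(n)}
pattern |= {(0, 1), (1, 0), (2, 3), (3, 2), (0, 4), (4, 0), (2, 4), (4, 2)}
counts = markowitz_counts(pattern, n)
pivots = greedy_compatible_pivots(pattern, n)
```

Pivots in the returned set do not touch each other's rows or columns, which is what allows them to be eliminated simultaneously.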
Design, Fabrication and Characterization of a MEMS-Based Three-Dimensional Electric Field Sensor with Low Cross-Axis Coupling Interference

PubMed Central

Ling, Biyun; Peng, Chunrong; Ren, Ren; Chu, Zhaozhi; Zhang, Zhouwei; Lei, Hucheng; Xia, Shanhong

2018-01-01

One of the major concerns in the development of three-dimensional (3D) electric field sensors (EFSs) is their susceptibility to cross-axis coupling interference. The output signal for each sensing axis of a 3D EFS is often coupled by electric field components from the two other orthogonal sensing axes. In this paper, a one-dimensional (1D) electric field sensor chip (EFSC) with low cross-axis coupling interference is presented. It is designed to be symmetrical, forming a pair of in-plane symmetrically-located sensing structures. Using a difference circuit, the 1D EFSC is capable of sensing parallel electric fields along symmetrical structures and eliminating cross-axis coupling interference, in contrast to previously reported 1D EFSCs designed for perpendicular electric field component measurement. Thus, a 3D EFS with low cross-axis coupling interference can be realized using three proposed 1D EFSCs. This 3D EFS has the advantages of low cross-axis coupling interference, small size, and high integration. The testing and calibration systems of the proposed 3D EFS were developed. Experimental results show that in the range of 0–120 kV/m, cross-axis sensitivities are within 5.48%, and the total measurement errors of this 3D EFS are within 6.16%. PMID:29543744
Development and Implementation of GPS Correlator Structures in MATLAB and Simulink with Focus on SDR Applications: Implementation of a Standard GPS Correlator Architecture (Baseline) Implementation of the MIT Quicksynch Sparse Algorithm Development and Implementation of Parallel Circular Correlator Constructs

DTIC Science & Technology

2014-05-01

Morphological Simulation of Phase Separation Coupled Oscillation Shear and Varying Temperature Fields

NASA Astrophysics Data System (ADS)

Wang, Heping; Li, Xiaoguang; Lin, Kejun; Geng, Xingguo

2018-05-01

This paper explores the effect of the shear frequency and Prandtl number (Pr) on the procedure and pattern formation of phase separation in symmetric and asymmetric systems. For the symmetric system, the periodic shear significantly prolongs the spinodal decomposition stage and enlarges the separated domain in the domain growth stage. By adjusting Pr and the shear frequency, the number and orientation of separated steady layer structures can be controlled during the domain stretch stage. The numerical results indicate that an increase in Pr and a decrease in the shear frequency can significantly increase the number of layers in the lamellar structure, which relates to the decrease in domain size. Furthermore, the lamellar orientation parallel to the shear direction is altered into one perpendicular to the shear direction by further increasing the shear frequency, and similar results are obtained for larger systems. For the asymmetric system, the quantitative analysis shows that a decrease in the shear frequency enlarges the size of the separated minority phases. These numerical results provide guidance for setting the optimum conditions for phase separation under periodic shear and slow cooling.
Multitasking the Davidson algorithm for the large, sparse eigenvalue problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Umar, V.M.; Fischer, C.F.

1989-01-01

The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
Design of a space shuttle structural dynamics model

NASA Technical Reports Server (NTRS)

1972-01-01

A 1/8 scale structural dynamics model of a parallel burn space shuttle has been designed. Basic objectives were to represent the significant low frequency structural dynamic characteristics while keeping the fabrication costs low. The model was derived from the proposed Grumman Design 619 space shuttle. The design includes an orbiter, two solid rocket motors (SRM) and an external tank (ET). The ET consists of a monocoque LO2 tank, an intertank skirt with three frames to accept SRM attachment members, an LH2 tank with 10 frames of which 3 provide for orbiter attachment members, and an aft skirt with one frame to provide for aft SRM attachment members. The frames designed for the SRM attachments are fitted with transverse struts to take symmetric loads.
The emergence of asymmetric normal fault systems under symmetric boundary conditions

NASA Astrophysics Data System (ADS)

Schöpfer, Martin P. J.; Childs, Conrad; Manzocchi, Tom; Walsh, John J.; Nicol, Andrew; Grasemann, Bernhard

2017-11-01

Many normal fault systems and, on a smaller scale, fracture boudinage often exhibit asymmetry with one fault dip direction dominating. It is a common belief that the formation of domino and shear band boudinage with a monoclinic symmetry requires a component of layer parallel shearing. Moreover, domains of parallel faults are frequently used to infer the presence of a décollement. Using Distinct Element Method (DEM) modelling we show that asymmetric fault systems can emerge under symmetric boundary conditions. A statistical analysis of DEM models suggests that the fault dip directions and system polarities can be explained using a random process if the strength contrast between the brittle layer and the surrounding material is high. The models indicate that domino and shear band boudinage are unreliable shear-sense indicators. Moreover, the presence of a décollement should not be inferred on the basis of a domain of parallel faults alone.
JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure.

PubMed

Labschütz, Matthias; Bruckner, Stefan; Gröller, M Eduard; Hadwiger, Markus; Rautek, Peter

2016-01-01

Sparse volume data structures enable the efficient representation of large but sparse volumes in GPU memory for computation and visualization. However, the choice of a specific data structure for a given data set depends on several factors, such as the memory budget, the sparsity of the data, and data access patterns. In general, there is no single optimal sparse data structure, but a set of several candidates with individual strengths and drawbacks. One solution to this problem are hybrid data structures which locally adapt themselves to the sparsity. However, they typically suffer from increased traversal overhead which limits their utility in many applications. This paper presents JiTTree, a novel sparse hybrid volume data structure that uses just-in-time compilation to overcome these problems. By combining multiple sparse data structures and reducing traversal overhead we leverage their individual advantages. We demonstrate that hybrid data structures adapt well to a large range of data sets. They are especially superior to other sparse data structures for data sets that locally vary in sparsity. Possible optimization criteria are memory, performance and a combination thereof. Through just-in-time (JIT) compilation, JiTTree reduces the traversal overhead of the resulting optimal data structure. As a result, our hybrid volume data structure enables efficient computations on the GPU, while being superior in terms of memory usage when compared to non-hybrid data structures.
ML 3.0 smoothed aggregation user's guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen

2004-05-01

ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the AZTEC 2.1 and AZTECOO iterative package [15]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and non-symmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
The Software Correlator of the Chinese VLBI Network

NASA Technical Reports Server (NTRS)

Zheng, Weimin; Quan, Ying; Shu, Fengchun; Chen, Zhong; Chen, Shanshan; Wang, Weihua; Wang, Guangli

2010-01-01

The software correlator of the Chinese VLBI Network (CVN) has played an irreplaceable role in the CVN routine data processing, e.g., in the Chinese lunar exploration project. This correlator will be upgraded to process geodetic and astronomical observation data. In the future, with several new stations joining the network, CVN will carry out crustal movement observations, quick UT1 measurements, astrophysical observations, and deep space exploration activities. For the geodetic or astronomical observations, we need a wide-band 10-station correlator. For spacecraft tracking, a realtime and highly reliable correlator is essential. To meet the scientific and navigation requirements of CVN, two parallel software correlators in the multiprocessor environments are under development. A high speed, 10-station prototype correlator using the mixed Pthreads and MPI (Message Passing Interface) parallel algorithm on a computer cluster platform is being developed. Another real-time software correlator for spacecraft tracking adopts the thread-parallel technology, and it runs on the SMP (Symmetric Multiprocessor) servers. Both correlators have the characteristic of flexible structure and scalability.
A symmetrical subtraction combined with interpolated values for eliminating scattering from fluorescence EEM data.

PubMed

Xu, Jing; Liu, Xiaofei; Wang, Yutian

2016-08-05

Parallel factor analysis is a widely used method to extract qualitative and quantitative information about the analyte of interest from a fluorescence emission-excitation matrix containing unknown components. Large-amplitude scattering influences the results of parallel factor analysis. Many methods of eliminating scattering have been proposed. Each of these methods has its advantages and disadvantages. The combination of symmetrical subtraction and interpolated values has been discussed. The combination refers to both the combination of results and the combination of methods. Nine methods were used for comparison. The results show the combination of results can make a better concentration prediction for all the components.
Visual Tracking Based on Extreme Learning Machine and Sparse Representation

PubMed Central

Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

2015-01-01

Existing sparse representation-based visual trackers are mostly both time consuming and poorly robust. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, namely the extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separating hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to remove most of the candidate samples related to background contents efficiently, thereby reducing the total computational cost of the following sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities of being a target) of samples on the ELM classification function are used to construct a new manifold learning constraint term of the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, thereby leading to a higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

An efficient implementation of a high-order filter for a cubed-sphere spectral element model

NASA Astrophysics Data System (ADS)

Kang, Hyun-Gyu; Cheong, Hyeong-Bin

2017-03-01

A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and then only the central cell is used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
A Spectral Algorithm for Envelope Reduction of Sparse Matrices

NASA Technical Reports Server (NTRS)

Barnard, Stephen T.; Pothen, Alex; Simon, Horst D.

1993-01-01

The problem of reordering a sparse symmetric matrix to reduce its envelope size is considered. A new spectral algorithm for computing an envelope-reducing reordering is obtained by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. This Laplacian eigenvector solves a continuous relaxation of a discrete problem related to envelope minimization called the minimum 2-sum problem. The permutation vector computed by the spectral algorithm is a closest permutation vector to the specified Laplacian eigenvector. Numerical results show that the new reordering algorithm usually computes smaller envelope sizes than those obtained from the current standard algorithms such as Gibbs-Poole-Stockmeyer (GPS) or SPARSPAK reverse Cuthill-McKee (RCM), in some cases reducing the envelope by more than a factor of two.
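The core step described above (sort the components of a Laplacian eigenvector to obtain a permutation) can be sketched on a small graph. The sketch assumes the "specified eigenvector" is the one for the smallest nonzero eigenvalue (the Fiedler vector) and approximates it with a shifted power iteration, which is a stand-in chosen for brevity, not the eigensolver of the report; all function names are hypothetical.

```python
def laplacian(adj):
    """Graph Laplacian of the adjacency structure of a symmetric matrix."""
    n = len(adj)
    lap = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        lap[i][i] = float(len(nbrs))
        for j in nbrs:
            lap[i][j] = -1.0
    return lap


def fiedler_vector(lap, iters=2000):
    """Power iteration on (shift*I - L), deflating the constant vector."""
    n = len(lap)
    shift = 2.0 * max(lap[i][i] for i in range(n))  # Gershgorin upper bound
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant (null-space) component
        y = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def envelope_size(adj, order):
    """Sum over rows of the distance from the first nonzero to the diagonal."""
    pos = {v: k for k, v in enumerate(order)}
    total = 0
    for v, nbrs in enumerate(adj):
        first = min([pos[u] for u in nbrs] + [pos[v]])
        total += pos[v] - first
    return total


# A path graph whose vertices carry scrambled labels.
path = [3, 0, 6, 1, 7, 4, 2, 5]
adj = [[] for _ in path]
for u, v in zip(path, path[1:]):
    adj[u].append(v)
    adj[v].append(u)

f = fiedler_vector(laplacian(adj))
order = sorted(range(len(adj)), key=lambda v: f[v])
before = envelope_size(adj, list(range(len(adj))))
after = envelope_size(adj, order)
```

For this path graph the eigenvector components are monotone along the path, so sorting them recovers the natural ordering (possibly reversed) and shrinks the envelope from 20 to 7.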
MHD simulations of magnetic reconnection in a skewed three-dimensional tail configuration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Birn, J.; Hesse, M.

1991-01-01

Using the three-dimensional MHD code, the authors have studied the dynamic evolution of a non-symmetric magnetotail configuration, initiated by the sudden occurrence of (anomalous) resistivity. The initial configuration included variations in all three space dimensions, consistent with average tail observations. In addition, it was skewed due to the presence of a net cross-tail magnetic field component B_yN with a magnitude as typically observed, so that it lacked commonly assumed mirror symmetries around the midnight meridian and the equatorial planes. The field evolution was found to be very similar to that of a symmetric configuration studied earlier, indicating plasmoid formation and ejection. The most noticeable new feature in the evolution of the individual field components is a reduction of B_y on the reconnected dipole-like field lines earthward from the reconnection region. The topological structure of the magnetic field, however, defined by the field line connections, shows remarkable differences from the symmetric case, consistent with conclusions by Hughes and Sibeck (1987) and Birn et al. (1989). The plasmoid, which is a magnetically separate entity in the symmetric case, becomes open, connected initially with the Earth, but getting gradually connected with the interplanetary field, as reconnection of lobe field lines proceeds from the midnight region to the flanks of the tail. The separation of the plasmoid from the Earth is thus found to take a finite amount of time. When the plasmoid begins to separate from the Earth, a filamentary structure of field connections develops, not present in the spatial variation of the fields; this confirms predictions by Birn et al. (1989). A localization of the electric field parallel to the magnetic field is found consistent with conclusions on general magnetic reconnection.
Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Improved interior wall detection using designated dictionaries in compressive urban sensing problems

NASA Astrophysics Data System (ADS)

Lagunas, Eva; Amin, Moeness G.; Ahmad, Fauzia; Nájar, Montse

2013-05-01

In this paper, we address sparsity-based imaging of building interior structures for through-the-wall radar imaging and urban sensing applications. The proposed approach utilizes information about common building construction practices to form an appropriate sparse representation of the building layout. With a ground based SAR system, and considering that interior walls are either parallel or perpendicular to the exterior walls, the antenna at each position would receive reflections from the walls parallel to the radar's scan direction as well as from the corners between two meeting walls. We propose a two-step approach for wall detection and localization. In the first step, a dictionary of possible wall locations is used to recover the positions of both interior and exterior walls that are parallel to the scan direction. A follow-on step uses a dictionary of possible corner reflectors to locate wall-wall junctions along the detected wall segments, thereby determining the true wall extents and detecting walls perpendicular to the scan direction. The utility of the proposed approach is demonstrated using simulated data.
A generative model of whole-brain effective connectivity.

PubMed

Frässle, Stefan; Lomakina, Ekaterina I; Kasper, Lars; Manjaly, Zina M; Leff, Alex; Pruessmann, Klaas P; Buhmann, Joachim M; Stephan, Klaas E

2018-05-25

The development of whole-brain models that can infer effective (directed) connection strengths from fMRI data represents a central challenge for computational neuroimaging. A recently introduced generative model of fMRI data, regression dynamic causal modeling (rDCM), moves towards this goal as it scales gracefully to very large networks. However, large-scale networks with thousands of connections are difficult to interpret; additionally, one typically lacks information (data points per free parameter) for precise estimation of all model parameters. This paper introduces sparsity constraints to the variational Bayesian framework of rDCM as a solution to these problems in the domain of task-based fMRI. This sparse rDCM approach enables highly efficient effective connectivity analyses in whole-brain networks and does not require a priori assumptions about the network's connectivity structure but prunes fully (all-to-all) connected networks as part of model inversion. Following the derivation of the variational Bayesian update equations for sparse rDCM, we use both simulated and empirical data to assess the face validity of the model. In particular, we show that it is feasible to infer effective connection strengths from fMRI data using a network with more than 100 regions and 10,000 connections. This demonstrates the feasibility of whole-brain inference on effective connectivity from fMRI data - in single subjects and with a run-time below 1 min when using parallelized code. We anticipate that sparse rDCM may find useful application in connectomics and clinical neuromodeling - for example, for phenotyping individual patients in terms of whole-brain network structure. Copyright © 2018. Published by Elsevier Inc.
Folding Automaton for Trees

NASA Astrophysics Data System (ADS)

Subashini, N.; Thiagarajan, K.

2018-04-01

In this paper we examine the definition of the folding technique in graph theory and derive the corresponding automaton for trees. We also derive some propositions on symmetrical structure trees, non-symmetrical structure trees, point symmetrical structure trees, and edge symmetrical structure trees with a finite number of points. This approach allows one edge to be derived after n foldings.
The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

NASA Astrophysics Data System (ADS)

Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

2017-12-01

Indonesia is an area prone to earthquake that may cause casualties and damage to buildings. The fatalities or the injured are not largely caused by the earthquake, but by building collapse. The collapse of the building is resulted from the building behaviour against the earthquake, and it depends on many factors, such as architectural design, geometry configuration of structural elements in horizontal and vertical plans, earthquake zone, geographical location (distance to earthquake center), soil type, material quality, and construction quality. One of the geometry configurations that may lead to the collapse of the building is irregular configuration of non-parallel system. In accordance with FEMA-451B, irregular configuration in non-parallel system is defined to have existed if the vertical lateral force-retaining elements are neither parallel nor symmetric with main orthogonal axes of the earthquake-retaining axis system. Such configuration may lead to torque, diagonal translation and local damage to buildings. It does not mean that non-parallel irregular configuration should not be formed on architectural design; however the designer must know the consequence of earthquake behaviour against buildings with irregular configuration of non-parallel system. The present research has the objective to identify earthquake behaviour in architectural geometry with irregular configuration of non-parallel system. The present research was quantitative with simulation experimental method. It consisted of 5 models, where architectural data and model structure data were inputted and analyzed using the software SAP2000 in order to find out its performance, and ETAB2015 to determine the eccentricity occurred. The output of the software analysis was tabulated, graphed, compared and analyzed with relevant theories. For areas of strong earthquake zones, avoid designing buildings which wholly form irregular configuration of non-parallel system. 
If it is inevitable to design a building with building parts containing irregular configuration of non-parallel system, make it more rigid by forming a triangle module, and use the formula. A good collaboration is needed between architects and structural experts in creating earthquake architecture.
Parallel iterative methods for sparse linear and nonlinear equations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1989-01-01

As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable when solving linear as well as nonlinear systems of equations. There have been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
Communication Optimal Parallel Multiplication of Sparse Random Matrices

DTIC Science & Technology

2013-02-21

Definition 2.1), and (2) the algorithm is sparsity-independent, where the computation is statically partitioned to processors independent of the sparsity...structure of the input matrices (see Definition 2.5). The second assumption applies to nearly all existing algorithms for general sparse matrix-matrix...where A and B are n × n ER(d) matrices: Definition 2.1 An ER(d) matrix is an adjacency matrix of an Erdős-Rényi graph with parameters n and d/n. That
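
The ER(d) definition quoted above can be illustrated with a minimal sketch. It assumes that every entry of the n × n matrix is set independently with probability d/n (a directed graph with self-loops allowed); the excerpt does not say whether the report uses this exact convention. The names `er_matrix` and `nnz` are hypothetical helpers introduced only for this illustration.

```python
import random

def er_matrix(n, d, seed=0):
    """Return an n x n 0/1 matrix whose entries are 1 with probability d/n.

    Assumption: entries are independent, so each row has about d nonzeros
    on average. This is one reading of "Erdos-Renyi graph with parameters
    n and d/n", not necessarily the report's exact convention.
    """
    rng = random.Random(seed)
    p = d / n
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]

def nnz(matrix):
    """Count the nonzero entries of a 0/1 matrix."""
    return sum(sum(row) for row in matrix)

A = er_matrix(200, 4)
print("average nonzeros per row:", nnz(A) / 200)
```

Under this assumption the expected number of nonzeros per row is d, which is why d, rather than n, controls the sparsity of the inputs.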
Notes on implementation of sparsely distributed memory

NASA Technical Reports Server (NTRS)

Keeler, J. D.; Denning, P. J.

1986-01-01

The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
Kanerva's sparse distributed memory with multiple hamming thresholds

NASA Technical Reports Server (NTRS)

Pohja, Seppo; Kaski, Kimmo

1992-01-01

If the stored input patterns of Kanerva's Sparse Distributed Memory (SDM) are highly correlated, utilization of the storage capacity is very low compared to the case of uniformly distributed random input patterns. We consider a variation of SDM that has a better storage capacity utilization for correlated input patterns. This approach uses a separate selection threshold for each physical storage address or hard location. The selection of the hard locations for reading or writing can be done in parallel, from which SDM implementations can benefit.
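
The idea of a separate selection threshold per hard location can be sketched as follows. This is a toy model written for illustration, not the authors' implementation: the class name `ThresholdSDM`, the counter-based storage, and the example addresses and thresholds are all assumptions. Each hard location is selected when the Hamming distance between its address and the query address is within that location's own threshold; the selection test is independent per location, which is what makes it parallelizable.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

class ThresholdSDM:
    """Toy sparse distributed memory with one Hamming threshold per hard location."""

    def __init__(self, addresses, thresholds, word_len):
        self.addresses = addresses
        self.thresholds = thresholds
        self.word_len = word_len
        self.counters = [[0] * word_len for _ in addresses]

    def select(self, addr):
        # Each test depends on one location only, so the loop could run in parallel.
        return [i for i, (a, t) in enumerate(zip(self.addresses, self.thresholds))
                if hamming(a, addr) <= t]

    def write(self, addr, word):
        # Increment counters for 1 bits and decrement for 0 bits at selected locations.
        for i in self.select(addr):
            for j, bit in enumerate(word):
                self.counters[i][j] += 1 if bit else -1

    def read(self, addr):
        # Sum the counters of the selected locations and threshold at zero.
        sums = [0] * self.word_len
        for i in self.select(addr):
            for j, c in enumerate(self.counters[i]):
                sums[j] += c
        return [1 if s > 0 else 0 for s in sums]

mem = ThresholdSDM(
    addresses=[(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1)],
    thresholds=[1, 1, 2],  # hypothetical per-location thresholds
    word_len=3,
)
mem.write((0, 0, 0, 1), (1, 0, 1))
print(mem.select((0, 0, 0, 1)), mem.read((0, 0, 0, 0)))
```

How the thresholds should be chosen for correlated patterns is the subject of the paper and is not modeled here.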
Improved parallel data partitioning by nested dissection with applications to information retrieval.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar

The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it is a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.
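
To make the notion of total communication volume concrete, here is a small sketch for the simpler 1D row-wise case. It is not the 2D nested dissection method of the paper. It assumes that process p owns a set of rows of A and the matching entries of y, that the entries of x are distributed by `vec_part`, and that each remote x entry is sent once to each process that needs it. The function name `comm_volume` and the argument layout are hypothetical.

```python
def comm_volume(nonzeros, row_part, vec_part):
    """Total words communicated in y = A x under a 1D row partition.

    nonzeros : iterable of (i, j) index pairs of the nonzero entries of A
    row_part : row_part[i] is the process that owns row i
    vec_part : vec_part[j] is the process that owns x[j]
    """
    needed = set()
    for i, j in nonzeros:
        p = row_part[i]
        if vec_part[j] != p:
            needed.add((j, p))  # x[j] must be sent once to process p
    return len(needed)

# Tridiagonal 4 x 4 pattern as a small example.
pattern = [(i, j) for i in range(4) for j in range(4) if abs(i - j) <= 1]
print(comm_volume(pattern, [0, 0, 1, 1], [0, 0, 1, 1]))  # contiguous blocks
print(comm_volume(pattern, [0, 1, 0, 1], [0, 1, 0, 1]))  # interleaved rows
```

For this pattern the contiguous partition needs only the two x entries next to the block boundary, while the interleaved partition needs four, which is the kind of difference a partitioner tries to exploit.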
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields

NASA Astrophysics Data System (ADS)

Forsythe, Victoriya V.; Makarevich, Roman A.

2016-11-01

An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Aztec user's guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.
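
As a rough illustration of the kind of solve described above (a Krylov method combined with a preconditioner applied to Ax = b), the following serial sketch implements conjugate gradient with a simple Jacobi (diagonal) preconditioner. It does not use or imitate the Aztec API; the function names, the dictionary-based sparse storage, and the 1D Laplacian test matrix are assumptions chosen to keep the example self-contained. It assumes A is symmetric positive definite.

```python
def matvec(rows, x):
    """Multiply a sparse matrix (list of {column: value} dicts) by a dense vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pcg_jacobi(rows, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a Jacobi preconditioner (A assumed SPD)."""
    n = len(b)
    inv_diag = [1.0 / rows[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(rows, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, a standard small PDE-type test problem."""
    rows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows

A = laplacian_1d(10)
b = matvec(A, [1.0] * 10)          # right-hand side whose exact solution is all ones
x, iterations = pcg_jacobi(A, b)
print(iterations, max(abs(xi - 1.0) for xi in x))
```

In a distributed setting the matrix-vector product and the dot products are the steps that require communication, which is the part a library of this kind hides from the user.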
Sparse Partial Equilibrium Tables in Chemically Resolved Reactive Flow

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vitello, P; Fried, L E; Pudliner, B

2003-07-14

The detonation of an energetic material is the result of a complex interaction between kinetic chemical reactions and hydrodynamics. Unfortunately, little is known concerning the detailed chemical kinetics of detonations in energetic materials. CHEETAH uses rate laws to treat species with the slowest chemical reactions, while assuming other chemical species are in equilibrium. CHEETAH supports a wide range of elements and condensed detonation products and can also be applied to gas detonations. A sparse hash table of equation of state values, called the "cache", is used in CHEETAH to enhance the efficiency of kinetic reaction calculations. For large-scale parallel hydrodynamic calculations, CHEETAH uses MPI communication to update the cache. We present here details of the sparse caching model used in CHEETAH. To demonstrate the efficiency of modeling using a sparse cache model we consider detonations in energetic materials.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

PubMed

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
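
A multiplicative update for symmetric NMF, which seeks a nonnegative H with A ≈ H Hᵀ, can be sketched as below. The damped rule H ← H ∘ (1 − β + β (AH) / (H Hᵀ H)) with β = 0.5 is a form commonly cited for this problem; it is used here as an assumption and is not claimed to be the exact α-SNMF or β-SNMF algorithm of the paper. The helper names, the test matrix, and the starting point are illustrative only. Every step is built from matrix-matrix products, which is why such updates map well onto level 3 BLAS.

```python
def matmul(X, Y):
    """Dense matrix product for lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def snmf_error(A, H):
    """Squared Frobenius error ||A - H H^T||^2."""
    HHt = matmul(H, transpose(H))
    return sum((A[i][j] - HHt[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A)))

def snmf(A, H, beta=0.5, iters=200, eps=1e-12):
    """Damped multiplicative updates; entries of H stay nonnegative."""
    for _ in range(iters):
        AH = matmul(A, H)
        HHtH = matmul(matmul(H, transpose(H)), H)
        H = [[H[i][j] * (1.0 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
              for j in range(len(H[0]))] for i in range(len(H))]
    return H

# Block-structured similarity matrix with two obvious clusters.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
H0 = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.3, 0.9]]
H = snmf(A, H0)
print(snmf_error(A, H0), snmf_error(A, H))
```

For clustering, each row of H can be read as a soft membership vector, with the largest entry indicating the cluster.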
Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves

2001-10-10

We examine the mechanics of the send and receive mechanism of MPI and in particular how we can implement message passing in a robust way so that our performance is not significantly affected by changes to the MPI system. This leads us to using the Isend/Irecv protocol which will entail sometimes significant algorithmic changes. We discuss this within the context of two different algorithms for sparse Gaussian elimination that we have parallelized. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. Both algorithms are difficult to parallelize on distributed memory machines. Our initial strategies were based on simple MPI point-to-point communication primitives. With such approaches, the parallel performance of both codes is very sensitive to the MPI implementation, the way MPI internal buffers are used in particular. We then modified our codes to use more sophisticated nonblocking versions of MPI communication. This significantly improved the performance robustness (independent of the MPI buffering mechanism) and scalability, but at the cost of increased code complexity.
Approximate equiangular tight frames for compressed sensing and CDMA applications

NASA Astrophysics Data System (ADS)

Tsiligianni, Evaggelia; Kondi, Lisimachos P.; Katsaggelos, Aggelos K.

2017-12-01

Performance guarantees for recovery algorithms employed in sparse representations and compressed sensing highlight the importance of incoherence. Optimal bounds of incoherence are attained by equiangular unit norm tight frames (ETFs). Although ETFs are important in many applications, they do not exist for all dimensions, while their construction has been proven extremely difficult. In this paper, we construct frames that are close to ETFs. According to results from frame and graph theory, the existence of an ETF depends on the existence of its signature matrix, that is, a symmetric matrix with certain structure and spectrum consisting of two distinct eigenvalues. We view the construction of a signature matrix as an inverse eigenvalue problem and propose a method that produces frames of any dimensions that are close to ETFs. Due to the achieved equiangularity property, the so obtained frames can be employed as spreading sequences in synchronous code-division multiple access (s-CDMA) systems, besides compressed sensing.
Atom by atom: HRTEM insights into inorganic nanotubes and fullerene-like structures

PubMed Central

Sadan, Maya Bar; Houben, Lothar; Enyashin, Andrey N.; Seifert, Gotthard; Tenne, Reshef

2008-01-01

The characterization of nanostructures down to the atomic scale is essential to understand some physical properties. Such a characterization is possible today using direct imaging methods such as aberration-corrected high-resolution transmission electron microscopy (HRTEM), when iteratively backed by advanced modeling produced by theoretical structure calculations and image calculations. Aberration-corrected HRTEM is therefore extremely useful for investigating low-dimensional structures, such as inorganic fullerene-like particles and inorganic nanotubes. The atomic arrangement in these nanostructures can lead to new insights into the growth mechanism or physical properties, where imminent commercial applications are unfolding. This article will focus on two structures that are symmetric and reproducible. The first structure that will be dealt with is the smallest stable symmetric closed-cage structure in the inorganic system, a MoS2 nanooctahedron. It is investigated by means of aberration-corrected microscopy which allowed validating the suggested DFTB-MD model. It will be shown that structures diverging from the energetically most stable structures are present in the laser ablated soot and that the alignment of the different shells is parallel, unlike the bulk material where the alignment is antiparallel. These findings correspond well with the high-energy synthetic route and they provide more insight into the growth mechanism. The second structure studied is WS2 nanotubes, which have already been shown to have a unique structure with very desirable mechanical properties. The joint HRTEM study combined with modeling reveals new information regarding the chirality of the different shells and provides a better understanding of their growth mechanism. PMID:18838681

Atom by atom: HRTEM insights into inorganic nanotubes and fullerene-like structures.

PubMed

Bar Sadan, Maya; Houben, Lothar; Enyashin, Andrey N; Seifert, Gotthard; Tenne, Reshef

2008-10-14

The characterization of nanostructures down to the atomic scale is essential to understand some physical properties. Such a characterization is possible today using direct imaging methods such as aberration-corrected high-resolution transmission electron microscopy (HRTEM), when iteratively backed by advanced modeling produced by theoretical structure calculations and image calculations. Aberration-corrected HRTEM is therefore extremely useful for investigating low-dimensional structures, such as inorganic fullerene-like particles and inorganic nanotubes. The atomic arrangement in these nanostructures can lead to new insights into the growth mechanism or physical properties, where imminent commercial applications are unfolding. This article will focus on two structures that are symmetric and reproducible. The first structure that will be dealt with is the smallest stable symmetric closed-cage structure in the inorganic system, a MoS2 nanooctahedron. It is investigated by means of aberration-corrected microscopy which allowed validating the suggested DFTB-MD model. It will be shown that structures diverging from the energetically most stable structures are present in the laser ablated soot and that the alignment of the different shells is parallel, unlike the bulk material where the alignment is antiparallel. These findings correspond well with the high-energy synthetic route and they provide more insight into the growth mechanism. The second structure studied is WS2 nanotubes, which have already been shown to have a unique structure with very desirable mechanical properties. The joint HRTEM study combined with modeling reveals new information regarding the chirality of the different shells and provides a better understanding of their growth mechanism.
Limiter

DOEpatents

Cohen, Samuel A.; Hosea, Joel C.; Timberlake, John R.

1986-01-01

A limiter with a specially contoured front face accommodates the various power scrape-off distances λ_p, which depend on the parallel velocity, V_∥, of the impacting particles. The front face of the limiter (the plasma-side face) is flat with a central indentation. In addition, the limiter shape is cylindrically symmetric so that the limiter can be rotated for greater heat distribution.
Ab initio method for calculating total cross sections

NASA Technical Reports Server (NTRS)

Bhatia, A. K.; Schneider, B. I.; Temkin, A.

1993-01-01

A method for calculating total cross sections without formally including nonelastic channels is presented. The idea is to use a one channel T-matrix variational principle with a complex correlation function. The derived T matrix is therefore not unitary. Elastic scattering is calculated from |T|^2, but total scattering is derived from the imaginary part of T using the optical theorem. The method is applied to the spherically symmetric model of electron-hydrogen scattering. No spurious structure arises; results for σ(el) and σ(total) are in excellent agreement with calculations of Callaway and Oza (1984). The method has wide potential applicability.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Bit error rate tester using fast parallel generation of linear recurring sequences

DOEpatents

Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

2003-05-06

A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
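
The relationship between the companion matrix and the decimation matrix can be checked with a small sketch over GF(2). The polynomial x^4 + x + 1 (a standard primitive polynomial), the values n = 2 and k = 3, and all function names are assumptions chosen for illustration; the patent's rules for choosing the companion form and the polynomial are not reproduced. The sketch shows that one multiplication by D = C^(n*k) advances the generator state by n*k steps, which is what lets each parallel generator jump ahead to its own section of the sequence.

```python
def mat_mul(A, B):
    """Matrix product over GF(2): AND for multiply, parity for add."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & x for a, x in zip(row, v)) & 1 for row in A]

def mat_pow(A, e):
    """Raise A to the power e over GF(2) by repeated squaring."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        e >>= 1
    return result

def companion(taps):
    """Companion matrix for the recurrence s[t+m] = XOR of taps[i] * s[t+i].

    The state vector is (s[t], ..., s[t+m-1]); one multiplication shifts
    it by one position and appends the feedback bit.
    """
    m = len(taps)
    C = [[0] * m for _ in range(m)]
    for i in range(m - 1):
        C[i][i + 1] = 1
    C[m - 1] = list(taps)
    return C

# x^4 + x + 1 gives s[t+4] = s[t+1] XOR s[t], i.e. taps (1, 1, 0, 0).
C = companion([1, 1, 0, 0])
n, k = 2, 3                      # hypothetical bits per step and generator count
D = mat_pow(C, n * k)            # decimation matrix

state = [1, 0, 0, 0]
stepped = state
for _ in range(n * k):
    stepped = mat_vec(C, stepped)
print(mat_vec(D, state) == stepped)
```

Each 1 in a row of D corresponds to one input of the XOR tree for that state bit, so counting the 1s in D gives a rough proxy for the gate count and propagation delay discussed above.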
Evaluation of generalized degrees of freedom for sparse estimation by replica method

NASA Astrophysics Data System (ADS)

Sakata, A.

2016-12-01

We develop a method to evaluate the generalized degrees of freedom (GDF) for linear regression with sparse regularization. The GDF is a key factor in model selection, and thus its evaluation is useful in many modelling applications. An analytical expression for the GDF is derived using the replica method in the large-system-size limit with random Gaussian predictors. The resulting formula has a universal form that is independent of the type of regularization, providing us with a simple interpretation. Within the framework of replica symmetric (RS) analysis, GDF has a physical meaning as the effective fraction of non-zero components. The validity of our method in the RS phase is supported by the consistency of our results with previous mathematical results. The analytical results in the RS phase are calculated numerically using the belief propagation algorithm.
Three-dimensional wideband electromagnetic modeling on massively parallel computers

NASA Astrophysics Data System (ADS)

Alumbaugh, David L.; Newman, Gregory A.; Prevost, Lydie; Shadid, John N.

1996-01-01

A method is presented for modeling the wideband, frequency domain electromagnetic (EM) response of a three-dimensional (3-D) earth to dipole sources operating at frequencies where EM diffusion dominates the response (less than 100 kHz) up into the range where propagation dominates (greater than 10 MHz). The scheme employs the modified form of the vector Helmholtz equation for the scattered electric fields to model variations in electrical conductivity, dielectric permittivity and magnetic permeability. The use of the modified form of the Helmholtz equation allows for perfectly matched layer (PML) absorbing boundary conditions to be employed through the use of complex grid stretching. Applying the finite difference operator to the modified Helmholtz equation produces a linear system of equations for which the matrix is sparse and complex symmetrical. The solution is obtained using either the biconjugate gradient (BICG) or quasi-minimum residual (QMR) methods with preconditioning; in general we employ the QMR method with Jacobi scaling preconditioning due to stability. In order to simulate larger, more realistic models than has been previously possible, the scheme has been modified to run on massively parallel (MP) computer architectures. Execution on the 1840-processor Intel Paragon has indicated a maximum model size of 280 × 260 × 200 cells with a maximum flop rate of 14.7 Gflops. Three different geologic models are simulated to demonstrate the use of the code for frequencies ranging from 100 Hz to 30 MHz and for different source types and polarizations. The simulations show that the scheme is correctly able to model the air-earth interface and the jump in the electric and magnetic fields normal to discontinuities. For frequencies greater than 10 MHz, complex grid stretching must be employed to incorporate absorbing boundaries while below this normal (real) grid stretching can be employed.
A novel L-shaped linear ultrasonic motor operating in a single resonance mode

NASA Astrophysics Data System (ADS)

Zhang, Bailiang; Yao, Zhiyuan; Liu, Zhen; Li, Xiaoniu

2018-01-01

In this study, a large thrust linear ultrasonic motor using an L-shaped stator is described. The stator is constructed from two mutually perpendicular rectangular plate vibrators, one of which is mounted in parallel with the slider to make the motor structure more compact. The symmetric and antisymmetric modes of the stator based on the first order bending vibration of two vibrators are adopted, in which each resonance mode is assigned to drive the slider in one direction. The placement of piezoelectric ceramics in a stator could be determined by finite element analysis, and the influence of slots in the head block on the vibration amplitudes of driving foot was studied as well. Three types of prototypes (non-slotted, dual-slot, and single-slot) were fabricated and experimentally investigated. Experimental results demonstrated that the prototype with one slot exhibited the best mechanical output performance. The maximum loads under the excitation of symmetric mode and antisymmetric mode were 65 and 90 N, respectively.
A novel L-shaped linear ultrasonic motor operating in a single resonance mode.

PubMed

Zhang, Bailiang; Yao, Zhiyuan; Liu, Zhen; Li, Xiaoniu

2018-01-01

In this study, a large thrust linear ultrasonic motor using an L-shaped stator is described. The stator is constructed from two mutually perpendicular rectangular plate vibrators, one of which is mounted parallel to the slider to make the motor structure more compact. The symmetric and antisymmetric modes of the stator based on the first order bending vibration of two vibrators are adopted, in which each resonance mode is assigned to drive the slider in one direction. The placement of piezoelectric ceramics in a stator could be determined by finite element analysis, and the influence of slots in the head block on the vibration amplitudes of the driving foot was studied as well. Three types of prototypes (non-slotted, dual-slot, and single-slot) were fabricated and experimentally investigated. Experimental results demonstrated that the prototype with one slot exhibited the best mechanical output performance. The maximum loads under the excitation of symmetric mode and antisymmetric mode were 65 and 90 N, respectively.
The "Fermi hole" and the correlation introduced by the symmetrization or the anti-symmetrization of the wave function.

PubMed

Giner, Emmanuel; Tenti, Lorenzo; Angeli, Celestino; Malrieu, Jean-Paul

2016-09-28

The impact of the antisymmetrization is often addressed as a local property of the many-electron wave function, namely that the wave function should vanish when two electrons with parallel spins are in the same position in space. In this paper, we emphasize that this presentation is unduly restrictive: we illustrate the strong non-local character of the antisymmetrization principle, together with the fact that it is a matter of spin symmetry rather than spin parallelism. To this aim, we focus our attention on the simplest representation of various states of two-electron systems, both in atomic (helium atom) and molecular (H₂ and the π system of the ethylene molecule) cases. We discuss the non-local property of the nodal structure of some two-electron wave functions, both using analytical derivations and graphical representations of cuttings of the nodal hypersurfaces. The attention is then focussed on the impact of the antisymmetrization on the maxima of the two-body density, and we show that it introduces strong correlation effects (radial and/or angular) with a non-local character. These correlation effects are analyzed in terms of inflation and depletion zones, which are easily identifiable, thanks to the nodes of the orbitals composing the wave function. Also, we show that the correlation effects induced by the antisymmetrization occur also for anti-parallel spins since all M_s components of a given spin state have the same N-body densities. Finally, we illustrate that these correlation effects occur also for the singlet states, but they have strictly opposite impacts: the inflation zones in the triplet become depletion zones in the singlet and vice versa.
Raytracing and Direct-Drive Targets

NASA Astrophysics Data System (ADS)

Schmitt, Andrew J.; Bates, Jason; Fyfe, David; Eimerl, David

2013-10-01

Accurate simulation of the effects of laser imprinting and drive asymmetries in directly driven targets requires the ability to distinguish between raytrace noise and the intensity structure produced by the spatial and temporal incoherence of optical smoothing. We have developed and implemented a smoother raytrace algorithm for our MPI-parallel radiation hydrodynamics code, FAST3D. The underlying approach is to connect the rays into either sheets (in 2D) or volume-enclosing chunks (in 3D) so that the absorbed energy distribution continuously covers the propagation area illuminated by the laser. We will describe the status and show the different scalings encountered in 2D and 3D problems as the computational size, parallelization strategy, and number of rays are varied. Finally, we show results using the method in current NIKE experimental target simulations and in proposed symmetric and polar direct-drive target designs. Supported by US DoE/NNSA.
Communication-avoiding symmetric-indefinite factorization

DOE PAGES

Ballard, Grey Malone; Becker, Dulcenia; Demmel, James; ...

2014-11-13

We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix $A$ as the product $A = P L T L^{T} P^{T}$ where $P$ is a permutation matrix, $L$ is lower triangular, and $T$ is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. As a result, the current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.
Communication-avoiding symmetric-indefinite factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ballard, Grey Malone; Becker, Dulcenia; Demmel, James

We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix $A$ as the product $A = P L T L^{T} P^{T}$ where $P$ is a permutation matrix, $L$ is lower triangular, and $T$ is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. As a result, the current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.
Symmetrized density matrix renormalization group algorithm for low-lying excited states of conjugated carbon systems: Application to 1,12-benzoperylene and polychrysene

NASA Astrophysics Data System (ADS)

Prodhan, Suryoday; Ramasesha, S.

2018-05-01

The symmetry adapted density matrix renormalization group (SDMRG) technique has been an efficient method for studying low-lying eigenstates in one- and quasi-one-dimensional electronic systems. However, the SDMRG method had bottlenecks involving the construction of linearly independent symmetry adapted basis states as the symmetry matrices in the DMRG basis were not sparse. We have developed a modified algorithm to overcome this bottleneck. The new method incorporates end-to-end interchange symmetry (C2), electron-hole symmetry (J), and parity or spin-flip symmetry (P) in these calculations. The one-to-one correspondence between direct-product basis states in the DMRG Hilbert space for these symmetry operations renders the symmetry matrices in the new basis with maximum sparseness, just one nonzero matrix element per row. Using methods similar to those employed in the exact diagonalization technique for Pariser-Parr-Pople (PPP) models, developed in the 1980s, it is possible to construct orthogonal SDMRG basis states while bypassing the slow step of the Gram-Schmidt orthonormalization procedure. The method together with the PPP model which incorporates long-range electronic correlations is employed to study the correlated excited-state spectra of 1,12-benzoperylene and a narrow mixed graphene nanoribbon with a chrysene molecule as the building unit, comprising both zigzag and cove-edge structures.
PT-symmetric gain and loss in a rotating Bose-Einstein condensate

NASA Astrophysics Data System (ADS)

Haag, Daniel; Dast, Dennis; Cartarius, Holger; Wunner, Günter

2018-03-01

PT-symmetric quantum mechanics allows finding stationary states in mean-field systems with balanced gain and loss of particles. In this work we apply this method to rotating Bose-Einstein condensates with contact interaction which are known to support ground states with vortices. Due to the particle exchange with the environment, transport phenomena through ultracold gases with vortices can be studied. We find that even strongly interacting rotating systems support stable PT-symmetric ground states, sustaining a current parallel and perpendicular to the vortex cores. The vortices move through the nonuniform particle density and leave or enter the condensate through its borders creating the required net current.
Limiter

DOEpatents

Cohen, S.A.; Hosea, J.C.; Timberlake, J.R.

1984-10-19

A limiter with a specially contoured front face is provided. The front face of the limiter (the plasma-side face) is flat with a central indentation. In addition, the limiter shape is cylindrically symmetric so that the limiter can be rotated for greater heat distribution. This limiter shape accommodates the various power scrape-off distances λ_p, which depend on the parallel velocity, V_∥, of the impacting particles.
A possibility of parallel and anti-parallel diffraction measurements on neutron diffractometer employing bent perfect crystal monochromator at the monochromatic focusing condition

NASA Astrophysics Data System (ADS)

Choi, Yong Nam; Kim, Shin Ae; Kim, Sung Kyu; Kim, Sung Baek; Lee, Chang-Hee; Mikula, Pavel

2004-07-01

In a conventional diffractometer having a single monochromator, only one position, the parallel position, is used for the diffraction experiment (i.e. detection) because the resolution property of the other one, the anti-parallel position, is very poor. However, a bent perfect crystal (BPC) monochromator at the monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions, and thus one can use both sides for the diffraction experiment. From the data of the FWHM and the Δd/d measured on three diffraction geometries (symmetric, asymmetric compression and asymmetric expansion), we can conclude that the simultaneous diffraction measurement in both parallel and anti-parallel positions can be achieved.
Killing and Noether Symmetries of Plane Symmetric Spacetime

NASA Astrophysics Data System (ADS)

Shamir, M. Farasat; Jhangeer, Adil; Bhatti, Akhlaq Ahmad

2013-09-01

This paper is devoted to investigating the Killing and Noether symmetries of static plane symmetric spacetime. For this purpose, five different cases are discussed. The Killing and Noether symmetries of Minkowski spacetime in Cartesian coordinates are calculated as a special case, and it is found that the Lie algebra of the Lagrangian is 10 and 17 dimensional, respectively. The symmetries of Taub's universe, anti-deSitter universe, self similar solutions of infinite kind for parallel perfect fluid case and self similar solutions of infinite kind for parallel dust case are also explored. In all the cases, the Noether generators are calculated in the presence of a gauge term. All these examples justify the conjecture that Killing symmetries form a subalgebra of Noether symmetries (Bokhari et al. in Int. J. Theor. Phys. 45:1063, 2006).
A symmetric version of the generalized alternating direction method of multipliers for two-block separable convex programming.

PubMed

Liu, Jing; Duan, Yongrui; Sun, Min

2017-01-01

This paper introduces a symmetric version of the generalized alternating direction method of multipliers for two-block separable convex programming with linear equality constraints, which inherits the superiorities of the classical alternating direction method of multipliers (ADMM), and which extends the feasible set of the relaxation factor α of the generalized ADMM to the infinite interval [Formula: see text]. Under the conditions that the objective function is convex and the solution set is nonempty, we establish the convergence results of the proposed method, including the global convergence, the worst-case [Formula: see text] convergence rate in both the ergodic and the non-ergodic senses, where k denotes the iteration counter. Numerical experiments to decode a sparse signal arising in compressed sensing are included to illustrate the efficiency of the new method.
Signal processing using sparse derivatives with applications to chromatograms and ECG

NASA Astrophysics Data System (ADS)

Ning, Xiaoran

In this thesis, we investigate the sparsity that exists in the derivative domain. In particular, we focus on signals that possess sparse derivatives of up to Mth (M > 0) order. Effort is put into formulating proper penalty functions and optimization problems that capture properties related to sparse derivatives, and into searching for fast, computationally efficient solvers. The effectiveness of these algorithms is demonstrated in two real-world applications. In the first application, we provide an algorithm which jointly addresses the problems of chromatogram baseline correction and noise reduction. The series of chromatogram peaks are modeled as sparse with sparse derivatives, and the baseline is modeled as a low-pass signal. A convex optimization problem is formulated so as to encapsulate these non-parametric models. To account for the positivity of chromatogram peaks, an asymmetric penalty function is also utilized with symmetric penalty functions. A robust, computationally efficient, iterative algorithm is developed that is guaranteed to converge to the unique optimal solution. The approach, termed Baseline Estimation And Denoising with Sparsity (BEADS), is evaluated and compared with two state-of-the-art methods using both simulated and real chromatogram data. Promising results are obtained. In the second application, a novel electrocardiography (ECG) enhancement algorithm is designed, also based on sparse derivatives. In the real medical environment, ECG signals are often contaminated by various kinds of noise or artifacts, for example, morphological changes due to motion artifact, non-stationary noise due to muscular contraction (EMG), etc. Some of these contaminations severely affect the usefulness of ECG signals, especially when computer aided algorithms are utilized. By solving the proposed convex l1 optimization problem, artifacts are reduced by modeling the clean ECG signal as a sum of two signals whose second- and third-order derivatives (differences) are sparse, respectively. Finally, the algorithm is applied to a QRS detection system and validated using the MIT-BIH Arrhythmia database (109452 annotations), resulting in a sensitivity of Se = 99.87% and a positive prediction of +P = 99.88%.
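To make the notion of "sparse derivatives" in the abstract above concrete, here is a minimal sketch that takes repeated finite differences of a signal and counts the nonzero entries. It is not the BEADS algorithm or the ECG method; the helper names and the made-up piecewise-linear signal are assumptions for illustration only.

```python
def diff(signal, order=1):
    """Apply the first-order finite difference `order` times."""
    out = list(signal)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def count_nonzero(values, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for v in values if abs(v) > tol)

def l1_norm(values):
    """Sum of absolute values, the quantity an l1 penalty charges."""
    return sum(abs(v) for v in values)

# Piecewise-linear test signal (illustrative values, not from the thesis).
signal = [0, 1, 2, 3, 4, 4, 4, 4, 3, 2, 1, 0]
d1 = diff(signal, 1)   # slopes: dense, 8 of 11 entries are nonzero
d2 = diff(signal, 2)   # changes of slope: only 2 of 10 entries are nonzero
penalty = l1_norm(d2)  # 2
```

The signal itself is dense and so is its first difference, but its second difference is nonzero only at the two kinks. An l1 penalty on that difference therefore stays small for signals of this shape, which is the property that convex formulations of this kind exploit.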

A folding-dependent mechanism of antimicrobial peptide resistance to degradation unveiled by solution structure of distinctin

PubMed Central

Raimondo, Domenico; Andreotti, Giuseppina; Saint, Nathalie; Amodeo, Pietro; Renzone, Giovanni; Sanseverino, Marina; Zocchi, Ivana; Molle, Gerard; Motta, Andrea; Scaloni, Andrea

2005-01-01

Many bioactive peptides, presenting an unstructured conformation in aqueous solution, are made resistant to degradation by posttranslational modifications. Here, we describe how molecular oligomerization in aqueous solution can generate a still unknown transport form for amphipathic peptides, which is more compact and resistant to proteases than forms related to any possible monomer. This phenomenon emerged from 3D structure, function, and degradation properties of distinctin, a heterodimeric antimicrobial compound consisting of two peptide chains linked by a disulfide bond. After homodimerization in water, this peptide exhibited a fold consisting of a symmetrical full-parallel four-helix bundle, with a well secluded hydrophobic core and exposed basic residues. This fold significantly stabilizes distinctin against proteases compared with other linear amphipathic peptides, without affecting its antimicrobial, hemolytic, and ion-channel formation properties after membrane interaction. This full-parallel helical orientation represents a perfect compromise between formation of a stable structure in water and requirement of a drastic structural rearrangement in membranes to elicit antimicrobial potential. Thus, distinctin can be claimed as a prototype of a previously unrecognized class of antimicrobial derivatives. These results suggest a critical revision of the role of peptide oligomerization whenever solubility or resistance to proteases is known to affect biological properties. PMID:15840728
Crystal structure of a new amine nitrate: 4-dimethylaminopyridinium nitrate (C₇H₁₁N₂)NO₃

DOE Office of Scientific and Technical Information (OSTI.GOV)

Benhassan, D.; Rekik, W.; Naïli, H.

2015-12-15

The title compound (C₇H₁₁N₂)NO₃ (I) was obtained by the slow evaporation method at room temperature. Its crystal structure consists of organic cations (C₇H₁₁N₂)⁺ and nitrate anions (NO₃)⁻ linked by two types of hydrogen bonds. Each monoprotonated nitrogen atom, called bifurcated, is engaged in two N–H···O hydrogen bonds with two symmetric oxygen atoms. In addition, the crystal structure stability is established by C–H···O hydrogen bonds that ensure the formation of infinite layers, parallel to the (001) plane. These layers are linked together through π···π interactions established between aromatic amines.
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and that the evaluation process may serve as an example of how to evaluate these packages. The information contained here includes feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to very large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve, ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well-designed user interfaces, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++ is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

PubMed

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Design of a MIMD neural network processor

NASA Astrophysics Data System (ADS)

Saeks, Richard E.; Priddy, Kevin L.; Pap, Robert M.; Stowell, S.

1994-03-01

The Accurate Automation Corporation (AAC) neural network processor (NNP) module is a fully programmable multiple instruction multiple data (MIMD) parallel processor optimized for the implementation of neural networks. The AAC NNP design fully exploits the intrinsic sparseness of neural network topologies. Moreover, by using a MIMD parallel processing architecture one can update multiple neurons in parallel with efficiency approaching 100 percent as the size of the network increases. Each AAC NNP module has 8 K neurons and 32 K interconnections and is capable of 140,000,000 connections per second with an eight processor array capable of over one billion connections per second.
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas

The exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE PAGES

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas; ...

2016-01-06

The exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
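The two crossbar kernels named in the abstract above, the parallel read (a vector-matrix multiplication) and the parallel write (a rank-1 update), can be written out digitally as follows. This is only a sketch of the arithmetic the analog array would perform in a single step; the function names, the `rate` parameter and the 2 × 2 values are hypothetical, and the code says nothing about energy.

```python
def parallel_read(weights, v):
    """Vector-matrix multiply: out[j] = sum_i v[i] * weights[i][j]."""
    rows, cols = len(weights), len(weights[0])
    return [sum(v[i] * weights[i][j] for i in range(rows)) for j in range(cols)]

def parallel_write(weights, u, v, rate=1.0):
    """Rank-1 (outer-product) update in place:
    weights[i][j] += rate * u[i] * v[j]."""
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            weights[i][j] += rate * ui * vj

# Illustrative 2 x 2 "crossbar" (made-up conductance values).
W = [[1.0, 2.0],
     [3.0, 4.0]]
y = parallel_read(W, [1.0, 1.0])            # column sums: [4.0, 6.0]
parallel_write(W, [1.0, 0.0], [0.5, 0.5])   # only row 0 changes
```

Both loops touch all N² entries, so a digital implementation performs O(N²) operations per kernel. The abstract's claim concerns how the energy of that work scales when the crossbar carries it out in parallel in the analog domain.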
Solvers for $\mathcal{O}(N)$ Electronic Structure in the Strong Scaling Limit

DOE PAGES

Bock, Nicolas; Challacombe, William M.; Kale, Laxmikant

2016-01-26

Here we present a hybrid OpenMP/Charm++ framework for solving the $\mathcal{O}(N)$ self-consistent-field eigenvalue problem with parallelism in the strong scaling regime, $P \gg N$, where $P$ is the number of cores, and $N$ is a measure of system size, i.e., the number of matrix rows/columns, basis functions, atoms, molecules, etc. This result is achieved with a nested approach to spectral projection and the sparse approximate matrix multiply [Bock and Challacombe, SIAM J. Sci. Comput., 35 (2013), pp. C72--C98], and involves a recursive, task-parallel algorithm, often employed by generalized $N$-Body solvers, for occlusion and culling of negligible products in the case of matrices with decay. Lastly, employing classic technologies associated with generalized $N$-Body solvers, including overdecomposition, recursive task parallelism, orderings that preserve locality, and persistence-based load balancing, we obtain scaling beyond hundreds of cores per molecule for small water clusters ([H$_2$O]$_N$, $N \in \{30, 90, 150\}$, $P/N \approx \{819, 273, 164\}$) and find support for an increasingly strong scalability with increasing system size $N$.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
Parallel Implementation of the Discontinuous Galerkin Method

NASA Technical Reports Server (NTRS)

Baggag, Abdalkader; Atkins, Harold; Keyes, David

1999-01-01

This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
Investigation of wall-bounded turbulence over regularly distributed roughness

NASA Astrophysics Data System (ADS)

Placidi, Marco; Ganapathisubramani, Bharathram

2012-11-01

The effects of regularly distributed roughness elements on the structure of a turbulent boundary layer are examined by performing a series of Planar (high resolution l+ ~ 30) and Stereoscopic Particle Image Velocimetry (PIV) experiments in a wind tunnel. An adequate description of how to best characterise a rough wall, especially one where the density of roughness elements is sparse, is yet to be developed. In this study, rough surfaces consisting of regularly and uniformly distributed LEGO® blocks are used. Twelve different patterns are adopted in order to systematically examine the effects of frontal solidity (λf, frontal area of the roughness elements per unit wall-parallel area) and plan solidity (λp, plan area of roughness elements per unit wall-parallel area) on the turbulence structure. The Karman number, Reτ, is approximately 4000 across the different cases. Spanwise 3D vector fields at two different wall-normal locations (top of the canopy and within the log-region) are also compared to examine the spanwise homogeneity of the flow across different surfaces. In the talk, a detailed analysis of mean and rms velocity profiles, Reynolds stresses, and quadrant decomposition for the different patterns will be presented.
Graph cuts via l1 norm minimization.

PubMed

Bhusnurmath, Arvind; Taylor, Camillo J

2008-10-01

Graph cuts have become an increasingly important tool for solving a number of energy minimization problems in computer vision and other fields. In this paper, the graph cut problem is reformulated as an unconstrained l1 norm minimization that can be solved effectively using interior point methods. This reformulation exposes connections between the graph cuts and other related continuous optimization problems. Eventually the problem is reduced to solving a sequence of sparse linear systems involving the Laplacian of the underlying graph. The proposed procedure exploits the structure of these linear systems in a manner that is easily amenable to parallel implementations. Experimental results obtained by applying the procedure to graphs derived from image processing problems are provided.
Recursive inverse factorization.

PubMed

Rubensson, Emanuel H; Bock, Nicolas; Holmström, Erik; Niklasson, Anders M N

2008-03-14

A recursive algorithm for the inverse factorization S^{-1} = ZZ^* of Hermitian positive definite matrices S is proposed. The inverse factorization is based on iterative refinement [A.M.N. Niklasson, Phys. Rev. B 70, 193102 (2004)] combined with a recursive decomposition of S. As the computational kernel is matrix-matrix multiplication, the algorithm can be parallelized and the computational effort increases linearly with system size for systems with sufficiently sparse matrices. Recent advances in network theory are used to find appropriate recursive decompositions. We show that optimization of the so-called network modularity results in an improved partitioning compared to other approaches, in particular when the recursive inverse factorization is applied to overlap matrices of irregularly structured three-dimensional molecules.
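The refinement idea in this abstract can be illustrated with a minimal sketch. It assumes the simplest first-order update Z <- Z(I + delta/2) with delta = I - Z^T S Z, applied to a small real symmetric positive definite S; the helper names and the 2 x 2 test matrix are hypothetical, and the recursive decomposition and network-modularity partitioning described by the authors are not reproduced here.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def refine_inverse_factor(s, z, steps=30):
    """Iteratively refine Z so that Z^T S Z approaches the identity.

    Assumed first-order update: Z <- Z (I + delta / 2), delta = I - Z^T S Z.
    It converges when the starting delta has norm below one.
    """
    n = len(s)
    for _ in range(steps):
        zsz = matmul(transpose(z), matmul(s, z))
        correction = [[(1.5 if i == j else 0.0) - 0.5 * zsz[i][j]
                       for j in range(n)] for i in range(n)]  # I + delta/2
        z = matmul(z, correction)
    return z


# Hypothetical test matrix (real symmetric positive definite).
S = [[2.0, 0.5], [0.5, 1.0]]
Z0 = [[0.5, 0.0], [0.0, 0.5]]  # scaled identity as a crude starting guess
Z = refine_inverse_factor(S, Z0)
S_times_ZZt = matmul(S, matmul(Z, transpose(Z)))  # should be close to I
```

Every step is a matrix-matrix product, which is the property the abstract points to as the reason the method parallelizes and scales linearly for sufficiently sparse matrices.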
Longitudinal disordering of vortex lattices in anisotropic superconductors

NASA Astrophysics Data System (ADS)

Harshman, D. R.; Brandt, E. H.; Fiory, A. T.; Inui, M.; Mitzi, D. B.; Schneemeyer, L. F.; Waszczak, J. V.

1993-02-01

Vortex disordering in superconducting crystals is shown to be markedly sensitive to penetration-depth anisotropy. At low temperature and high magnetic field, the muon-spin-rotation spectra for the highly anisotropic Bi2Sr2CaCu2O8+δ material are found to be anomalously narrow and symmetric about the applied field, in a manner consistent with a layered vortex sublattice structure with pinning-induced misalignment between layers. In contrast, spectra for the less-anisotropic YBa2Cu3O7-δ compounds taken at comparable fields are broader and asymmetric, showing that the vortex lattices are aligned parallel to the applied-field direction.
Effect of dividing daylight in symmetric prismatic daylight collector

NASA Astrophysics Data System (ADS)

Yeh, Shih-Chuan; Lu, Ju-Lin; Cheng, Yu-Chin

2017-04-01

This paper presents a symmetric prismatic daylight collector that collects daylight for a natural light illumination system. We analyze the characteristics of the emerging light when a parallel light beam illuminates the horizontally placed symmetric prismatic daylight collector. The relative intensities of the collected daylight emerging from each surface of the collector show that their ratio varies with the incident angle over the course of a day. Simulation of the emerging light shows that this ratio also varies with the tilt angle when sunshine illuminates a symmetric prismatic daylight collector that is not placed horizontally. The integrated normalized intensity likewise varies with the tilt angle. Because the symmetric prismatic daylight collector reduces glare and divides the intensity of incident daylight, it is applicable to natural light illumination systems and hybrid systems for improving the efficiency of solar energy utilization.
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Exploratory flow visualization investigation of mast-mounted sights in presence of a rotor

NASA Technical Reports Server (NTRS)

Ghee, Terence A.; Kelley, Henry L.

1995-01-01

A flow visualization investigation with a laser light sheet system was conducted on a 27-percent-scale AH-64 attack helicopter model fitted with two mast-mounted sights in the Langley 14- by 22-Foot Subsonic Tunnel. The investigation was conducted to identify aerodynamic phenomena that may have contributed to adverse vibration encountered during full-scale flight of the AH-64D Apache/Longbow helicopter with an asymmetric mast-mounted sight. Symmetric and asymmetric mast-mounted sights oriented at several skew angles were tested at simulated forward and rearward flight speeds of 30 and 45 knots. A laser light sheet system was used to visualize the flow in planes parallel to and perpendicular to the free-stream flow. Analysis of these flow visualization data identified frequencies of flow patterns in the wake shed from the sight, the streamline angle at the sight, and the location where the shed wake crossed the rotor plane. Differences in wake structure were observed between the sight configurations and various skew angles. Analysis of lateral light sheet plane data implied significant vortex structure in the wake of the asymmetric mast-mounted sight in the configuration that produced maximum in-flight vibration. The data showed no significant vortex structure in the wake of the asymmetric and symmetric configurations that produced no increase in in-flight adverse vibration.
Semiconductor laser devices having lateral refractive index tailoring

DOEpatents

Ashby, Carol I. H.; Hadley, G. Ronald; Hohimer, John P.; Owyoung, Adelbert

1990-01-01

A broad-area semiconductor laser diode includes an active lasing region interposed between an upper and a lower cladding layer, the laser diode further comprising structure for controllably varying a lateral refractive index profile of the diode to substantially compensate for an effect of junction heating during operation. In embodiments disclosed the controlling structure comprises resistive heating strips or non-radiative linear junctions disposed parallel to the active region. Another embodiment discloses a multi-layered upper cladding region selectively disordered by implanted or diffused dopant impurities. Still another embodiment discloses an upper cladding layer of variable thickness that is convex in shape and symmetrically disposed about a central axis of the active region. The teaching of the invention is also shown to be applicable to arrays of semiconductor laser diodes.
A language comparison for scientific computing on MIMD architectures

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

1989-01-01

Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.

A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1993-01-01

A parallel algorithm, called polysection, is presented for computing the eigenvalues of a symmetric tridiagonal matrix. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The signs of the polynomials at the interval endpoints are determined a priori and used to guarantee that all zeros are found. The use of finite-precision arithmetic may result in multiple zeros; however, in this case, the intervals coalesce and their number determines exactly the multiplicity of the zero. For an N x N matrix the eigenvalues can be determined in O(log-squared N) time with N-squared processors and O(N) time with N processors. The method is compared with a parallel variant of bisection that requires O(N-squared) time on a single processor, O(N) time with N processors, and O(log N) time with N-squared processors.
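The bisection baseline mentioned at the end of this abstract can be sketched as follows. This is a minimal illustration of classical Sturm-count bisection, not the polysection method itself; the function names, the Gershgorin starting interval, and the 3 x 3 test matrix are assumptions made for the example.

```python
def count_below(d, e, x):
    """Count eigenvalues smaller than x for a symmetric tridiagonal matrix.

    d holds the n diagonal entries and e the n-1 off-diagonal entries.
    The count is the number of negative pivots of T - x*I.
    """
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off / q
        if q == 0.0:
            q = 1e-30  # nudge an exact zero so the next division is defined
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(d, e, k, iters=100):
    """Bisect for the k-th smallest eigenvalue (k = 0 is the smallest)."""
    n = len(d)
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) +
              (abs(e[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))  # Gershgorin lower bound
    hi = max(d[i] + radius[i] for i in range(n))  # Gershgorin upper bound
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical test case: the 3 x 3 second-difference matrix.
d = [2.0, 2.0, 2.0]
e = [-1.0, -1.0]
eigs = [kth_eigenvalue(d, e, k) for k in range(len(d))]
```

Each call to `kth_eigenvalue` is independent of the others, so different eigenvalues can be assigned to different processors; that independence is what the parallel bisection variant in the comparison relies on.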
Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

NASA Technical Reports Server (NTRS)

Datta, Anubhav; Johnson, Wayne R.

2009-01-01

This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
PAM4 based symmetrical 112-Gbps long-reach TWDM-PON

NASA Astrophysics Data System (ADS)

Wu, Liyu; Gao, Fan; Zhang, Minming; Fu, Songnian; Deng, Lei; Choi, Michael; Chang, Donald; Lei, Gordon K. P.; Liu, Deming

2018-02-01

We experimentally demonstrate a cost-effective symmetrical 112-Gbps long-reach passive optical network (LR-PON) over 70-km standard single mode fiber (SSMF), based on pulse amplitude modulation (PAM)-4. Four 10G-class directly modulated lasers (DMLs) at C-band are used for achieving 4 × 28-Gbps downstream transmission, while two 18G-class DMLs at O-band are used to realize 2 × 56-Gbps upstream transmission, without any optical amplification in the optical distribution network (ODN). Both dispersion compensation fiber (DCF) for the downstream signal and a praseodymium-doped fiber amplifier (PDFA) for the upstream signal are equipped at the optical line terminal (OLT). Meanwhile, a sparse Volterra filter (SVF) equalizer is proposed to mitigate the transmission impairments with a substantial reduction of computational complexity. Finally, we can successfully provide a loss budget of 33 dB per downstream wavelength channel, indicating support for 64 optical network units (ONUs) with more than 1.25 Gbps per ONU.
FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data—Part 1: Algorithm

NASA Astrophysics Data System (ADS)

Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves

2009-03-01

This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized, and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion are treated with a two-stage staggered explicit-implicit numerical algorithm that takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, a parallel implementation of the present constraint treatment techniques and of the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices, for which a Schur complement form was derived. By fully exploiting sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the system equations written in Schur complement form. A software testbed was designed and implemented on both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speedup of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
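As a rough illustration of the solver class named in this abstract, the sketch below runs a serial preconditioned conjugate gradient iteration on a small arrowhead matrix. The Jacobi (diagonal) preconditioner, the function names, and the 4 x 4 test system are assumptions chosen for brevity; the thesis's Schur complement formulation and its parallel implementation are not reproduced.

```python
import math


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def pcg(a, b, tol=1e-12, max_iter=200):
    """Conjugate gradients with a Jacobi preconditioner (assumed choice).

    a must be symmetric positive definite with a nonzero diagonal.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    m_inv = [1.0 / a[i][i] for i in range(n)]     # inverse of diag(A)
    z = [m_inv[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


# Hypothetical arrowhead system: diagonal block plus one coupling row/column.
A = [[4.0, 0.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 1.0],
     [0.0, 0.0, 5.0, 1.0],
     [1.0, 1.0, 1.0, 6.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = pcg(A, b)
residual = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]
```

The only operations per iteration are a matrix-vector product, a few inner products, and vector updates, which is why this family of solvers distributes well across processors.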
Long waves in parallel flow in Hele-Shaw cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zeybek, M.; Yortsos, Y.C.

1991-09-09

The evolution of fluid interfaces in parallel flow in Hele-Shaw cells is studied theoretically and experimentally in the limit of large capillary number. It is shown that such interfaces support wave motion, the amplitude of which for long waves is governed by a set of Korteweg-de Vries and Airy equations. Experiments conducted in a long Hele-Shaw cell validate the theory in the symmetric case.
Efficient, massively parallel eigenvalue computation

NASA Technical Reports Server (NTRS)

Huo, Yan; Schreiber, Robert

1993-01-01

In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the MasPar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that would otherwise take a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.

2007-01-01

Scattered data interpolation is a problem of interest in numerous areas such as electronic imaging, smooth surface modeling, and computational geometry. Our motivation arises from applications in geology and mining, which often involve large scattered data sets and a demand for high accuracy. The method of choice is ordinary kriging. This is because it is a best unbiased estimator. Unfortunately, this interpolant is computationally very expensive to compute exactly. For n scattered data points, computing the value of a single interpolant involves solving a dense linear system of size roughly n x n. This is infeasible for large n. In practice, kriging is solved approximately by local approaches that are based on considering only a relatively small number of points that lie close to the query point. There are many problems with this local approach, however. The first is that determining the proper neighborhood size is tricky, and is usually solved by ad hoc methods such as selecting a fixed number of nearest neighbors or all the points lying within a fixed radius. Such fixed neighborhood sizes may not work well for all query points, depending on local density of the point distribution. Local methods also suffer from the problem that the resulting interpolant is not continuous. Meyer showed that while kriging produces smooth continuous surfaces, it has zero order continuity along its borders. Thus, at interface boundaries where the neighborhood changes, the interpolant behaves discontinuously. Therefore, it is important to consider and solve the global system for each interpolant. However, solving such large dense systems for each query point is impractical. Recently a more principled approach to approximating kriging has been proposed based on a technique called covariance tapering. The problems arise from the fact that the covariance functions that are used in kriging have global support.
Our implementations combine, utilize, and enhance a number of different approaches that have been introduced in literature for solving large linear systems for interpolation of scattered data points. For very large systems, exact methods such as Gaussian elimination are impractical since they require O(n^3) time and O(n^2) storage. As Billings et al. suggested, we use an iterative approach. In particular, we use the SYMMLQ method, for solving the large but sparse ordinary kriging systems that result from tapering. The main technical issue that needs to be overcome in our algorithmic solution is that the points' covariance matrix for kriging should be symmetric positive definite. The goal of tapering is to obtain a sparse approximate representation of the covariance matrix while maintaining its positive definiteness. Furrer et al. used tapering to obtain a sparse linear system of the form Ax = b, where A is the tapered symmetric positive definite covariance matrix. Thus, Cholesky factorization could be used to solve their linear systems. They implemented an efficient sparse Cholesky decomposition method. They also showed if these tapers are used for a limited class of covariance models, the solution of the system converges to the solution of the original system. Matrix A in the ordinary kriging system, while symmetric, is not positive definite. Thus, their approach is not applicable to the ordinary kriging system. Therefore, we use tapering only to obtain a sparse linear system. Then, we use SYMMLQ to solve the ordinary kriging system. We show that solving large kriging systems becomes practical via tapering and iterative methods, and results in lower estimation errors compared to traditional local approaches, and significant memory savings compared to the original global system. We also developed a more efficient variant of the sparse SYMMLQ method for large ordinary kriging systems.
This approach adaptively finds the correct local neighborhood for each query point in the interpolation process.
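Two claims in this abstract lend themselves to a small worked example: tapering makes the covariance matrix sparse, and the ordinary kriging matrix is symmetric but not positive definite. The sketch below assumes one-dimensional sample locations, an exponential covariance model, and a spherical taper; these choices, along with all function names and parameter values, are illustrative assumptions rather than the authors' configuration.

```python
import math


def exp_cov(h, scale=2.0):
    """Exponential covariance model (assumed for illustration)."""
    return math.exp(-h / scale)


def spherical_taper(h, theta):
    """Compactly supported taper: exactly zero for distances >= theta."""
    if h >= theta:
        return 0.0
    t = h / theta
    return (1.0 - t) ** 2 * (1.0 + 0.5 * t)


def tapered_kriging_matrix(points, theta):
    """Build the (n+1) x (n+1) ordinary kriging matrix [[C, 1], [1^T, 0]]
    with the covariance block C multiplied entrywise by the taper."""
    n = len(points)
    k = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            h = abs(points[i] - points[j])
            k[i][j] = exp_cov(h) * spherical_taper(h, theta)
        k[i][n] = 1.0  # unbiasedness constraint column
        k[n][i] = 1.0  # unbiasedness constraint row
    return k


def quad_form(m, v):
    return sum(v[i] * m[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


points = [0.0, 1.0, 2.5, 4.0, 7.0]  # hypothetical sample locations
K = tapered_kriging_matrix(points, theta=2.0)
zeros_in_cov_block = sum(1 for i in range(5) for j in range(5) if K[i][j] == 0.0)

# The quadratic form takes both signs, so K is indefinite.
positive_value = quad_form(K, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
negative_value = quad_form(K, [1.0, 0.0, 0.0, 0.0, 0.0, -1.0])
```

Because the zero in the lower-right corner makes the matrix indefinite, a Cholesky factorization does not apply, which is consistent with the abstract's choice of SYMMLQ, an iterative method for symmetric systems that need not be positive definite.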
Parallel transformation of K-SVD solar image denoising algorithm

NASA Astrophysics Data System (ADS)

Liang, Youwen; Tian, Yu; Li, Mei

2017-02-01

Images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallel model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to multi-core platforms.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it usable in a clinical setting. PMID:27081299
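For readers unfamiliar with MLEM, the multiplicative update at its core can be sketched in a few lines. This is a serial, dense illustration under assumed names and a hypothetical 4 x 3 system matrix; it is not the authors' Spark/GraphX implementation, which distributes the same forward and back projections.

```python
import math


def forward(a, x):
    """Forward projection: expected counts A x for image estimate x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mlem_step(a, y, x):
    """One MLEM update: x_j <- x_j / s_j * sum_i a_ij * y_i / (A x)_i,
    where s_j = sum_i a_ij is the sensitivity of pixel j."""
    proj = forward(a, x)
    ratio = [y_i / p_i for y_i, p_i in zip(y, proj)]
    n_pix = len(x)
    sens = [sum(row[j] for row in a) for j in range(n_pix)]
    back = [sum(row[j] * r for row, r in zip(a, ratio)) for j in range(n_pix)]
    return [x[j] * back[j] / sens[j] for j in range(n_pix)]


def log_likelihood(a, y, x):
    """Poisson log-likelihood up to a constant that does not depend on x."""
    return sum(y_i * math.log(p_i) - p_i
               for y_i, p_i in zip(y, forward(a, x)))


# Hypothetical system matrix (4 detector bins, 3 pixels) and noiseless data.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
x_true = [2.0, 1.0, 3.0]
y = forward(A, x_true)

x = [1.0, 1.0, 1.0]  # uniform positive starting image
ll_start = log_likelihood(A, y, x)
for _ in range(50):
    x = mlem_step(A, y, x)
ll_end = log_likelihood(A, y, x)
```

Each update consists of one forward projection and one back projection with the same sparse system matrix, which is why the algorithm maps naturally onto a framework for sparse linear algebra operations.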
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it usable in a clinical setting.
Functional brain networks reconstruction using group sparsity-regularized learning.

PubMed

Zhao, Qinghua; Li, Will X Y; Jiang, Xi; Lv, Jinglei; Lu, Jianfeng; Liu, Tianming

2018-06-01

Investigating functional brain networks and patterns using sparse representation of fMRI data has received significant interest in the neuroimaging community. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. To date, most data-driven network reconstruction approaches rarely take into consideration anatomical structures, which are the substrate of brain function. Furthermore, it has rarely been explored whether structured sparse representation with anatomical guidance could facilitate functional network reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks utilizing the structure guided group sparse regression (S2GSR) in which 116 anatomical regions from the AAL template, as prior knowledge, are employed to guide the network reconstruction when performing sparse representation of whole-brain fMRI data. Specifically, we extract fMRI signals from standard space aligned with the AAL template. Then by learning a global over-complete dictionary, with the learned dictionary as a set of features (regressors), the group structured regression employs anatomical structures as group information to regress whole brain signals. Finally, the decomposition coefficients matrix is mapped back to the brain volume to represent functional brain networks and patterns. We use the publicly available Human Connectome Project (HCP) Q1 dataset as the test bed, and the experimental results indicate that the proposed anatomically guided structure sparse representation is effective in reconstructing concurrent functional brain networks.
Quantum interference in multi-branched molecules: The exact transfer matrix solutions.

PubMed

Jiang, Yu

2017-12-07

We present a transfer matrix formalism for studying quantum interference in a single molecule electronic system with internal branched structures. Based on the Schrödinger equation with the Bethe ansatz and employing Kirchhoff's rule for quantum wires, we derive a general closed-form expression for the transmission and reflection amplitudes of a two-port quantum network. We show that the transport through a molecule with complex internal structures can be reduced to that of a single two-port scattering unit, which contains all the information of the original composite molecule. Our method allows for the calculation of the transmission coefficient for various types of individual molecular modules giving rise to different resonant transport behaviors such as the Breit-Wigner, Fano, and Mach-Zehnder resonances. As an illustration, we first re-derive the transmittance of the Aharonov-Bohm ring, and then we apply our formulation to N identical parity-time (PT)-symmetric potentials, connected in series as well as in parallel. It is shown that the spectral singularities and PT-symmetric transitions of single scattering cells may be observed in coupled systems. Such transitions may occur at the same or distinct values of the critical parameters, depending on the connection modes under which the scattering objects are coupled.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard

2007-01-01

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
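The SpMV kernel studied here is easiest to see in compressed sparse row (CSR) form. The sketch below is a plain serial reference with hypothetical helper names and a small test matrix; it contains none of the multicore optimizations the abstract refers to.

```python
def to_csr(dense):
    """Convert a dense list-of-rows matrix to CSR arrays.

    Returns (values, col_idx, row_ptr); row i owns the slice
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for a CSR matrix, touching only stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect access into x
        y.append(acc)
    return y


# Hypothetical 4 x 4 sparse matrix and input vector.
dense = [[4.0, 0.0, 0.0, 1.0],
         [0.0, 3.0, 0.0, 0.0],
         [0.0, 2.0, 5.0, 0.0],
         [1.0, 0.0, 0.0, 6.0]]
x = [1.0, 2.0, 3.0, 4.0]
values, col_idx, row_ptr = to_csr(dense)
y = spmv(values, col_idx, row_ptr, x)
```

Each stored nonzero contributes one multiply-add and requires loading a value, a column index, and an indirectly addressed entry of `x`, which helps explain why the abstract describes the kernel as memory-bound. The outer loop over rows is independent per row, so rows can be divided among cores.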
Multiprocessor sparse L/U decomposition with controlled fill-in

NASA Technical Reports Server (NTRS)

Alaghband, G.; Jordan, H. F.

1985-01-01

Generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied. The algorithm involves a binary tree search and has a complexity exponential in the order of the matrix. Different strategies for selection of a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. This technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices are presented and analyzed.
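The Markowitz criterion and the notion of compatible pivots can be illustrated on a sparsity pattern. The sketch below scores each nonzero (i, j) by the Markowitz count (r_i - 1)(c_j - 1) and then greedily collects pivots in order of increasing count. The compatibility test used here, that two pivots (i, j) and (k, l) are compatible when entries (i, l) and (k, j) are both zero, is an assumption about the definition, and the greedy selection is a simple stand-in rather than the tree search or heuristic described in the abstract.

```python
def markowitz_costs(pattern, n_cols):
    """Return sorted (cost, i, j) tuples for every nonzero position.

    pattern[i] is the set of column indices holding nonzeros in row i.
    The cost (r_i - 1) * (c_j - 1) bounds the fill created by pivot (i, j).
    """
    row_count = [len(cols) for cols in pattern]
    col_count = [sum(1 for cols in pattern if j in cols) for j in range(n_cols)]
    return sorted(((row_count[i] - 1) * (col_count[j] - 1), i, j)
                  for i, cols in enumerate(pattern) for j in cols)


def compatible(pattern, p, q):
    """Assumed definition: pivots p = (i, j) and q = (k, l) can be eliminated
    together when entries (i, l) and (k, j) are both structurally zero."""
    (i, j), (k, l) = p, q
    return l not in pattern[i] and j not in pattern[k]


def greedy_compatible_set(pattern, n_cols):
    """Pick pivots in order of increasing Markowitz cost, keeping each one
    only if it is compatible with every pivot already chosen."""
    chosen = []
    for _, i, j in markowitz_costs(pattern, n_cols):
        if all(compatible(pattern, (i, j), q) for q in chosen):
            chosen.append((i, j))
    return chosen


# Hypothetical 4 x 4 sparsity pattern.
pattern = [{0, 3}, {1, 2}, {1, 2, 3}, {0, 2, 3}]
costs = markowitz_costs(pattern, 4)
pivots = greedy_compatible_set(pattern, 4)
```

A larger compatible set means more pivots can be eliminated in parallel, while preferring low Markowitz counts limits fill-in; the tension between the two is the trade-off the abstract studies.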
Significance of rotating ground motions on nonlinear behavior of symmetric and asymmetric buildings in near fault sites

USGS Publications Warehouse

Kalkan, Erol

2012-01-01

Building codes in the U.S. require at least two horizontal ground motion components for three-dimensional (3D) response history analysis (RHA) of structures. For sites within 5 km of an active fault, these records should be rotated to fault-normal/fault-parallel (FN/FP) directions, and two RHA analyses should be performed separately (when FN and then FP are aligned with the transverse direction of the structural axes). It is assumed that this approach will lead to two sets of responses that envelop the range of possible responses over all non-redundant rotation angles. This assumption is examined here using 3D computer models of a single-story structure having symmetric (that is, torsionally stiff) and asymmetric (that is, torsionally flexible) layouts subjected to an ensemble of bi-directional near-fault strong ground motions with and without apparent velocity pulses. In this parametric study, the elastic vibration period of the structures is varied from 0.2 to 5 seconds, and the yield-strength reduction factor R is varied from a value that leads to linear-elastic design to 3 and 5. The influence that the rotation angle of the ground motion has on several engineering demand parameters (EDPs) is examined in linear-elastic and nonlinear-inelastic domains to form a benchmark for evaluating the use of the FN/FP directions as well as the maximum-direction (MD) ground motion, a new definition of horizontal ground motions for use in the seismic design of structures according to the 2009 NEHRP Provisions and Commentary.
Krylov Subspace Methods for Complex Non-Hermitian Linear Systems. Thesis

NASA Technical Reports Server (NTRS)

Freund, Roland W.

1991-01-01

We consider Krylov subspace methods for the solution of large sparse linear systems Ax = b with complex non-Hermitian coefficient matrices. Such linear systems arise in important applications, such as inverse scattering, numerical solution of time-dependent Schrödinger equations, underwater acoustics, eddy current computations, numerical computations in quantum chromodynamics, and numerical conformal mapping. Typically, the resulting coefficient matrices A exhibit special structures, such as complex symmetry, or they are shifted Hermitian matrices. In this paper, we first describe a Krylov subspace approach with iterates defined by a quasi-minimal residual property, the QMR method, for solving general complex non-Hermitian linear systems. Then, we study special Krylov subspace methods designed for the two families of complex symmetric and shifted Hermitian linear systems, respectively. We also include some results concerning the obvious approach to general complex linear systems by solving equivalent real linear systems for the real and imaginary parts of x. Finally, numerical experiments for linear systems arising from the complex Helmholtz equation are reported.
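The "obvious approach" mentioned in the abstract above, rewriting a complex system as an equivalent real one, can be written out as follows. The subscripts r and i for real and imaginary parts are notation assumed here, not taken from the thesis, and this is only one of several equivalent real formulations.

```latex
% Split A, x and b into real and imaginary parts (notation assumed here):
%   A = A_r + i A_i,  x = x_r + i x_i,  b = b_r + i b_i.
% Equating real and imaginary parts of A x = b gives a real system of twice the size:
\[
\begin{pmatrix} A_r & -A_i \\ A_i & A_r \end{pmatrix}
\begin{pmatrix} x_r \\ x_i \end{pmatrix}
=
\begin{pmatrix} b_r \\ b_i \end{pmatrix}.
\]
```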
An automatic multigrid method for the solution of sparse linear systems

NASA Technical Reports Server (NTRS)

Shapira, Yair; Israeli, Moshe; Sidi, Avram

1993-01-01

An automatic version of the multigrid method for the solution of linear systems arising from the discretization of elliptic PDEs is presented. This version is based on the structure of the algebraic system solely, and does not use the original partial differential operator. Numerical experiments show that for the Poisson equation the rate of convergence of our method is equal to that of classical multigrid methods. Moreover, the method is robust in the sense that its high rate of convergence is conserved for other classes of problems: non-symmetric, hyperbolic (even with closed characteristics) and problems on non-uniform grids. No double discretization or special treatment of sub-domains (e.g. boundaries) is needed. When supplemented with a vector extrapolation method, high rates of convergence are also achieved for anisotropic and discontinuous problems and for indefinite Helmholtz equations. A new double discretization strategy is proposed for finite and spectral element schemes and is found to be better than known strategies.
Exact analytic solution for the spin-up maneuver of an axially symmetric spacecraft

NASA Astrophysics Data System (ADS)

Ventura, Jacopo; Romano, Marcello

2014-11-01

The problem of spinning-up an axially symmetric spacecraft subjected to an external torque constant in magnitude and parallel to the symmetry axis is considered. The existing exact analytic solution for an axially symmetric body is applied for the first time to this problem. The proposed solution is valid for any initial conditions of attitude and angular velocity and for any length of time and rotation amplitude. Furthermore, the proposed solution can be numerically evaluated up to any desired level of accuracy. Numerical experiments and comparison with an existing approximated solution and with the integration of the equations of motion are reported in the paper. Finally, a new approximated solution obtained from the exact one is introduced in this paper.

An approximation method for improving dynamic network model fitting.

PubMed

Carnegie, Nicole Bohme; Krivitsky, Pavel N; Hunter, David R; Goodreau, Steven M

There has been a great deal of interest recently in the modeling and simulation of dynamic networks, i.e., networks that change over time. One promising model is the separable temporal exponential-family random graph model (ERGM) of Krivitsky and Handcock, which treats the formation and dissolution of ties in parallel at each time step as independent ERGMs. However, the computational cost of fitting these models can be substantial, particularly for large, sparse networks. Fitting cross-sectional models for observations of a network at a single point in time, while still a non-negligible computational burden, is much easier. This paper examines model fitting when the available data consist of independent measures of cross-sectional network structure and the duration of relationships under the assumption of stationarity. We introduce a simple approximation to the dynamic parameters for sparse networks with relationships of moderate or long duration and show that the approximation method works best in precisely those cases where parameter estimation is most likely to fail: networks with very little change at each time step. We consider a variety of cases: Bernoulli formation and dissolution of ties, independent-tie formation and Bernoulli dissolution, independent-tie formation and dissolution, and dependent-tie formation models.
Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications

PubMed Central

Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

2016-01-01

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like ’edible’, ’fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT, by applying it on a Facebook dataset (users, ’friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406
Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

NASA Astrophysics Data System (ADS)

Stoykov, S.; Atanassov, E.; Margenov, S.

2016-10-01

Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices. One of the computationally expensive operations in the method is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL, and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.
Generation of Alfvenic Waves and Turbulence in Magnetic Reconnection Jets

NASA Astrophysics Data System (ADS)

Hoshino, M.

2014-12-01

The magneto-hydro-dynamic (MHD) linear stability for the plasma sheet with a localized bulk plasma flow parallel to the neutral sheet is investigated. We find three different unstable modes propagating parallel to the anti-parallel magnetic field line, which we call the "streaming tearing", "streaming sausage", and "streaming kink" modes. The streaming tearing and sausage modes have the tearing mode-like structure with symmetric density fluctuation to the neutral sheet, and the streaming kink mode has the asymmetric fluctuation. The growth rate of the streaming tearing mode decreases with increasing magnetic Reynolds number, while those of the streaming sausage and kink modes do not strongly depend on the Reynolds number. The wavelengths of these unstable modes are of the order of the thickness of the plasma sheet, a behavior that is almost the same as that of the standard tearing mode with no bulk flow. Roughly speaking, the growth rates of the three modes are faster than that of the standard tearing mode. The situation of the plasma sheet with the bulk flow can be realized in the reconnection exhaust with the Alfvenic reconnection jet, and the unstable modes may be regarded as one of the generation processes of Alfvenic turbulence in the plasma sheet during magnetic reconnection.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
Ca2+-binding Motif of βγ-Crystallins

PubMed Central

Srivastava, Shanti Swaroop; Mishra, Amita; Krishnan, Bal; Sharma, Yogendra

2014-01-01

βγ-Crystallin-type double clamp (N/D)(N/D)XX(S/T)S motif is an established but sparsely investigated motif for Ca2+ binding. A βγ-crystallin domain is formed of two Greek key motifs, accommodating two Ca2+-binding sites. βγ-Crystallins make a separate class of Ca2+-binding proteins (CaBP), apparently a major group of CaBP in bacteria. Paralleling the diversity in βγ-crystallin domains, these motifs also show great diversity, both in structure and in function. Although the expression of some of them has been associated with stress, virulence, and adhesion, the functional implications of Ca2+ binding to βγ-crystallins in mediating biological processes are yet to be elucidated. PMID:24567326
Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE PAGES

Zhang, Hong; Zapol, Peter; Dixon, David A.; ...

2015-11-17

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.
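The spectral transformation behind shift-and-invert can be stated compactly. The symbols below (H and S for the Hamiltonian and overlap matrices, sigma for the shift, theta for the transformed eigenvalue) are notation assumed for this note; the relation itself is the standard shift-and-invert identity rather than a detail specific to SIPs.

```latex
% Generalized eigenproblem and a shift \sigma that is not an eigenvalue:
\[
H x = \lambda S x
\;\Longrightarrow\;
(H - \sigma S)\,x = (\lambda - \sigma)\,S x
\;\Longrightarrow\;
(H - \sigma S)^{-1} S\,x = \theta\,x,
\qquad \theta = \frac{1}{\lambda - \sigma}.
\]
% Eigenvalues \lambda closest to \sigma map to the largest |\theta|,
% so each shift targets one slice of the spectrum, and different
% shifts can be processed independently.
```

Applying the inverse requires a sparse factorization of H - sigma S for each shift, which is consistent with the abstract's remark that the factorization step dominates time-to-solution at the strong scaling limit.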
Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Hong; Zapol, Peter; Dixon, David A.

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.
Collagen production of osteoblasts revealed by ultra-high voltage electron microscopy.

PubMed

Hosaki-Takamiya, Rumiko; Hashimoto, Mana; Imai, Yuichi; Nishida, Tomoki; Yamada, Naoko; Mori, Hirotaro; Tanaka, Tomoyo; Kawanabe, Noriaki; Yamashiro, Takashi; Kamioka, Hiroshi

2016-09-01

In the bone, collagen fibrils form a lamellar structure called the "twisted plywood-like model." Because of this unique structure, bone can withstand various mechanical stresses. However, the formation of this structure has not been elucidated because of the difficulty of observing the collagen fibril production of the osteoblasts via currently available methods. This is because the formation occurs in the very limited space between the osteoblast layer and bone matrix. In this study, we used ultra-high-voltage electron microscopy (UHVEM) to observe collagen fibril production three-dimensionally. UHVEM has a 3-MV acceleration voltage and enables us to use thicker sections. We observed collagen fibrils that were beneath the cell membrane of osteoblasts elongated to the outside of the cell. We also observed that osteoblasts produced collagen fibrils with polarity. By using AVIZO software, we observed collagen fibrils produced by osteoblasts along the contour of the osteoblasts toward the bone matrix area. Immediately after being released from the cell, the fibrils ran randomly and sparsely. However, as they receded from the osteoblast, the fibrils began to run parallel in a definite direction and became thick, and we observed a periodic stripe in that area. Furthermore, we also observed membrane structures wrapped around filamentous structures inside the osteoblasts. The filamentous structures had densities similar to the collagen fibrils and a columnar form and diameter. Our results suggested that collagen fibrils run parallel and thickly, which may be related to the lateral movement of the osteoblasts. UHVEM is a powerful tool for observing collagen fibril production.
Relaxations to Sparse Optimization Problems and Applications

NASA Astrophysics Data System (ADS)

Skau, Erik West

Parsimony is a fundamental property that is applied to many characteristics in a variety of fields. Of particular interest are optimization problems that apply rank, dimensionality, or support in a parsimonious manner. In this thesis we study some optimization problems and their relaxations, and focus on properties and qualities of the solutions of these problems. The Gramian tensor decomposition problem attempts to decompose a symmetric tensor as a sum of rank one tensors. We approach the Gramian tensor decomposition problem with a relaxation to a semidefinite program. We study conditions which ensure that the solution of the relaxed semidefinite problem gives the minimal Gramian rank decomposition. Sparse representations with learned dictionaries are one of the leading image modeling techniques for image restoration. When learning these dictionaries from a set of training images, the sparsity parameter of the dictionary learning algorithm strongly influences the content of the dictionary atoms. We describe geometrically the content of trained dictionaries and how it changes with the sparsity parameter. We use statistical analysis to characterize how the different content is used in sparse representations. Finally, a method to control the structure of the dictionaries is demonstrated, allowing us to learn a dictionary which can later be tailored for specific applications. Variations of dictionary learning can be broadly applied to a variety of applications. We explore a pansharpening problem with a triple factorization variant of coupled dictionary learning. Another application of dictionary learning is computer vision. Computer vision relies heavily on object detection, which we explore with a hierarchical convolutional dictionary learning model. Data fusion of disparate modalities is a growing topic of interest. We do a case study to demonstrate the benefit of using social media data with satellite imagery to estimate hazard extents. In this case study analysis we apply a maximum entropy model, guided by the social media data, to estimate the flooded regions during a 2013 flood in Boulder, CO and show that the results are comparable to those obtained using expert information.
Analysis, tuning and comparison of two general sparse solvers for distributed memory computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, P.R.; Duff, I.S.; L'Excellent, J.-Y.

2000-06-30

We describe the work performed in the context of a Franco-Berkeley funded project between NERSC-LBNL located in Berkeley (USA) and CERFACS-ENSEEIHT located in Toulouse (France). We discuss both the tuning and performance analysis of two distributed memory sparse solvers (SuperLU from Berkeley and MUMPS from Toulouse) on the 512 processor Cray T3E from NERSC (Lawrence Berkeley National Laboratory). This project gave us the opportunity to improve the algorithms and add new features to the codes. We then quite extensively analyze and compare the two approaches on a set of large problems from real applications. We further explain the main differences in the behavior of the approaches on artificial regular grid problems. As a conclusion to this activity report, we mention a set of parallel sparse solvers on which this type of study should be extended.
M1 transitions between low-lying states in the sdg-IBM-2

NASA Astrophysics Data System (ADS)

Casperson, Robert; Werner, Volker

2006-10-01

The interplay between collective and single-particle degrees of freedom for nuclei in the A=90 region has recently been under investigation. In Molybdenum and Ruthenium nuclei, collective symmetric and mixed-symmetric structures have been identified, while in Zirconium, underlying shell-structure plays an enhanced role. Collective symmetric structures appear when protons and neutrons are in phase, whereas mixed-symmetric structures occur when they are not. The one-phonon 2^+ mixed-symmetric state was identified from strong M1 transitions to the 2^+_1 state. Similar transitions were observed between higher-spin states, and are predicted by the shell model. These phenomena will be investigated within the sdg Interacting Boson Model 2 in order to obtain a better understanding of the structure of the states involved, and results from first model calculations will be presented. Work supported by US DOE under grant number DE-FG02-91ER-40609.
Overview of Sparse Graph for Multiple Access in Future Mobile Networks

NASA Astrophysics Data System (ADS)

Lei, Jing; Li, Baoguo; Li, Erbao; Gong, Zhenghui

2017-10-01

Multiple access via sparse graph, such as low density signature (LDS) and sparse code multiple access (SCMA), is a promising technique for future wireless communications. This survey presents an overview of the developments in this burgeoning field, including transmitter structures, extrinsic information transform (EXIT) chart analysis and comparisons with existing multiple access techniques. This technique enables multiple access under overloaded conditions to achieve a satisfactory performance. A message passing algorithm is used for multi-user detection in the receiver, and structures of the sparse graph are illustrated in detail. Outlooks and challenges of this technique are also presented.
Parallel computing techniques for rotorcraft aerodynamics

NASA Astrophysics Data System (ADS)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, modifications are made to the implicit operator originally used in TURNS, Lower-Upper Symmetric Gauss-Seidel (LU-SGS). Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
MODFLOW–USG version 1: An unstructured grid version of MODFLOW for simulating groundwater flow and tightly coupled processes using a control volume finite-difference formulation

USGS Publications Warehouse

Panday, Sorab; Langevin, Christian D.; Niswonger, Richard G.; Ibaraki, Motomu; Hughes, Joseph D.

2013-01-01

A new version of MODFLOW, called MODFLOW–USG (for UnStructured Grid), was developed to support a wide variety of structured and unstructured grid types, including nested grids and grids based on prismatic triangles, rectangles, hexagons, and other cell shapes. Flexibility in grid design can be used to focus resolution along rivers and around wells, for example, or to subdiscretize individual layers to better represent hydrostratigraphic units. MODFLOW–USG is based on an underlying control volume finite difference (CVFD) formulation in which a cell can be connected to an arbitrary number of adjacent cells. To improve accuracy of the CVFD formulation for irregular grid-cell geometries or nested grids, a generalized Ghost Node Correction (GNC) Package was developed, which uses interpolated heads in the flow calculation between adjacent connected cells. MODFLOW–USG includes a Groundwater Flow (GWF) Process, based on the GWF Process in MODFLOW–2005, as well as a new Connected Linear Network (CLN) Process to simulate the effects of multi-node wells, karst conduits, and tile drains, for example. The CLN Process is tightly coupled with the GWF Process in that the equations from both processes are formulated into one matrix equation and solved simultaneously. This robustness results from using an unstructured grid with unstructured matrix storage and solution schemes. MODFLOW–USG also contains an optional Newton-Raphson formulation, based on the formulation in MODFLOW–NWT, for improving solution convergence and avoiding problems with the drying and rewetting of cells. Because the existing MODFLOW solvers were developed for structured and symmetric matrices, they were replaced with a new Sparse Matrix Solver (SMS) Package developed specifically for MODFLOW–USG. 
The SMS Package provides several methods for resolving nonlinearities and multiple symmetric and asymmetric linear solution schemes to solve the matrix arising from the flow equations and the Newton-Raphson formulation, respectively.
Spatio-temporal Event Classification using Time-series Kernel based Structured Sparsity

PubMed Central

Jeni, László A.; Lőrincz, András; Szabó, Zoltán; Cohn, Jeffrey F.; Kanade, Takeo

2016-01-01

In many behavioral domains, such as facial expression and gesture, sparse structure is prevalent. This sparsity would be well suited for event detection but for one problem. Features typically are confounded by alignment error in space and time. As a consequence, high-dimensional representations such as SIFT and Gabor features have been favored despite their much greater computational cost and potential loss of information. We propose a Kernel Structured Sparsity (KSS) method that can handle both the temporal alignment problem and the structured sparse reconstruction within a common framework, and it can rely on simple features. We characterize spatio-temporal events as time-series of motion patterns and by utilizing time-series kernels we apply standard structured-sparse coding techniques to tackle this important problem. We evaluated the KSS method using both gesture and facial expression datasets that include spontaneous behavior and differ in degree of difficulty and type of ground truth coding. KSS outperformed both sparse and non-sparse methods that utilize complex image features and their temporal extensions. In the case of early facial event classification KSS had 10% higher accuracy as measured by F1 score over kernel SVM methods. PMID:27830214
Medical Image Fusion Based on Feature Extraction and Sparse Representation

PubMed Central

Wei, Gao; Zongxi, Song

2017-01-01

As a novel multiscale geometric analysis tool, sparse representation has shown many advantages over the conventional image representation methods. However, the standard sparse representation does not take intrinsic structure and its time complexity into consideration. In this paper, a new fusion mechanism for multimodal medical images based on sparse representation and decision map is proposed to deal with these problems simultaneously. Three decision maps are designed, including a structure information map (SM), an energy information map (EM), and a structure and energy map (SEM), so that the results preserve more energy and edge information. SM contains the local structure feature captured by the Laplacian of a Gaussian (LOG) and EM contains the energy and energy distribution feature detected by the mean square deviation. The decision map is added to the normal sparse representation based method to improve the speed of the algorithm. The proposed approach also improves the quality of the fused results by enhancing the contrast and preserving more structure and energy information from the source images. The experimental results on 36 groups of CT/MR, MR-T1/MR-T2, and CT/PET images demonstrate that the method based on SR and SEM outperforms five state-of-the-art methods. PMID:28321246
New shape models of asteroids reconstructed from sparse-in-time photometry

NASA Astrophysics Data System (ADS)

Durech, Josef; Hanus, Josef; Vanco, Radim; Oszkiewicz, Dagmara Anna

2015-08-01

Asteroid physical parameters - the shape, the sidereal rotation period, and the spin axis orientation - can be reconstructed by the lightcurve inversion method from disk-integrated photometry that is either dense (classical lightcurves) or sparse in time. We will review our recent progress in asteroid shape reconstruction from sparse photometry. Finding a unique solution of the inverse problem is time consuming because the sidereal rotation period has to be found by scanning a wide interval of possible periods. This can be efficiently solved by splitting the period parameter space into small parts that are sent to computers of volunteers and processed in parallel. We will show how this approach of distributed computing works with currently available sparse photometry processed in the framework of project Asteroids@home. In particular, we will show the results based on the Lowell Photometric Database. The method produces reliable asteroid models with a very low rate of false solutions, and the pipelines and codes can also be applied directly to other sources of sparse photometry - Gaia data, for example. We will present the distribution of spin axes of hundreds of asteroids, discuss the dependence of the spin obliquity on the size of an asteroid, and show examples of spin-axis distribution in asteroid families that confirm the Yarkovsky/YORP evolution scenario.
HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

PubMed Central

Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

2015-01-01

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
Group-sparse representation with dictionary learning for medical image denoising and fusion.

PubMed

Li, Shutao; Yin, Haitao; Fang, Leyuan

2012-12-01

Recently, sparse representation has attracted a lot of interest in various areas. However, the standard sparse representation does not consider the intrinsic structure, i.e., the nonzero elements occur in clusters, called group sparsity. Furthermore, there is no dictionary learning method for group sparse representation that considers the geometrical structure of the space spanned by atoms. In this paper, we propose a novel dictionary learning method, called Dictionary Learning with Group Sparsity and Graph Regularization (DL-GSGR). First, the geometrical structure of atoms is modeled as the graph regularization. Then, combining group sparsity and graph regularization, the DL-GSGR is presented, which is solved by alternating the group sparse coding and dictionary updating. In this way, the group coherence of the learned dictionary can be kept small enough that any signal can be group sparse coded effectively. Finally, group sparse representation with DL-GSGR is applied to 3-D medical image denoising and image fusion. Specifically, in 3-D medical image denoising, a 3-D processing mechanism (using the similarity among nearby slices) and temporal regularization (to preserve the correlations across nearby slices) are exploited. The experimental results on 3-D image denoising and image fusion demonstrate the superiority of our proposed denoising and fusion approaches.

A new method for computation of eigenvector derivatives with distinct and repeated eigenvalues in structural dynamic analysis

NASA Astrophysics Data System (ADS)

Li, Zhengguang; Lai, Siu-Kai; Wu, Baisheng

2018-07-01

Determining eigenvector derivatives is a challenging task due to the singularity of the coefficient matrices of the governing equations, especially for structural dynamic systems with repeated eigenvalues. An effective strategy is proposed to construct a non-singular coefficient matrix, which can be directly used to obtain the eigenvector derivatives with distinct and repeated eigenvalues. This approach also has the advantage that it requires only the eigenvalues and eigenvectors of interest, without solving for the particular solutions of the eigenvector derivatives. The Symmetric Quasi-Minimal Residual (SQMR) method is then adopted to solve the governing equations; only the existing factored (shifted) stiffness matrix from an iterative eigensolution such as the subspace iteration method or the Lanczos algorithm is utilized. The present method can deal with both cases of simple and repeated eigenvalues in a unified manner. Three numerical examples are given to illustrate the accuracy and validity of the proposed algorithm. Highly accurate approximations to the eigenvector derivatives are obtained within a few iteration steps, which significantly reduces the computational effort. This method can be incorporated into a coupled eigensolver/derivative software module. In particular, it is applicable to finite element models with large sparse matrices.
Design of almost symmetric orthogonal wavelet filter bank via direct optimization.

PubMed

Murugesan, Selvaraaju; Tay, David B H

2012-05-01

It is a well-known fact that (compact-support) dyadic wavelets [based on the two channel filter banks (FBs)] cannot be simultaneously orthogonal and symmetric. Although orthogonal wavelets have the energy preservation property, biorthogonal wavelets are preferred in image processing applications because of their symmetric property. In this paper, a novel method is presented for the design of almost symmetric orthogonal wavelet FB. Orthogonality is structurally imposed by using the unnormalized lattice structure, and this leads to an objective function, which is relatively simple to optimize. The designed filters have good frequency response, flat group delay, almost symmetric filter coefficients, and symmetric wavelet function.
The apparatus composition and architecture of Cordylodus pander - Concepts of homology in primitive conodonts

USGS Publications Warehouse

Smith, M.P.; Donoghue, P.C.J.; Repetski, J.E.

2005-01-01

A clear distinction may be drawn between the perpendicular architecture of the feeding apparatus of ozarkodinid, prioniodontid and prioniodinid conodonts, in which the P elements are situated at a high angle to the M and S elements, and the parallel architecture of panderodontid and other coniform apparatuses, where two suites of coniform elements lie parallel to each other and oppose across the midline. The quest for homologies between the two architectures has been fraught with difficulty, at least in part because of the paucity of natural assemblages of coniform taxa. A diagenetically fused apparatus of Cordylodus lindstromi elements is here described which is made up of one rounded and two compressed element morphotypes. One of the compressed elements is bowed and asymmetrical and the other is unbowed and more symmetrical. These compressed elements are considered to be homologous with those of panderodontid apparatuses and would have lain at the caudal end of the parallel arrays, with the more symmetrical morphotypes located rostrally to the asymmetrical ones. The bowed and unbowed compressed elements of Cordylodus thus correspond, respectively, to the pt and pf positions of panderodontid apparatuses. In addition, the presence of symmetry transition within the rounded elements of Cordylodus, but not the compressed morphotypes, enables correlation of these with the S and M element locations of ozarkodinid apparatuses. By extension, the compressed elements must be homologues of the P elements. Specifically, the asymmetrical pt morphotype is homologous with the P1 of ozarkodinids and the more symmetrical and rostral pf morphotype is homologous with the P2 position. However, because of uncertainties over the nature of topological transformation of the rostral element array (the "rounded" or "costate" suites), it is not possible to recognize specific homologies between these elements and the M and S elements of ozarkodinids.
Morphologic differentiation of P from M and S element suites thus preceded the topological transformation from parallel to perpendicular apparatus architectures.
Remote sensing image segmentation using local sparse structure constrained latent low rank representation

NASA Astrophysics Data System (ADS)

Tian, Shu; Zhang, Ye; Yan, Yimin; Su, Nan; Zhang, Junping

2016-09-01

Latent low-rank representation (LatLRR) has attracted considerable attention in the field of remote sensing image segmentation, due to its effectiveness in exploring the multiple subspace structures of data. However, the increasingly heterogeneous texture information in high spatial resolution remote sensing images leads to more severe interference between pixels in a local neighborhood, and LatLRR fails to capture the complex local structure information. Therefore, we present a local sparse structure constrained latent low-rank representation (LSSLatLRR) segmentation method, which explicitly imposes the local sparse structure constraint on LatLRR to capture the intrinsic local structure in manifold structure feature subspaces. The whole segmentation framework can be viewed as two stages in cascade. In the first stage, we use the local histogram transform to extract the texture local histogram features (LHOG) at each pixel, which can efficiently capture the complex and micro-texture pattern. In the second stage, a local sparse structure (LSS) formulation is established on LHOG, which aims to preserve the local intrinsic structure and enhance the relationship between pixels having similar local characteristics. Meanwhile, by integrating the LSS and the LatLRR, we can efficiently capture the local sparse and low-rank structure in the mixture of feature subspaces, and we adopt the subspace segmentation method to improve the segmentation accuracy. Experimental results on remote sensing images with different spatial resolutions show that, compared with three state-of-the-art image segmentation methods, the proposed method achieves more accurate segmentation results.
Sparse Matrices in MATLAB: Design and Implementation

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Moler, Cleve; Schreiber, Robert

1992-01-01

The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.
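The storage claim in this abstract (space proportional to the number of nonzeros, work proportional to the arithmetic on nonzeros) can be illustrated with a small sketch. The abstract does not name a storage layout, so the compressed-column arrays below are an assumption chosen for illustration, and the helper names `dense_to_csc` and `csc_matvec` are hypothetical. Python is used here rather than MATLAB.

```python
def dense_to_csc(rows):
    """Convert a dense row-major matrix to compressed-column arrays.

    Returns (values, row_idx, col_ptr, n_rows). Only nonzeros are stored,
    so the storage is proportional to nnz plus the number of columns.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if rows[i][j] != 0:
                values.append(rows[i][j])
                row_idx.append(i)
        col_ptr.append(len(values))  # entries of column j end here
    return values, row_idx, col_ptr, n_rows


def csc_matvec(values, row_idx, col_ptr, n_rows, x):
    """Compute y = A x, touching each stored nonzero exactly once."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * x[j]
    return y


A = [[4.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 3.0]]
values, row_idx, col_ptr, n_rows = dense_to_csc(A)
y = csc_matvec(values, row_idx, col_ptr, n_rows, [1.0, 1.0, 1.0])
```

The empty middle column costs only one repeated entry in `col_ptr`, which is why the layout stays compact for very sparse matrices.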
A Parallel Vector Machine for the PM Programming Language

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2016-04-01

PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
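The two mask representations mentioned in this abstract (a Boolean field versus a list of active cell locations) can be sketched as follows. This is not PM code: the switch-over density `SPARSE_THRESHOLD` and the function names are hypothetical, and the abstract does not say at what density the virtual machine actually changes representation.

```python
SPARSE_THRESHOLD = 0.25  # hypothetical density below which a location list is used


def choose_mask(flags):
    """Return ('bool', flags) or ('locs', indices) depending on mask density."""
    active = [i for i, flag in enumerate(flags) if flag]
    if flags and len(active) / len(flags) < SPARSE_THRESHOLD:
        return ("locs", active)
    return ("bool", list(flags))


def masked_add(dst, src, mask):
    """Return a copy of dst with src added on the active cells only."""
    kind, data = mask
    out = list(dst)
    if kind == "bool":
        # Visit every cell and test its flag.
        for i, on in enumerate(data):
            if on:
                out[i] += src[i]
    else:
        # Visit only the listed active cells.
        for i in data:
            out[i] += src[i]
    return out


dense_mask = choose_mask([True, True, False, True])
sparse_mask = choose_mask([False] * 7 + [True])
result = masked_add([0] * 8, list(range(8)), sparse_mask)
```

With the location list, the cost of a masked operation follows the number of active cells rather than the vector length, which is the motivation for switching once the mask becomes sparse.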
A new idea for broad band reflector and tunable multichannel filter of one dimensional symmetric photonic crystal with magnetized cold plasma defects

NASA Astrophysics Data System (ADS)

Kumar, Asish; Singh, Prabal P.; Thapa, Khem B.

2018-05-01

The optical properties of a one-dimensional periodic structure composed of SiO2 and dielectric (air) layers in asymmetric and symmetric forms are studied. The transmittance of the symmetric periodic defective structure is analyzed by introducing one, two, and three layers of magnetized cold plasma (MCP) into the one-dimensional periodic structure. We found better results for the symmetric defect with three layers of MCP compared to the other defective structures. On the basis of our calculated results, we propose a new idea for a broadband reflector in the lower frequency range as well as a multichannel filter in the higher frequency range.
Epileptic Seizure Detection with Log-Euclidean Gaussian Kernel-Based Sparse Representation.

PubMed

Yuan, Shasha; Zhou, Weidong; Wu, Qi; Zhang, Yanli

2016-05-01

Epileptic seizure detection plays an important role in the diagnosis of epilepsy and in reducing the massive workload of reviewing electroencephalography (EEG) recordings. In this work, a novel algorithm is developed to detect seizures employing log-Euclidean Gaussian kernel-based sparse representation (SR) in long-term EEG recordings. Unlike the traditional SR for vector data in Euclidean space, the log-Euclidean Gaussian kernel-based SR framework is proposed for seizure detection in the space of the symmetric positive definite (SPD) matrices, which form a Riemannian manifold. Since the Riemannian manifold is nonlinear, the log-Euclidean Gaussian kernel function is applied to embed it into a reproducing kernel Hilbert space (RKHS) for performing SR. The EEG signals of all channels are divided into epochs and the SPD matrices representing EEG epochs are generated by covariance descriptors. Then, the testing samples are sparsely coded over the dictionary composed of training samples utilizing log-Euclidean Gaussian kernel-based SR. The classification of testing samples is achieved by computing the minimal reconstructed residuals. The proposed method is evaluated on the Freiburg EEG dataset of 21 patients and shows notable performance on both epoch-based and event-based assessments. Moreover, this method handles multiple channels of EEG recordings synchronously, which is faster and more efficient than traditional seizure detection methods.
Turbulence-driven anisotropic electron tail generation during magnetic reconnection

NASA Astrophysics Data System (ADS)

DuBois, A. M.; Scherer, A.; Almagri, A. F.; Anderson, J. K.; Pandya, M. D.; Sarff, J. S.

2018-05-01

Magnetic reconnection (MR) plays an important role in particle transport, energization, and acceleration in space, astrophysical, and laboratory plasmas. In the Madison Symmetric Torus reversed field pinch, discrete MR events release large amounts of energy from the equilibrium magnetic field, a fraction of which is transferred to electrons and ions. Previous experiments revealed an anisotropic electron tail that favors the perpendicular direction and is symmetric in the parallel. New profile measurements of x-ray emission show that the tail distribution is localized near the magnetic axis, consistent with modeling of the bremsstrahlung emission. The tail appears first near the magnetic axis and then spreads radially, and the dynamics in the anisotropy and diffusion are discussed. The data presented imply that the electron tail formation likely results from a turbulent wave-particle interaction and provide evidence that high energy electrons are escaping the core-localized region through pitch angle scattering into the parallel direction, followed by stochastic parallel transport to the plasma edge. New measurements also show a strong correlation between high energy x-ray measurements and tearing mode dynamics, suggesting that the coupling between core and edge tearing modes is essential for energetic electron tail formation.
Influence of Thermal Anisotropy on Equilibrium Stellarator Beta Limits

NASA Astrophysics Data System (ADS)

Bechtel, T. A.; Hegna, C. C.; Sovinec, C. R.

2017-10-01

The effect of anisotropic heat conduction on the upper beta limit of stellarator plasmas is studied using the nonlinear, extended MHD code NIMROD. The configuration under investigation is an l=2, M=10 torsatron with vacuum rotational transform near unity. Finite-beta plasmas are created using a volumetric heating source and temperature dependent resistivity; modeled with 22 stellarator symmetric (integer multiples of M) toroidal modes. Extended MHD simulations are then performed to generate steady state solutions that represent 3D equilibria. With increased heating, Shafranov shifts occur, and the associated break up of edge magnetic surfaces limits the achievable beta. Due to the presence of finite parallel heat conduction, pressure profiles can exist in regions of magnetic stochasticity. Here, we present results of independently varying the parallel and perpendicular thermal anisotropy. In particular, simulations show that the attained stored energy is a function of the magnitude of parallel and perpendicular thermal conduction for a given heat source, indicating that equilibrium beta limits are sensitive to anisotropic transport properties. Preliminary studies of MHD stability with non-stellarator symmetric modes, near the highest achievable beta, are also presented. Research supported by US DOE under Grant No. DE-FG02-99ER54546.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Boman, Erik G.

This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
Optimization and validation of accelerated golden-angle radial sparse MRI reconstruction with self-calibrating GRAPPA operator gridding.

PubMed

Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li

2018-07-01

Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step. Therefore, the subsequent iterative reconstruction can be performed directly in Cartesian space, which significantly reduces computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved approximately 4.2-fold reduction in reconstruction time compared with GRASP (∼333 min versus ∼78 min) while maintaining image quality (structural similarity index ≈ 0.97 and root mean square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. The GRASP reconstruction can be substantially accelerated using GROG. This framework is promising toward broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Drywall stilt dermatosis.

PubMed

Lewis, E J; Prawer, S E; Crutchfield, C E

1996-12-01

We describe a previously unreported occupational dermatosis occurring in a worker employed in drywall installation and finishing. This 50-year-old man presented with bilaterally symmetrical, parallel, linear crusted erosions on his anteromedial legs after wearing drywall stilts. The pathophysiology of this condition is considered.
ASICs Approach for the Implementation of a Symmetric Triangular Fuzzy Coprocessor and Its Application to Adaptive Filtering

NASA Technical Reports Server (NTRS)

Starks, Scott; Abdel-Hafeez, Saleh; Usevitch, Bryan

1997-01-01

This paper discusses the implementation of a fuzzy logic system using an ASICs design approach. The approach is based upon combining the inherent advantages of symmetric triangular membership functions and fuzzy singleton sets to obtain a novel structure for fuzzy logic system application development. The resulting structure utilizes a fuzzy static RAM to store the rule-base and the end-points of the triangular membership functions. This provides advantages over other approaches in which all sampled values of membership functions for all universes must be stored. The fuzzy coprocessor structure implements the fuzzification and defuzzification processes through a two-stage parallel pipeline architecture which is capable of executing complex fuzzy computations in less than 0.55 µs with an accuracy of more than 95%, thus making it suitable for a wide range of applications. Using the approach presented in this paper, a fuzzy logic rule-base can be directly downloaded via a host processor to an on-chip rule-base memory with a size of 64 words. The fuzzy coprocessor's design supports up to 49 rules for seven fuzzy membership functions associated with each of the chip's two input variables. This feature allows designers to create fuzzy logic systems without the need for additional on-board memory. Finally, the paper reports on simulation studies that were conducted for several adaptive filter applications using the least mean squared adaptive algorithm for adjusting the knowledge rule-base.
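A software sketch of the structure this abstract describes (seven symmetric triangular membership functions per input, two inputs, hence 7 × 7 = 49 rules with singleton outputs) might look like the following. Several details are assumptions not stated in the abstract: the evenly spaced centers on [0, 1], the center/half-width parameterization, the use of `min` as the rule-firing operator, the weighted-average defuzzification, and the placeholder contents of `RULES`.

```python
def tri(x, center, half_width):
    """Symmetric triangular membership: 1 at center, 0 beyond half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


N_SETS = 7                                           # membership functions per input
CENTERS = [k / (N_SETS - 1) for k in range(N_SETS)]  # assumed even spacing on [0, 1]
HALF_WIDTH = 1.0 / (N_SETS - 1)                      # neighbours overlap at 0.5

# Hypothetical rule base: one singleton output per (i, j) pair, 49 in total.
RULES = [[(CENTERS[i] + CENTERS[j]) / 2.0 for j in range(N_SETS)]
         for i in range(N_SETS)]


def infer(x1, x2):
    """Fuzzify both inputs, fire all rules, defuzzify by weighted average."""
    mu1 = [tri(x1, c, HALF_WIDTH) for c in CENTERS]
    mu2 = [tri(x2, c, HALF_WIDTH) for c in CENTERS]
    num = den = 0.0
    for i in range(N_SETS):
        for j in range(N_SETS):
            w = min(mu1[i], mu2[j])   # firing strength of rule (i, j)
            num += w * RULES[i][j]
            den += w
    return num / den if den else 0.0


n_rules = sum(len(row) for row in RULES)
out_mid = infer(0.5, 0.5)
```

Because each triangle is fully described by two numbers, only those parameters and the 49 singletons need to be stored, rather than sampled membership values.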
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

NASA Technical Reports Server (NTRS)

Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

2005-01-01

Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. The Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as the SGI systems. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in the INS3D-LU code that has been one of the base codes for the NAS Parallel Benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using OpenMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
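The hyperplane idea in this abstract can be sketched on a structured grid: points with the same index sum i + j + k form one oblique plane, and in a lower sweep each point depends only on neighbours in the previous plane. The sketch below only builds and checks that ordering; it assumes a seven-point stencil on a simple ni × nj × nk grid, and the function names are hypothetical rather than taken from RGAS or INS3D-LU.

```python
def hyperplanes(ni, nj, nk):
    """Group grid points (i, j, k) by the hyperplane index i + j + k."""
    planes = [[] for _ in range(ni + nj + nk - 2)]
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                planes[i + j + k].append((i, j, k))
    return planes


def lower_sweep_order_is_valid(planes):
    """Check that every lower neighbour was finished in an earlier hyperplane."""
    done = set()
    for plane in planes:
        for (i, j, k) in plane:
            for nb in ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1)):
                if min(nb) >= 0 and nb not in done:
                    return False
        # No point in this plane depends on another point of the same plane,
        # so the whole plane could be split across threads.
        done.update(plane)
    return True


planes = hyperplanes(3, 3, 3)
sizes = [len(p) for p in planes]
valid = lower_sweep_order_is_valid(planes)
```

The plane sizes grow and then shrink, so the available parallelism is lowest at the first and last planes; this is one reason a pipeline schedule can be an attractive alternative.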
MGMRES: A generalization of GMRES for solving large sparse nonsymmetric linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Young, D.M.; Chen, J.Y.

1994-12-31

The authors are concerned with the solution of the linear system (1): $Au = b$, where $A$ is a real square nonsingular matrix which is large, sparse and non-symmetric. They consider the use of Krylov subspace methods. They first choose an initial approximation $u^{(0)}$ to the solution $\bar{u} = A^{-1}b$ of (1). They also choose an auxiliary matrix $Z$ which is nonsingular. For $n = 1, 2, \ldots$ they determine $u^{(n)}$ such that $u^{(n)} - u^{(0)} \in K_n(r^{(0)}, A)$, where $K_n(r^{(0)}, A)$ is the (Krylov) subspace spanned by the Krylov vectors $r^{(0)}, Ar^{(0)}, \ldots, A^{n-1}r^{(0)}$ and where $r^{(0)} = b - Au^{(0)}$. If $ZA$ is SPD they also require that $(u^{(n)} - \bar{u}, ZA(u^{(n)} - \bar{u}))$ be minimized. If, on the other hand, $ZA$ is not SPD, then they require that the Galerkin condition, $(Zr^{(n)}, v) = 0$, be satisfied for all $v \in K_n(r^{(0)}, A)$, where $r^{(n)} = b - Au^{(n)}$. In this paper the authors consider a generalization of GMRES. This generalized method, which they refer to as "MGMRES", is very similar to GMRES except that they let $Z = A^{T}Y$, where $Y$ is a nonsingular matrix which is symmetric but not necessarily SPD.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

NASA Astrophysics Data System (ADS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
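As a rough illustration of the Chebyshev polynomial smoother discussed in this abstract, the sketch below applies the standard three-term Chebyshev iteration to the 1-D Poisson matrix tridiag(-1, 2, -1). It is not the authors' implementation: the model problem, the target interval [1, 4] (upper part of the spectrum, with 4 as a Gershgorin bound on the largest eigenvalue) and the three smoothing steps are all assumptions chosen for the demonstration. Each step needs only a matrix-vector product, which is what makes the method easy to parallelize.

```python
import math


def poisson_matvec(v):
    """Apply the 1-D Poisson matrix tridiag(-1, 2, -1) to v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def chebyshev(matvec, b, x, lam_min, lam_max, steps):
    """Chebyshev iteration targeting eigenvalues in [lam_min, lam_max]."""
    theta = 0.5 * (lam_max + lam_min)   # centre of the interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    d = [ri / theta for ri in r]
    for _ in range(steps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - adi for ri, adi in zip(r, matvec(d))]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_new
    return x


def norm(v):
    return math.sqrt(sum(t * t for t in v))


n = 31
zero = [0.0] * n   # with b = 0 the exact solution is 0, so x itself is the error


def mode(k):
    """k-th eigenvector of the 1-D Poisson matrix."""
    return [math.sin(k * math.pi * (i + 1) / (n + 1)) for i in range(n)]


rough, smooth = mode(28), mode(1)
rough_after = chebyshev(poisson_matvec, zero, rough, 1.0, 4.0, 3)
smooth_after = chebyshev(poisson_matvec, zero, smooth, 1.0, 4.0, 3)
```

In this setup the oscillatory error component is strongly damped while the smooth one is left almost untouched, which is the behaviour a multigrid smoother is meant to have; the coarse grid is responsible for the smooth part.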
3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

NASA Astrophysics Data System (ADS)

Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

2016-01-01

Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Parallel computation safety analysis irradiation targets fission product molybdenum in neutronic aspect using the successive over-relaxation algorithm

NASA Astrophysics Data System (ADS)

Susmikanti, Mike; Dewayatna, Winter; Sulistyo, Yos

2014-09-01

One of the research activities in support of the commercial radioisotope production program is safety research on FPM (Fission Product Molybdenum) target irradiation. FPM targets take the form of a stainless steel tube which contains nuclear-grade high-enrichment uranium. The FPM tube is irradiated to obtain fission products. Fission products such as Mo-99 are widely used in the form of kits in the medical world. The neutronics problem is solved using first-order perturbation theory derived from the diffusion equation for four groups. In contrast, Mo isotopes have longer half-lives, about 3 days (66 hours), so the delivery of radioisotopes to consumer centers and storage is possible though still limited. The production of this isotope potentially gives significant economic value. The criticality and flux in the multigroup diffusion model were calculated for various irradiation positions and uranium contents. This model involves complex computation with a large and sparse matrix system. Several parallel algorithms have been developed for the solution of large sparse matrices. In this paper, a successive over-relaxation (SOR) algorithm was implemented for the calculation of reactivity coefficients, which can be done in parallel. Previous works performed reactivity calculations serially with Gauss-Seidel iterations. The parallel method can be used to solve the multigroup diffusion equation system and calculate the criticality and reactivity coefficients. In this research a computer code was developed to exploit parallel processing to perform reactivity calculations which were to be used in safety analysis. Parallel processing on a multicore computer system allows the calculation to be performed more quickly. This code was applied to the safety limits calculation of irradiated FPM targets containing highly enriched uranium.
The results of the neutronic calculations show that for uranium contents of 1.7676 g and 6.1866 g (× 10^6 cm^-1) in a tube, the delta reactivities are still within safety limits; however, for 7.9542 g and 8.838 g (× 10^6 cm^-1) the limits were exceeded.
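The successive over-relaxation update named in this abstract can be sketched for a generic sparse system. This is a serial illustration only, not the authors' parallel code: the dictionary-of-rows storage, the relaxation factor 1.2, the stopping rule, and the small test matrix are all assumptions. Setting `omega=1.0` reduces the update to the Gauss-Seidel iteration that the abstract says earlier work used.

```python
def sor(rows, b, omega=1.2, tol=1e-10, max_sweeps=1000):
    """Solve A x = b by SOR; rows[i] maps column index -> nonzero a_ij."""
    x = [0.0] * len(b)
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for i, row in enumerate(rows):
            # Off-diagonal contribution, using already-updated entries of x.
            sigma = sum(a * x[j] for j, a in row.items() if j != i)
            new = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / row[i]
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            return x, sweep
    return x, max_sweeps


# Small symmetric, diagonally dominant test system; exact solution is [1, 1, 1].
rows = [{0: 4.0, 1: -1.0},
        {0: -1.0, 1: 4.0, 2: -1.0},
        {1: -1.0, 2: 4.0}]
b = [3.0, 2.0, 3.0]
x, sweeps = sor(rows, b)
```

Only stored nonzeros are visited in each sweep, so the cost per sweep scales with the number of nonzeros rather than with the square of the system size.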
Extraordinary reflection and transmission with direction dependent wavelength selectivity based on parity-time-symmetric multilayers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Shulin; Wang, Guo Ping, E-mail: gpwang@szu.edu.cn; College of Electronic Science and Technology, Shenzhen University, Shenzhen 518060

In this paper, we present a kind of periodic ternary parity-time (PT)-symmetric multilayer that realizes nearly 100% reflectance and transmittance simultaneously when light is incident from a certain direction. This extraordinary reflection and transmission originates from unidirectional Bragg reflection of PT-symmetric systems as spontaneous symmetry breaking happens at PT thresholds. The extra energy involved in the reflected and transmitted light is obtained from pumping light to the gain regions of the structure. Moreover, we find that our PT-symmetric structure shows direction dependent wavelength selectivity. When the illumination light is incident from two opposite directions into the multilayer structure, such extraordinary reflection and transmission appear at visible and near-infrared wavelengths, respectively. Such distinguishing properties may provide these structures with attractive applications as beam splitters, laser mirrors, narrow band filters, and multiband PT-symmetric optical devices.

The Effect of a Guide Field on the Structures of Magnetic Islands: 2D PIC Simulations

NASA Astrophysics Data System (ADS)

Huang, C.; Lu, Q.; Lu, S.; Wang, P.; Wang, S.

2014-12-01

Magnetic islands play an important role in magnetic reconnection. Using a series of 2D PIC simulations, we investigate the magnetic structures of a magnetic island formed during multiple X-line magnetic reconnection, considering the effects of the guide field in symmetric and asymmetric current sheets. In a symmetric current sheet, the current in the direction forms a tripolar structure inside a magnetic island during anti-parallel reconnection, which results in a quadrupole structure of the out-of-plane magnetic field. With the increase of the guide field, the symmetry of both the current system and the out-of-plane magnetic field inside the magnetic island is distorted. When the guide field is sufficiently strong, the current forms a ring along the magnetic field lines inside the magnetic island. At the same time, the current carried by the energetic electrons accelerated in the vicinity of the X lines forms another ring at the edge of the magnetic island. Such a dual-ring current system enhances the out-of-plane magnetic field inside the magnetic island, with a dip in the center of the magnetic island. In an asymmetric current sheet, when there is no guide field, electrons flow toward the X lines along the separatrices from the side with a higher density, and are then directed away from the X lines along the separatrices to the side with a lower density. The resulting current enhances the out-of-plane magnetic field at one end of the magnetic island and attenuates it at the other end. With the increase of the guide field, the structures of both the current system and the out-of-plane magnetic field are distorted.
Microfabricated linear Paul-Straubel ion trap

DOEpatents

Mangan, Michael A [Albuquerque, NM]; Blain, Matthew G [Albuquerque, NM]; Tigges, Chris P [Albuquerque, NM]; Linker, Kevin L [Albuquerque, NM]

2011-04-19

An array of microfabricated linear Paul-Straubel ion traps can be used for mass spectrometric applications. Each ion trap comprises two parallel inner RF electrodes and two parallel outer DC control electrodes symmetric about a central trap axis and suspended over an opening in a substrate. Neighboring ion traps in the array can share a common outer DC control electrode. The ions are confined transversely by an RF quadrupole electric field potential well on the ion trap axis. The array can trap a wide array of ions.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard

2008-10-16

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
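For readers unfamiliar with the SpMV kernel discussed above, the following is a minimal reference sketch using compressed sparse row (CSR) storage. It contains none of the multicore optimizations the abstract refers to; the helper names `csr_from_dense` and `spmv_csr` and the choice of CSR are assumptions made for illustration.

```python
import numpy as np

def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in the values array
    return (np.array(values, dtype=float),
            np.array(col_idx, dtype=int),
            np.array(row_ptr, dtype=int))

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Indirect access x[col_idx[...]] is what makes SpMV memory-bound.
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

dense = np.array([[4.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0]])
vals, cols, ptr = csr_from_dense(dense)
x = np.array([1.0, 2.0, 3.0])
y = spmv_csr(vals, cols, ptr, x)
```

Each row is independent, so the outer loop is the natural unit to distribute across threads; the irregular reads of `x` are the part that tuning efforts typically target.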
Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

NASA Astrophysics Data System (ADS)

Sakata, Ayaka; Xu, Yingying

2018-03-01

We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric (RS) solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of \
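As background for the SCAD penalty named above, here is a sketch of the standard scalar SCAD thresholding rule in its widely used textbook form. It is not the SCAD-AMP algorithm itself, and the parametrization (threshold `lam`, shape parameter `a > 2`, default `a = 3.7`) is an assumption that may differ from the one used in the paper.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Elementwise SCAD thresholding of z (textbook form, assumes a > 2)."""
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    # Region 1: soft thresholding, identical to the l1 (lasso) rule.
    soft = np.sign(z) * np.maximum(abs_z - lam, 0.0)
    # Region 2: linear interpolation between soft thresholding and identity.
    middle = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    # Region 3: large inputs pass through unshrunk.
    return np.where(abs_z <= 2.0 * lam, soft,
                    np.where(abs_z <= a * lam, middle, z))

out = scad_threshold([0.5, 1.5, 3.0, 5.0, -3.0], lam=1.0)
```

Unlike soft thresholding, large inputs are left unshrunk, which is the source of the nonconvexity the abstract analyses.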
Repeated Red-Black ordering

NASA Astrophysics Data System (ADS)

Ciarlet, P.

1994-09-01

Hereafter, we describe and analyze, from both a theoretical and a numerical point of view, an iterative method for efficiently solving symmetric elliptic problems with possibly discontinuous coefficients. In the following, we use the Preconditioned Conjugate Gradient method to solve the symmetric positive definite linear systems which arise from the finite element discretization of the problems. We focus our interest on sparse and efficient preconditioners. In order to define the preconditioners, we perform two steps: first we reorder the unknowns and then we carry out a (modified) incomplete factorization of the original matrix. We study numerically and theoretically two preconditioners, the second preconditioner corresponding to the one investigated by Brand and Heinemann [2]. We prove convergence results about the Poisson equation with either Dirichlet or periodic boundary conditions. For a mesh size h, Brand proved that the condition number of the preconditioned system is bounded by O(h^(-1/2)) for Dirichlet boundary conditions. By slightly modifying the preconditioning process, we prove that the condition number is bounded by O(h^(-1/3)).
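The reordering step mentioned above can be illustrated with one level of red-black ordering on a 5-point Laplacian. This sketch shows only the basic idea; the repeated (recursive) application and the modified incomplete factorization from the abstract are not implemented, and the function names are hypothetical.

```python
import numpy as np

def laplacian_2d(n):
    """Dense 5-point Laplacian on an n x n grid with Dirichlet boundaries."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[k, ii * n + jj] = -1.0
    return A

def red_black_permutation(n):
    """List red nodes (i + j even) first, then black nodes (i + j odd)."""
    red = [i * n + j for i in range(n) for j in range(n) if (i + j) % 2 == 0]
    black = [i * n + j for i in range(n) for j in range(n) if (i + j) % 2 == 1]
    return np.array(red + black), len(red)

n = 4
A = laplacian_2d(n)
perm, n_red = red_black_permutation(n)
A_rb = A[np.ix_(perm, perm)]  # symmetric permutation P A P^T
```

Because every neighbor of a red node is black, both diagonal blocks of `A_rb` are diagonal, so the red unknowns can be eliminated cheaply and the same ordering can then be applied to the reduced system.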
Improvements in sparse matrix operations of NASTRAN

NASA Technical Reports Server (NTRS)

Harano, S.

1980-01-01

A "nontransmit" packing routine was added to NASTRAN to allow matrix data to be referred to directly from the input/output buffer. Use of the packing routine permits various matrix-handling routines to reference the input/output buffer directly once the data addresses have been received. The packing routine offers a buffer-by-buffer backspace feature for efficient backspacing in sequential access. Unlike conventional backspacing, which needs two back-record operations for a single read of one record (one column), this feature omits overlapping of READ operation and back record. In the decomposition of a symmetric matrix, it eliminates the need to write a portion of the matrix to its upper triangular matrix from the last to the first columns of the symmetric matrix, thus saving the time needed to generate the upper triangular matrix. Only a lower triangular matrix must be written onto the secondary storage device, bringing a 10 to 30% reduction in the disk space used on the storage device.
Evolution of magnetization due to asymmetric dimerization: theoretical considerations and application to aberrant oligomers formed by apoSOD1(2SH).

PubMed

Sekhar, Ashok; Bain, Alex D; Rumfeldt, Jessica A O; Meiering, Elizabeth M; Kay, Lewis E

2016-02-17

A set of coupled differential equations is presented describing the evolution of magnetization due to an exchange reaction whereby a pair of identical monomers form an asymmetric dimer. In their most general form the equations describe a three-site exchange process that reduces to two-site exchange under certain limiting conditions that are discussed. An application to the study of sparsely populated, transiently formed sets of aberrant dimers, symmetric and asymmetric, of superoxide dismutase is presented. Fits of concentration dependent CPMG relaxation dispersion profiles provide measures of the dimer dissociation constants and both on- and off-rates. Dissociation constants on the order of 70 mM are extracted from fits of the data, with dimeric populations of ∼2% and lifetimes of ∼6 and ∼2 ms for the symmetric and asymmetric complexes, respectively. This work emphasizes the important role that NMR relaxation experiments can play in characterizing very weak molecular complexes that remain invisible to most biophysical approaches.
Charting the Replica Symmetric Phase

NASA Astrophysics Data System (ADS)

Coja-Oghlan, Amin; Efthymiou, Charilaos; Jaafari, Nor; Kang, Mihyun; Kapetanopoulos, Tobias

2018-02-01

Diluted mean-field models are spin systems whose geometry of interactions is induced by a sparse random graph or hypergraph. Such models play an eminent role in the statistical mechanics of disordered systems as well as in combinatorics and computer science. In a path-breaking paper based on the non-rigorous 'cavity method', physicists predicted not only the existence of a replica symmetry breaking phase transition in such models but also sketched a detailed picture of the evolution of the Gibbs measure within the replica symmetric phase and its impact on important problems in combinatorics, computer science and physics (Krzakala et al. in Proc Natl Acad Sci 104:10318-10323, 2007). In this paper we rigorise this picture completely for a broad class of models, encompassing the Potts antiferromagnet on the random graph, the k-XORSAT model and the diluted k-spin model for even k. We also prove a conjecture about the detection problem in the stochastic block model that has received considerable attention (Decelle et al. in Phys Rev E 84:066106, 2011).
Non-axisymmetric Flows and Transport in the Edge of MST

NASA Astrophysics Data System (ADS)

Miller, Matthew Charles

Magnetic reconnection occurs in plasmas all throughout the universe and is responsible for spectacular and perplexing phenomena. In the Madison Symmetric Torus (MST) reversed field pinch (RFP), reconnection occurs as quasi-periodic bursts of tearing instabilities (sawteeth), which give rise to a number of processes that affect the RFP's global behavior and confinement. This work examines the structure of turbulent plasma flow in the edge region and its role in affecting momentum and particle transport through the use of several insertable probes and novel ensemble techniques. Very few measurements exist of tearing mode flow structures. The flow structure has now been measured for m = 0 modes and is in good agreement with theoretical expectations for nonlinear resistive MHD calculated for the RFP using DEBS and NIMROD. The flows are predicted and measured to be different than the classical Sweet-Parker picture with symmetric inward flows. The flow fluctuations have a profound effect on momentum transport, which is transported rapidly at the crash. This work advances the understanding of this process by measuring the Reynolds stress associated with turbulent flow. Combined with measurements of the Maxwell stress, a new picture for magnetic self-organization in the RFP via two-fluid physics has emerged. The Reynolds and Maxwell stresses are measured to be an order of magnitude larger than the rate of change in inertia but oppositely directed such that they almost cancel. Two-fluid effects are significant because of the relationship between the Maxwell stress and the Hall dynamo, a term only existing in two-fluid theories. This relationship inextricably couples the momentum dynamics with the current dynamics. Indeed, the parallel momentum profile exhibits a relaxation at the crash akin to the relaxation seen in the parallel current density profile. Tearing modes also drive particle transport. 
Fluctuation-induced particle flux is resolved through a crash by measuring it directly as <n_e u_r>. The flux increases dramatically during a crash and is non-axisymmetric. Between crashes, the transport from tearing is small, which agrees with previous measurements that identified electrostatic transport as dominant at that time.
Should ground-motion records be rotated to fault-normal/parallel or maximum direction for response history analysis of buildings?

USGS Publications Warehouse

Reyes, Juan C.; Kalkan, Erol

2012-01-01

In the United States, regulatory seismic codes (for example, California Building Code) require at least two sets of horizontal ground-motion components for three-dimensional (3D) response history analysis (RHA) of building structures. For sites within 5 kilometers (3.1 miles) of an active fault, these records should be rotated to fault-normal and fault-parallel (FN/FP) directions, and two RHAs should be performed separately: once with the FN direction and once with the FP direction aligned with the transverse direction of the building axes. This approach is assumed to lead to two sets of responses that envelop the range of possible responses over all nonredundant rotation angles. The validity of this assumption is examined here using 3D computer models of single-story structures having symmetric (torsionally stiff) and asymmetric (torsionally flexible) layouts subjected to an ensemble of near-fault ground motions with and without apparent velocity pulses. In this parametric study, the elastic vibration period is varied from 0.2 to 5 seconds, and yield-strength reduction factors, R, are varied from a value that leads to linear-elastic design to 3 and 5. Further validations are performed using 3D computer models of 9-story structures having symmetric and asymmetric layouts subjected to the same ground-motion set. The influence of the ground-motion rotation angle on several engineering demand parameters (EDPs) is examined in both linear-elastic and nonlinear-inelastic domains to form benchmarks for evaluating the use of the FN/FP directions and also the maximum direction (MD). The MD ground motion is a new definition for horizontal ground motions for use in site-specific ground-motion procedures for seismic design according to provisions of the American Society of Civil Engineers/Structural Engineering Institute (ASCE/SEI) 7-10. 
The results of this study have important implications for current practice, suggesting that ground motions rotated to MD or FN/FP directions do not necessarily provide the most critical EDPs in nonlinear-inelastic domain; however, they tend to produce larger EDPs than as-recorded (arbitrarily oriented) motions.
Dynamics and statics of nonaxisymmetric and symmetric liquid bridges

NASA Technical Reports Server (NTRS)

Alexander, J. Iwan D.; Resnick, Andrew H.; Kaukler, William F.; Zhang, Yiqiang

1994-01-01

This program of theoretical and experimental ground-based research focuses on understanding the dynamics and stability limits of nonaxisymmetric and symmetric liquid bridges. There are three basic objectives. The first is to determine the stability limits of nonaxisymmetric liquid bridges held between non-coaxial parallel disks. The second is to examine the dynamics of nonaxisymmetric bridges and nonaxisymmetric oscillations of initially axisymmetric bridges. The third is to experimentally investigate the vibration sensitivity of liquid bridges under terrestrial and low gravity conditions. Some of these experiments will require a low gravity environment, and the ground-based research will culminate in a definitive flight experiment.
A radial basis function Galerkin method for inhomogeneous nonlocal diffusion

DOE PAGES

Lehoucq, Richard B.; Rowe, Stephen T.

2016-02-01

We introduce a discretization for a nonlocal diffusion problem using a localized basis of radial basis functions. The stiffness matrix entries are assembled by a special quadrature routine unique to the localized basis. Combining the quadrature method with the localized basis produces a well-conditioned, sparse, symmetric positive definite stiffness matrix. We demonstrate that both the continuum and discrete problems are well-posed and present numerical results for the convergence behavior of the radial basis function method. As a result, we explore approximating the solution to anisotropic differential equations by solving anisotropic nonlocal integral equations using the radial basis function method.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics

NASA Astrophysics Data System (ADS)

Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.

2003-09-01

Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wave-front control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborate multiresolution preconditioner.
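The preconditioned conjugate gradient approach described above can be sketched in a few lines. This is a simplified illustration, not the reconstructor from the paper: the preconditioner is a pointwise symmetric Gauss-Seidel sweep rather than the block (layer-oriented) version, dense NumPy arrays stand in for sparse matrices, and the names `sgs_preconditioner` and `pcg` are assumptions.

```python
import numpy as np

def sgs_preconditioner(A):
    """Return a function applying M^{-1}, with M = (D + L) D^{-1} (D + U)."""
    lower = np.tril(A)   # D + L
    upper = np.triu(A)   # D + U
    diag = np.diag(A)    # D as a vector
    def apply(r):
        # Forward sweep, diagonal scaling, backward sweep.
        # np.linalg.solve is used for brevity; real code would use
        # triangular (and sparse) solves.
        y = np.linalg.solve(lower, r)
        return np.linalg.solve(upper, diag * y)
    return apply

def pcg(A, b, apply_M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for symmetric positive definite A."""
    x = np.zeros(len(b))
    r = b - A @ x
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test system (1D Poisson-type matrix).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcg(A, b, sgs_preconditioner(A))
```

In a block variant, the scalar diagonal `D` would be replaced by diagonal blocks, and the two triangular solves would use precomputed Cholesky factors of those blocks, which matches the off-line factorization cost the abstract mentions.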
Adaptive structured dictionary learning for image fusion based on group-sparse-representation

NASA Astrophysics Data System (ADS)

Yang, Jiajie; Sun, Bin; Luo, Chengwei; Wu, Yuzhong; Xu, Limei

2018-04-01

Dictionary learning is the key process of sparse representation, which is one of the most widely used image representation theories in image fusion. Existing dictionary learning methods do not make good use of group structure information and the sparse coefficients. In this paper, we propose a new adaptive structured dictionary learning algorithm and an l1-norm maximum fusion rule that innovatively utilizes grouped sparse coefficients to merge the images. The dictionary learning algorithm needs no prior knowledge about any group structure of the dictionary. By using the characteristics of the dictionary in expressing the signal, our algorithm can automatically find the desired potential structure information that is hidden in the dictionary. The fusion rule takes the physical meaning of the group structure dictionary into account and makes activity-level judgements on the structure information when the images are being merged. Therefore, the fused image can retain more significant information. Comparisons have been made with several state-of-the-art dictionary learning methods and fusion rules. The experimental results demonstrate that the dictionary learning algorithm and the fusion rule both outperform others in terms of several objective evaluation metrics.
Enhanced directional second harmonic radiation via nonlinear interference in 1D metamaterials

NASA Astrophysics Data System (ADS)

Guo, B. S.; Loo, Y. L.; Zhao, Q.; Ong, C. K.

2018-06-01

By using a one-dimensional nonlinear metamaterial in the experiment, we achieve directional second harmonic radiation via nonlinear interference at approximately 2.5 GHz. Each meta-atom has the structure of coupled split-ring resonators and two varactors arranged parallel (symmetric) or antiparallel (antisymmetric) to each other. With an incident power of approximately -2.7 dBm, the power of the directional wave emitted from the sample is on the nanowatt scale. This relatively high magnitude of directional nonlinear power results from the ability of the 1D metamaterial to exhibit nonlinear magnetoelectric coupling, as well as to support an electric dipole or magnetic dipole resonance within a narrow second harmonic frequency range.
An improved integrally formed radio frequency quadrupole

DOEpatents

Abbott, S.R.

1987-10-05

An improved radio frequency quadrupole is provided having an elongate housing with an elongate central axis and top, bottom and two side walls symmetrically disposed about the axis, and vanes formed integrally with the walls, the vanes each having a cross-section at right angles to the central axis which tapers inwardly toward the axis to form electrode tips spaced from each other by predetermined distances. Each of the four walls, and the vanes integral therewith, is a separate structural element having a central lengthwise plane passing through the tip of the vane, the walls having flat mounting surfaces at right angles to and parallel to the central plane, respectively, which are butted together to position the walls and vane tips relative to each other. 4 figs.
Ultrafast laser-induced birefringence in various porosity silica glasses: from fused silica to aerogel.

PubMed

Cerkauskaite, Ausra; Drevinskas, Rokas; Rybaltovskii, Alexey O; Kazansky, Peter G

2017-04-03

We compare femtosecond laser-induced modification in silica matrices with three different degrees of porosity. In the single-pulse regime, the decrease of substrate density from fused silica to high-silica porous glass and to silica aerogel glass results in a tenfold increase of the laser-affected region, with the formation of a symmetric cavity surrounded by a compressed silica shell with pearl-like structures. In the multi-pulse regime, if the cavity produced by the first pulse is relatively large, the subsequent pulses do not cause further modifications. If not, a transition from a void to an anisotropic structure with the optical axis oriented parallel to the incident polarization is observed. The maximum retardance value achieved in porous glass is twofold higher than in fused silica, and tenfold greater than in aerogel. Polarization-sensitive structuring in porous glass by two pulses of ultrafast laser irradiation is demonstrated, and no observable stress is generated under any conditions.
Ferroelectric order in liquid crystal phases of polar disk-shaped ellipsoids

NASA Astrophysics Data System (ADS)

Bose, Tushar Kanti; Saha, Jayashree

2014-05-01

The demonstration of a spontaneous macroscopic ferroelectric order in liquid phases in the absence of any long-range positional order is considered an outstanding problem of both fundamental and technological interest. Recently, we reported that a system of polar achiral disklike ellipsoids can spontaneously exhibit a long-sought ferroelectric nematic phase and a ferroelectric columnar phase with strong axial polarization. The major role is played by the dipolar interactions. The model system of interest consists of attractive-repulsive Gay-Berne oblate ellipsoids embedded with two parallel point dipoles positioned symmetrically on the equatorial plane of the ellipsoids. In the present work, we investigate in detail the profound effects of changing the separation between the two symmetrically placed dipoles and the strength of the dipoles upon the existence of different ferroelectric discotic liquid crystal phases via extensive off-lattice N-P-T Monte Carlo simulations. Ferroelectric biaxial phases are exhibited in addition to the uniaxial ferroelectric fluids, where the phase biaxiality results from the dipolar interactions. The structures of all the ferroelectric configurations of interest are presented in detail. Simple phase diagrams are determined which include different polar and apolar discotic fluids generated by the system.
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that, in combination with additional free solver software packages, allows for an efficient and scalable parallel solution of large sparse linear equation systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, which is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equation systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver through one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. 
Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Exploring symmetry as an avenue to the computational design of large protein domains.

PubMed

Fortenberry, Carie; Bowman, Elizabeth Anne; Proffitt, Will; Dorr, Brent; Combs, Steven; Harp, Joel; Mizoue, Laura; Meiler, Jens

2011-11-16

It has been demonstrated previously that symmetric, homodimeric proteins are energetically favored, which explains their abundance in nature. It has been proposed that such symmetric homodimers underwent gene duplication and fusion to evolve into protein topologies that have a symmetric arrangement of secondary structure elements, "symmetric superfolds". Here, the ROSETTA protein design software was used to computationally engineer a perfectly symmetric variant of imidazole glycerol phosphate synthase and its corresponding symmetric homodimer. The new protein, termed FLR, adopts the symmetric (βα)₈ TIM-barrel superfold. The protein is soluble and monomeric and exhibits two-fold symmetry not only in the arrangement of secondary structure elements but also in sequence and at atomic detail, as verified by crystallography. When cut in half, FLR dimerizes readily to form the symmetric homodimer. The successful computational design of FLR demonstrates progress in our understanding of the underlying principles of protein stability and presents an attractive strategy for the in silico construction of larger protein domains from smaller pieces.

Incorporating biological information in sparse principal component analysis with application to genomic data.

PubMed

Li, Ziyi; Safo, Sandra E; Long, Qi

2017-07-11

Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.
Recursive Factorization of the Inverse Overlap Matrix in Linear-Scaling Quantum Molecular Dynamics Simulations.

PubMed

Negre, Christian F A; Mniszewski, Susan M; Cawkwell, Marc J; Bock, Nicolas; Wall, Michael E; Niklasson, Anders M N

2016-07-12

We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive, iterative refinement of an initial guess of Z (inverse square root of the overlap matrix S). The initial guess of Z is obtained beforehand by using either an approximate divide-and-conquer technique or dynamical methods, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under the incomplete, approximate, iterative refinement of Z. Linear-scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared-memory parallelization. As we show in this article using self-consistent density-functional-based tight-binding MD, our approach is faster than conventional methods based on the diagonalization of overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4158-atom water-solvated polyalanine system, we find an average speedup factor of 122 for the computation of Z in each MD step.
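The idea of iteratively refining an initial guess of Z, the inverse square root of the overlap matrix S, can be illustrated with a generic Newton-Schulz-type update. This is a sketch under stated assumptions, not the recursion used in the paper: the lowest-order update Z ← Z(3I − ZᵀSZ)/2 is used, matrices are dense, no numerical thresholding or ELLPACK-R storage is applied, and the function name and the scaled-identity starting guess are hypothetical.

```python
import numpy as np

def refine_inverse_sqrt(S, Z0, tol=1e-10, max_iter=100):
    """Refine Z0 toward Z with Z^T S Z = I (S symmetric positive definite)."""
    identity = np.eye(S.shape[0])
    Z = Z0.copy()
    for k in range(max_iter):
        X = Z.T @ S @ Z
        # Distance of the congruence transform from the identity.
        if np.linalg.norm(identity - X, ord="fro") < tol:
            return Z, k
        # Lowest-order refinement step; converges when ||I - X|| < 1.
        Z = 0.5 * Z @ (3.0 * identity - X)
    return Z, max_iter

# Hypothetical overlap-like matrix: identity plus weak nearest-neighbor overlap.
n = 4
S = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
# Scaled identity as a crude initial guess (an assumption, chosen so that
# the eigenvalues of Z0^T S Z0 lie in (0, 1]).
Z0 = np.eye(n) / np.sqrt(np.linalg.norm(S, 2))
Z, iters = refine_inverse_sqrt(S, Z0)
```

The update uses only matrix-matrix products, which is why a good starting guess (for example from a previous time step) and sparse, thresholded algebra can make the overall cost scale linearly with system size.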
Recursive Factorization of the Inverse Overlap Matrix in Linear Scaling Quantum Molecular Dynamics Simulations

DOE PAGES

Negre, Christian F. A; Mniszewski, Susan M.; Cawkwell, Marc Jon; ...

2016-06-06

We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive iterative refinement of an initial guess Z of the inverse overlap matrix S. The initial guess of Z is obtained beforehand either by using an approximate divide and conquer technique or dynamically, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under incomplete approximate iterative refinement of Z. Linear scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared memory parallelization. As we show in this article using self-consistent density functional based tight-binding MD, our approach is faster than conventional methods based on the direct diagonalization of the overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4,158 atom water-solvated polyalanine system we find an average speedup factor of 122 for the computation of Z in each MD step.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

DOEpatents

Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey

2010-10-05

In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
FOLDER: A numerical tool to simulate the development of structures in layered media

NASA Astrophysics Data System (ADS)

Adamuszek, Marta; Dabrowski, Marcin; Schmid, Daniel W.

2015-04-01

FOLDER is a numerical toolbox for modelling deformation in layered media during layer parallel shortening or extension in two dimensions. FOLDER builds on MILAMIN [1], a finite element method based mechanical solver, with a range of utilities included from the MUTILS package [2]. The numerical mesh is generated using the Triangle software [3]. The toolbox includes features that allow for: 1) designing complex structures such as multi-layer stacks, 2) accurately simulating large-strain deformation of linear and non-linear viscous materials, 3) post-processing of various physical fields such as velocity (total and perturbing), rate of deformation, finite strain, stress, deviatoric stress, pressure, apparent viscosity. FOLDER is designed to ensure maximum flexibility to configure model geometry, define material parameters, specify the range of numerical parameters in simulations and choose the plotting options. FOLDER is an open source MATLAB application and comes with a user friendly graphical interface. The toolbox additionally comprises an educational application that illustrates various analytical solutions of growth rates calculated for the cases of folding and necking of a single layer with interfaces perturbed with a single sinusoidal waveform. We further derive two novel analytical expressions for the growth rate in the cases of folding and necking of a linear viscous layer embedded in a linear viscous medium of a finite thickness. We use FOLDER to test the accuracy of single-layer folding simulations using various 1) spatial and temporal resolutions, 2) time integration schemes, and 3) iterative algorithms for non-linear materials. The accuracy of the numerical results is quantified by: 1) comparing them to an analytical solution, if available, or 2) running convergence tests.
As a result, we provide a map of the optimal choice of grid size, time step, and number of iterations to keep the error of the numerical simulations below a given tolerance for a given time integration scheme. We also demonstrate that Euler and Leapfrog time integration schemes are not recommended for any practical use. Finally, the capabilities of the toolbox are illustrated based on two examples: 1) shortening of a synthetic multi-layer sequence and 2) extension of a folded quartz vein embedded in phyllite from Sprague Upper Reservoir (example discussed by Sherwin and Chapple [4]). The latter example demonstrates that FOLDER can be successfully used for reverse modelling and mechanical restoration. [1] Dabrowski, M., Krotkiewski, M., and Schmid, D. W., 2008, MILAMIN: MATLAB-based finite element method solver for large problems. Geochemistry Geophysics Geosystems, vol. 9. [2] Krotkiewski, M. and Dabrowski, M., 2010, Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs. Parallel Computing, 36(4), p. 181-198. [3] Shewchuk, J. R., 1996, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In: Applied Computational Geometry: Towards Geometric Engineering (Ming C. Lin and Dinesh Manocha, editors), Vol. 1148 of Lecture Notes in Computer Science, pp. 203-222, Springer-Verlag, Berlin. [4] Sherwin, J. A., Chapple, W. M., 1968, Wavelengths of single layer folds - a comparison between theory and observation. American Journal of Science, 266(3), p. 167-179.
Atomic scale structure and chemistry of interfaces by Z-contrast imaging and electron energy loss spectroscopy in the stem

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGibbon, M.M.; Browning, N.D.; Chisholm, M.F.

The macroscopic properties of many materials are controlled by the structure and chemistry at grain boundaries. A basic understanding of the structure-property relationship requires a technique which probes both composition and chemical bonding on an atomic scale. High-resolution Z-contrast imaging in the scanning transmission electron microscope (STEM) forms an incoherent image in which changes in atomic structure and composition across an interface can be interpreted directly without the need for preconceived atomic structure models. Since the Z-contrast image is formed by electrons scattered through high angles, parallel detection electron energy loss spectroscopy (PEELS) can be used simultaneously to provide complementary chemical information on an atomic scale. The fine structure in the PEEL spectra can be used to investigate the local electronic structure and the nature of the bonding across the interface. In this paper we use the complementary techniques of high resolution Z-contrast imaging and PEELS to investigate the atomic structure and chemistry of a 25° symmetric tilt boundary in a bicrystal of the electroceramic SrTiO3.
Measurements of the momentum and current transport from tearing instability in the Madison Symmetric Torus reversed-field pinch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuritsyn, A.; Fiksel, G.; Almagri, A. F.

2009-05-15

In this paper measurements of momentum and current transport caused by current driven tearing instability are reported. The measurements are done in the Madison Symmetric Torus reversed-field pinch [R. N. Dexter, D. W. Kerst, T. W. Lovell, S. C. Prager, and J. C. Sprott, Fusion Technol. 19, 131 (1991)] in a regime with repetitive bursts of tearing instability causing magnetic field reconnection. It is established that the plasma parallel momentum profile flattens during these reconnection events: The flow decreases in the core and increases at the edge. The momentum relaxation phenomenon is similar in nature to the well established relaxation of the parallel electrical current and could be a general feature of self-organized systems. The measured fluctuation-induced Maxwell and Reynolds stresses, which govern the dynamics of plasma flow, are large and almost balance each other such that their difference is approximately equal to the rate of change of plasma momentum. The Hall dynamo, which is directly related to the Maxwell stress, drives the parallel current profile relaxation at resonant surfaces at the reconnection events. These results qualitatively agree with analytical calculations and numerical simulations. It is plausible that current-driven instabilities can be responsible for momentum transport in other laboratory and astrophysical plasmas.
Structured sparse linear graph embedding.

PubMed

Wang, Haixian

2012-03-01

Subspace learning is a core issue in pattern recognition and machine learning. Linear graph embedding (LGE) is a general framework for subspace learning. In this paper, we propose a structured sparse extension to LGE (SSLGE) by introducing a structured sparsity-inducing norm into LGE. Specifically, SSLGE casts the projection bases learning into a regression-type optimization problem, and then the structured sparsity regularization is applied to the regression coefficients. The regularization selects a subset of features and meanwhile encodes high-order information reflecting a priori structure information of the data. The SSLGE technique provides a unified framework for discovering structured sparse subspaces. Computationally, by using a variational equality and the Procrustes transformation, SSLGE is efficiently solved with closed-form updates. Experimental results on face images show the effectiveness of the proposed method. Copyright © 2011 Elsevier Ltd. All rights reserved.
Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

DTIC Science & Technology

2008-10-01

... 3.4 Residual history of WSO banded preconditioner for problem 2D 54019 HIGHK ... 3.5 Residual history of WSO banded preconditioner for problem Appu ... 3.6 Residual history of WSO banded preconditioner for problem ASIC 680k ... 3.7 Residual history of WSO banded preconditioner for problem BUNDLE1
BCYCLIC: A parallel block tridiagonal matrix cyclic solver

NASA Astrophysics Data System (ADS)

Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

2010-09-01

A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
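Cyclic reduction is easiest to see on a scalar tridiagonal system, so the sketch below uses single numbers where BCYCLIC uses dense blocks. It is an illustration under stated assumptions, not the BCYCLIC implementation: the function name `cyclic_reduction` is hypothetical, no pivoting is done, the factors are not stored for reuse with later right-hand sides, and the recursion runs serially. It does handle row counts that are not powers of two.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = c[-1] = 0."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Forward step: eliminate the odd-indexed unknowns from every even-indexed
    # row. Each even row is processed independently of the others, which is
    # the source of the parallelism.
    for i in range(0, n, 2):
        na, nb, nc, nd = 0.0, b[i], 0.0, d[i]
        if i - 1 >= 0:
            alpha = a[i] / b[i - 1]
            na = -alpha * a[i - 1]
            nb -= alpha * c[i - 1]
            nd -= alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            nb -= gamma * a[i + 1]
            nc = -gamma * c[i + 1]
            nd -= gamma * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)
    # The even-indexed unknowns satisfy a tridiagonal system of about half the size.
    x = [0.0] * n
    x[0::2] = cyclic_reduction(ra, rb, rc, rd)
    # Back substitution: each odd-indexed unknown follows from its two neighbours.
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - c[i] * right) / b[i]
    return x


# Hypothetical 5-row test system (not a power of two) with solution 1..5.
a = [0.0, -1.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 0.0, 6.0]
x = cyclic_reduction(a, b, c, d)
```

In a block version, the divisions become solves with the diagonal blocks and the products become block matrix multiplications, which is where multithreaded dense routines come in.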
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there has been a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with the parallelization of the solver and preconditioner using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
WE-G-18A-04: 3D Dictionary Learning Based Statistical Iterative Reconstruction for Low-Dose Cone Beam CT Imaging

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bai, T; UT Southwestern Medical Center, Dallas, TX; Yan, H

2014-06-15

Purpose: To develop a 3D dictionary learning based statistical reconstruction algorithm on graphic processing units (GPU), to improve the quality of low-dose cone beam CT (CBCT) imaging with high efficiency. Methods: A 3D dictionary containing 256 small volumes (atoms) of 3x3x3 voxels was trained from a high quality volume image. During reconstruction, we utilized a Cholesky decomposition based orthogonal matching pursuit algorithm to find a sparse representation on this dictionary basis of each patch in the reconstructed image, in order to regularize the image quality. To accelerate the time-consuming sparse coding in the 3D case, we implemented our algorithm in a parallel fashion by taking advantage of the tremendous computational power of GPU. Evaluations are performed based on a head-neck patient case. FDK reconstruction with full dataset of 364 projections is used as the reference. We compared the proposed 3D dictionary learning based method with a tight frame (TF) based one using a subset data of 121 projections. The image qualities under different resolutions in z-direction, with or without statistical weighting are also studied. Results: Compared to the TF-based CBCT reconstruction, our experiments indicated that 3D dictionary learning based CBCT reconstruction is able to recover finer structures, to remove more streaking artifacts, and is less susceptible to blocky artifacts. It is also observed that the statistical reconstruction approach is sensitive to inconsistency between the forward and backward projection operations in parallel computing. Using a high spatial resolution along the z direction helps improve the algorithm robustness. Conclusion: 3D dictionary learning based CBCT reconstruction algorithm is able to sense the structural information while suppressing noise, and hence to achieve high quality reconstruction.
The GPU realization of the whole algorithm offers a significant efficiency enhancement, making this algorithm more feasible for potential clinical application. A high z-resolution is preferred to stabilize statistical iterative reconstruction. This work was supported in part by NIH (1R01CA154747-01), NSFC (No. 61172163), Research Fund for the Doctoral Program of Higher Education of China (No. 20110201110011), China Scholarship Council.
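The sparse coding step named in this abstract, orthogonal matching pursuit (OMP), can be sketched in a few lines. This is plain OMP, assuming unit-norm atoms and refitting the coefficients from the normal equations at every step; it is not the Cholesky-update variant or the GPU implementation the authors describe. The helper names, the four-atom dictionary and the signal are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(g, rhs):
    """Solve the small dense system g c = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(g, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for j in range(k, n + 1):
                m[r][j] -= f * m[k][j]
    c = [0.0] * n
    for k in reversed(range(n)):
        c[k] = (m[k][n] - sum(m[k][j] * c[j] for j in range(k + 1, n))) / m[k][k]
    return c


def omp(atoms, signal, sparsity):
    """Greedy sparse coding; returns {atom index: coefficient}."""
    support, coeffs = [], []
    residual = signal[:]
    for _ in range(sparsity):
        # Select the unused atom most correlated with the current residual.
        scores = [0.0 if i in support else abs(dot(atom, residual))
                  for i, atom in enumerate(atoms)]
        best = max(range(len(atoms)), key=scores.__getitem__)
        if scores[best] == 0.0:
            break
        support.append(best)
        # Least-squares refit on the whole support via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(gram, rhs)
        residual = [s - sum(c * atoms[i][k] for c, i in zip(coeffs, support))
                    for k, s in enumerate(signal)]
    return dict(zip(support, coeffs))


# Hypothetical dictionary: three axis-aligned atoms plus one oblique atom.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
signal = [1.2, 1.6, -1.0]  # equals 2 * atoms[3] - 1 * atoms[2]
code = omp(atoms, signal, sparsity=2)
```

Each patch is coded independently of the others, which is why this step maps well onto a GPU; a Cholesky-based variant avoids re-solving the full system by updating a factorization of the Gram matrix as atoms are added.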
Parallel Grid Manipulations in Earth Science Calculations

NASA Technical Reports Server (NTRS)

Sawyer, W.; Lucchesi, R.; daSilva, A.; Takacs, L. L.

1999-01-01

The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center is moving its data assimilation system to massively parallel computing platforms. This parallel implementation of GEOS DAS will be used in the DAO's normal activities, which include reanalysis of data, and operational support for flight missions. Key components of GEOS DAS, including the gridpoint-based general circulation model and a data analysis system, are currently being parallelized. The parallelization of GEOS DAS is also one of the HPCC Grand Challenge Projects. The GEOS-DAS software employs several distinct grids. Some examples are: an observation grid (an unstructured grid of points with which observed or measured physical quantities from instruments or satellites are associated), a highly structured latitude-longitude grid of points spanning the earth at given latitude-longitude coordinates at which prognostic quantities are determined, and a computational lat-lon grid in which the pole has been moved to a different location to avoid computational instabilities. Each of these grids has a different structure and number of constituent points. In spite of that, there are numerous interactions between the grids, e.g., values on one grid must be interpolated to another, or, in other cases, grids need to be redistributed on the underlying parallel platform. The DAO has designed a parallel integrated library for grid manipulations (PILGRIM) to support the needed grid interactions with maximum efficiency. It offers a flexible interface to generate new grids, define transformations between grids and apply them. Basic communication is currently MPI; however, the interfaces defined here could conceivably be implemented with other message-passing libraries, e.g., Cray SHMEM, or with shared-memory constructs. The library is written in Fortran 90.
First performance results indicate that even difficult problems, such as the above-mentioned pole rotation (a sparse interpolation with little data locality between the physical lat-lon grid and a pole-rotated computational grid), can be solved efficiently and at the GFlop/s rates needed to solve tomorrow's high resolution earth science models. In the subsequent presentation we will discuss the design and implementation of PILGRIM as well as a number of the problems it is required to solve. Some conclusions will be drawn about the potential performance of the overall earth science models on the supercomputer platforms foreseen for these problems.
Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information.

PubMed

Safo, Sandra E; Li, Shuzhao; Long, Qi

2018-03-01

Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach. © 2017, The International Biometric Society.
Butterfly scale form birefringence related to photonics.

PubMed

Vidal, Benedicto de Campos

2011-12-01

Wings of the butterflies Morpho aega and Eryphanis reevesi were investigated in the present study by fluorescence, polarization and infra-red (IR) spectroscopic microscopy with the aim of identifying the oriented organization of their components and morphological details of their substructures. These wings were found to exhibit a strong iridescent glow depending on the angle of the incident light; their isolated scales exhibited blue fluorescence. Parallel columns or ridges extend from the pad and sockets to the dented apical scale's region, and they are perpendicular to the ribs that connect the columnar ridges. The scales reveal linear dichroism (LD) visually, when attached on the wing matrix or isolated on slides. The LD was inferred to be textural and positive and was also demonstrated with IR microscopy. The scale columns and ribs are birefringent structures. Images obtained before and after birefringence compensation allowed a detailed study of the scale morphology. Form and intrinsic birefringence findings here estimated and discussed in the context of nonlinear optical properties, bring to the level of morphology the state of molecular order and periodicity of the wing structure. FT-IR absorption peaks were found at wavenumbers which correspond to symmetric and asymmetric (-N-H) stretching, symmetric (-C-H) stretching, amide I (-CO) stretching, amide II(-N-H), and β-linking. Based on LD results obtained with polarized IR the molecular vibrations of the wing scales of M. aega and E. reevesi are assumed to be oriented with respect to the long axis of these structures. Copyright © 2011 Elsevier Ltd. All rights reserved.
Two-level structural sparsity regularization for identifying lattices and defects in noisy images

DOE PAGES

Li, Xin; Belianinov, Alex; Dyck, Ondrej E.; ...

2018-03-09

This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, the model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.
Structural organization of the gynoecium and pollen tube path in Himalayan sea buckthorn, Hippophae rhamnoides (Elaeagnaceae)

PubMed Central

Mangla, Yash; Tandon, Rajesh; Goel, Shailendra; Raina, S. N.

2013-01-01

Closure of carpels or angiospermy, a key developmental innovation, has been accomplished through different ontogenic routes among the flowering plants. The mechanism of angiospermy produces structural novelties in the gynoecium, which in turn affects the progamic phase. In this paper, we present the structural details of the gynoecium and functional attributes of the progamic phase of Hippophae rhamnoides, a dioecious species of Elaeagnaceae. The gynoecium is unicarpellate, and the carpel is dorsiventrally symmetric and conduplicate. The pollen tube path comprises a prominent, ventrally localized dry and non-papillate stigma, a pseudostyle and a dorsally protruded superior ovary. The pollen tube path in the stigmatic region is subdermal, and from the pseudostyle onwards, it resides over the epidermis of conduplicated margins. The epidermal cells along this region are secretory but produce sparse extracellular matrix. The tube approaches the solitary ovule through a tiny conduit in the carpel, the ventral pore. The duration of the entire progamic phase is ∼72 h. The observed mean pollen tube length from stigma to ovule was 908.13 ± 180 µm and the mean tube growth rate was 18.75 µm h−1. The study demonstrates that sea buckthorn, a core eudicot, has a simple gynoecium with a pollen tube pathway that incorporates elements of both completely externalized and internalized compitum.
Two-level structural sparsity regularization for identifying lattices and defects in noisy images

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Xin; Belianinov, Alex; Dyck, Ondrej E.

This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, the model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.
Controlled Growth of Parallel Oriented ZnO Nanostructural Arrays on Ga2O3 Nanowires

DTIC Science & Technology

2008-11-01

Controlled Growth of Parallel Oriented ZnO Nanostructural Arrays on Ga2O3 Nanowires. Lena Mazeina,* Yoosuf N. Picard, and Sharka M. Prokes. Electronics... Manuscript Received November 6, 2008. ABSTRACT: Novel hierarchical ZnO-Ga2O3 nanostructures were fabricated via a two stage growth process. Nanowires of Ga2O3 ... nanobrushes (NBs) with Ga2O3 as the core and ZnO as the branches self-assembling symmetrically in six equiangular directions around the core
Ultra-small-angle neutron scattering with azimuthal asymmetry

DOE PAGES

Gu, X.; Mildner, D. F. R.

2016-05-16

Small-angle neutron scattering (SANS) measurements from thin sections of rock samples such as shales demand as great a scattering vector range as possible because the pores cover a wide range of sizes. The limitation of the scattering vector range for pinhole SANS requires slit-smeared ultra-SANS (USANS) measurements that need to be converted to pinhole geometry. The desmearing algorithm is only successful for azimuthally symmetric data. Scattering from samples cut parallel to the plane of bedding is symmetric, exhibiting circular contours on a two-dimensional detector. Samples cut perpendicular to the bedding show elliptically dependent contours with the long axis corresponding to the normal to the bedding plane. A method is given for converting such asymmetric data collected on a double-crystal diffractometer for concatenation with the usual pinhole-geometry SANS data. Furthermore, the aspect ratio from the SANS data is used to modify the slit-smeared USANS data to produce quasi-symmetric contours. Rotation of the sample about the incident beam may result in symmetric data but cannot extract the same information as obtained from pinhole geometry.

Ultra-small-angle neutron scattering with azimuthal asymmetry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gu, X.; Mildner, D. F. R.

Small-angle neutron scattering (SANS) measurements from thin sections of rock samples such as shales demand as great a scattering vector range as possible because the pores cover a wide range of sizes. The limitation of the scattering vector range for pinhole SANS requires slit-smeared ultra-SANS (USANS) measurements that need to be converted to pinhole geometry. The desmearing algorithm is only successful for azimuthally symmetric data. Scattering from samples cut parallel to the plane of bedding is symmetric, exhibiting circular contours on a two-dimensional detector. Samples cut perpendicular to the bedding show elliptically dependent contours with the long axis corresponding to the normal to the bedding plane. A method is given for converting such asymmetric data collected on a double-crystal diffractometer for concatenation with the usual pinhole-geometry SANS data. Furthermore, the aspect ratio from the SANS data is used to modify the slit-smeared USANS data to produce quasi-symmetric contours. Rotation of the sample about the incident beam may result in symmetric data but cannot extract the same information as obtained from pinhole geometry.
Resonance-dependent extraordinary reflection and transmission in PT-symmetric layered structure

NASA Astrophysics Data System (ADS)

Fang, Yun-tuan; Zhang, Yi-chi; Wang, Ji-Jun

2018-01-01

In order to achieve controllable enhanced reflection and transmission in parity-time (PT) symmetric systems, we combine a cavity resonance effect with the layered PT-symmetric structure. At the resonance wavelength, in addition to the nonreciprocal extraordinary reflection, an enhanced transmission is also obtained. Both the extraordinary reflectance and transmittance depend on the modulation depth and period number in a discrete form.
New computing systems and their impact on structural analysis and design

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1989-01-01

A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.
Compact broadband polarization beam splitter using a symmetric directional coupler with sinusoidal bends.

PubMed

Zhang, Fan; Yun, Han; Wang, Yun; Lu, Zeqin; Chrostowski, Lukas; Jaeger, Nicolas A F

2017-01-15

We design and demonstrate a compact broadband polarization beam splitter (PBS) using a symmetric directional coupler with sinusoidal bends on a silicon-on-insulator platform. The sinusoidal bends in our PBS suppress the power exchange between two parallel symmetric strip waveguides for the transverse-electric (TE) mode, while allowing for the maximum power transfer to the adjacent waveguide for the transverse-magnetic (TM) mode. Our PBS has a nominal coupler length of 8.55 μm, and it has an average extinction ratio (ER) of 12.0 dB for the TE mode, an average ER of 20.1 dB for the TM mode, an average polarization isolation (PI) of 20.6 dB for the through port, and an average PI of 11.5 dB for the cross port, all over a bandwidth of 100 nm.
Coherent Backscattering in the Cross-Polarized Channel

NASA Technical Reports Server (NTRS)

Mischenko, Michael I.; Mackowski, Daniel W.

2011-01-01

We analyze the asymptotic behavior of the cross-polarized enhancement factor in the framework of the standard low-packing-density theory of coherent backscattering by discrete random media composed of spherically symmetric particles. It is shown that if the particles are strongly absorbing or if the smallest optical dimension of the particulate medium (i.e., the optical thickness of a plane-parallel slab or the optical diameter of a spherically symmetric volume) approaches zero, then the cross-polarized enhancement factor tends to its upper-limit value 2. This theoretical prediction is illustrated using direct computer solutions of the Maxwell equations for spherical volumes of discrete random medium.
MASPROP- MASS PROPERTIES OF A RIGID STRUCTURE

NASA Technical Reports Server (NTRS)

Hull, R. A.

1994-01-01

The computer program MASPROP was developed to rapidly calculate the mass properties of complex rigid structural systems. This program's basic premise is that complex systems can be adequately described by a combination of basic elementary structural shapes. Thirteen widely used basic structural shapes are available in this program. They are as follows: Discrete Mass, Cylinder, Truncated Cone, Torus, Beam (arbitrary cross section), Circular Rod (arbitrary cross section), Spherical Segment, Sphere, Hemisphere, Parallelepiped, Swept Trapezoidal Panel, Symmetric Trapezoidal Panels, and a Curved Rectangular Panel. MASPROP provides a designer with a simple technique that requires minimal input to calculate the mass properties of a complex rigid structure and should be useful in any situation where one needs to calculate the center of gravity and moments of inertia of a complex structure. Rigid body analysis is used to calculate mass properties. Mass properties are calculated about component axes that have been rotated to be parallel to the system coordinate axes. Then the system center of gravity is calculated and the mass properties are transferred to axes through the system center of gravity by using the parallel axis theorem. System weight, moments of inertia about the system origin, and the products of inertia about the system center of mass are calculated and printed. From the information about the system center of mass the principal axes of the system and the moments of inertia about them are calculated and printed. The only input required is simple geometric data describing the size and location of each element and the respective material density or weight of each element. This program is written in FORTRAN for execution on a CDC 6000 series computer with a central memory requirement of approximately 62K (octal) of 60 bit words. The development of this program was completed in 1978.
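The transfer step described in this abstract can be illustrated with a minimal sketch of the parallel axis theorem. This is not MASPROP itself (which is a FORTRAN program); the function names and the tensor convention (off-diagonal terms stored as negative products of inertia) are assumptions made for illustration.

```python
def center_of_mass(elements):
    """Centre of mass of (mass, (x, y, z)) pairs; hypothetical helper."""
    total = sum(m for m, _ in elements)
    return tuple(sum(m * p[k] for m, p in elements) / total for k in range(3))


def shift_inertia_to_parallel_axes(inertia_cm, mass, offset):
    """Parallel axis theorem for a full 3x3 inertia tensor.

    inertia_cm : tensor about axes through the element's centre of mass
    offset     : (dx, dy, dz) from that centre of mass to the new origin
    Returns I_ij = I_cm_ij + m * (|d|^2 * delta_ij - d_i * d_j).
    """
    d = tuple(offset)
    d2 = sum(c * c for c in d)
    shifted = []
    for i in range(3):
        row = []
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            row.append(inertia_cm[i][j] + mass * (d2 * delta - d[i] * d[j]))
        shifted.append(row)
    return shifted
```

Summing the shifted tensors of all elements about a common origin would give the system tensor; the principal axes then follow from its eigenvectors.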
Self-organisation of an oligodeoxynucleotide containing the G- and C-rich stretches of the direct repeats of the human mitochondrial DNA.

PubMed

Nonin-Lecomte, Sylvie; Dardel, Frédéric; Lestienne, Patrick

2005-08-01

Stretches of cytosines and guanosines have been shown in vitro to adopt non-canonical structures known as i-motifs and G-quartets, respectively. When combined, such sequences are expected to either retain their structure or form duplexes or triple helices. All these structures may occur in vivo whenever the sequence criteria are met. Such stretches are present in the circular genome of human mitochondria, as two 10 nucleotide-long perfect tandem direct repeats (DR1 and DR2). The DR1 and DR2 repeats are G-rich on the heavy strand and C-rich on the light strand. Previous results suggested that during replication, transient formation of a parallel GGC triple helix between the neo-synthesised G-rich DR1 and the double-stranded homologous DR2 could be involved in a rearrangement process leading to genome instability. In order to get structural insights into the interaction between the two repeats, we have studied by nuclear magnetic resonance (NMR) the assembly properties of a 24-mer oligodeoxyribonucleotide in which the C- and G-rich segments of the DRs are covalently tethered by a TTTT linker. We show here that this 24-mer self-associates into a triplex-containing symmetrical tetramer. The core of the structure is composed of anti-parallel Watson-Crick (WC) base pairs. Two additional strands are hydrogen-bonded to the Hoogsteen side of the Gs, thus forming CGC(+) triple helices, with G-rich ends folding into G-quartets. These results suggest that such structures could occur when the two DRs are put to close proximity in a biological context.
Non-coaxial-based microwave ablation antennas for creating symmetric and asymmetric coagulation zones

NASA Astrophysics Data System (ADS)

Mohtashami, Yahya; Luyen, Hung; Hagness, Susan C.; Behdad, Nader

2018-06-01

We present an investigation of a new class of microwave ablation (MWA) antennas capable of producing axially symmetric or asymmetric heating patterns. The antenna design is based on a dipole fed by a balanced parallel-wire transmission line. The angle and direction of the deployed dipole arms are used to control the heating pattern. We analyzed the specific absorption rate and temperature profiles using electromagnetic and thermal simulations. Two prototypes were fabricated and tested in ex vivo ablation experiments: one was designed to produce symmetric heating patterns and the other was designed to generate asymmetric heating patterns. Both fabricated prototypes exhibited good impedance matching and produced localized coagulation zones as predicted by the simulations. The prototype operating in porcine muscle created an ~10 cm³ symmetric ablation zone after 10 min of ablation with a power level of 18 W. The prototype operating in egg white created an ~4 cm³ asymmetric ablation zone with a directionality ratio of 40% after 5 min of ablation with a power level of 25 W. The proposed MWA antenna design shows promise for minimally invasive treatment of tumors in various clinical scenarios where, depending on the situation, a symmetric or an asymmetric heating pattern may be needed.
Post-earthquake relaxation using a spectral element method: 2.5-D case

USGS Publications Warehouse

Pollitz, Fred

2014-01-01

The computation of quasi-static deformation for axisymmetric viscoelastic structures on a gravitating spherical earth is addressed using the spectral element method (SEM). A 2-D spectral element domain is defined with respect to spherical coordinates of radius and angular distance from a pole of symmetry, and 3-D viscoelastic structure is assumed to be azimuthally symmetric with respect to this pole. A point dislocation source that is periodic in azimuth is implemented with a truncated sequence of azimuthal order numbers. Viscoelasticity is limited to linear rheologies and is implemented with the correspondence principle in the Laplace transform domain. This leads to a series of decoupled 2-D problems which are solved with the SEM. Inverse Laplace transform of the independent 2-D solutions leads to the time-domain solution of the 3-D equations of quasi-static equilibrium imposed on a 2-D structure. The numerical procedure is verified through comparison with analytic solutions for finite faults embedded in a laterally homogeneous viscoelastic structure. This methodology is applicable to situations where the predominant structure varies in one horizontal direction, such as a structural contrast across (or parallel to) a long strike-slip fault.
Efficient parallel linear scaling construction of the density matrix for Born-Oppenheimer molecular dynamics.

PubMed

Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N

2015-10-13

We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
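The SP2 recursion named in this abstract can be sketched as follows. This is a dense NumPy illustration only, not the authors' ELLPACK-R sparse implementation; the Gershgorin spectral bounds, the idempotency stopping test, and all function names are assumptions chosen for the sketch.

```python
import numpy as np


def gershgorin_bounds(h):
    """Cheap lower and upper bounds on the spectrum of a symmetric matrix."""
    diag = np.diag(h)
    radii = np.sum(np.abs(h), axis=1) - np.abs(diag)
    return np.min(diag - radii), np.max(diag + radii)


def sp2_density_matrix(h, n_occ, tol=1e-10, max_iter=100):
    """Second-order spectral projection: purify X toward a projector with trace n_occ."""
    e_min, e_max = gershgorin_bounds(h)
    # Map the spectrum into [0, 1], with the lowest states near 1.
    x = (e_max * np.eye(len(h)) - h) / (e_max - e_min)
    for _ in range(max_iter):
        x2 = x @ x
        tr_x, tr_x2 = np.trace(x), np.trace(x2)
        if abs(tr_x - tr_x2) < tol:  # X is numerically idempotent
            break
        # Pick whichever projection brings the trace closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = 2.0 * x - x2
    return x
```

Each iteration costs one matrix product, which is why the method becomes linear scaling when the matrices are sparse and small elements are thresholded; the dense version above skips that thresholding.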
Duality, phase structures, and dilemmas in symmetric quantum games

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ichikawa, Tsubasa; Tsutsui, Izumi

Symmetric quantum games for 2-player, 2-qubit strategies are analyzed in detail by using a scheme in which all pure states in the 2-qubit Hilbert space are utilized for strategies. We consider two different types of symmetric games exemplified by the familiar games, the Battle of the Sexes (BoS) and the Prisoners' Dilemma (PD). These two types of symmetric games are shown to be related by a duality map, which ensures that they share common phase structures with respect to the equilibria of the strategies. We find eight distinct phase structures possible for the symmetric games, which are determined by the classical payoff matrices from which the quantum games are defined. We also discuss the possibility of resolving the dilemmas in the classical BoS, PD, and the Stag Hunt (SH) game based on the phase structures obtained in the quantum games. It is observed that quantization cannot resolve the dilemma fully for the BoS, while it generically can for the PD and SH if appropriate correlations for the strategies of the players are provided.
An Object-Oriented Collection of Minimum Degree Algorithms: Design, Implementation, and Experiences

NASA Technical Reports Server (NTRS)

Kumfert, Gary; Pothen, Alex

1999-01-01

The multiple minimum degree (MMD) algorithm and its variants have enjoyed 20+ years of research and progress in generating fill-reducing orderings for sparse, symmetric positive definite matrices. Although conceptually simple, efficient implementations of these algorithms are deceptively complex and highly specialized. In this case study, we present an object-oriented library that implements several recent minimum degree-like algorithms. We discuss how object-oriented design forces us to decompose these algorithms in a different manner than earlier codes and demonstrate how this impacts the flexibility and efficiency of our C++ implementation. We compare the performance of our code against other implementations in C or Fortran.
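For orientation, the basic idea behind minimum degree orderings can be sketched in a few lines. This is the plain greedy variant on an explicit elimination graph, not the multiple minimum degree algorithm or the authors' C++ library; the function names and the smallest-index tie-break are assumptions.

```python
def minimum_degree_order(adjacency):
    """Greedy minimum degree ordering of an undirected graph {vertex: set_of_neighbours}."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while graph:
        # Pick the vertex of smallest current degree (ties broken by label).
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v turns its neighbours into a clique.
        for u in nbrs:
            graph[u] |= nbrs - {u}
        order.append(v)
    return order


def elimination_fill(adjacency, order):
    """Count the fill edges created when eliminating vertices in the given order."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))
        for u in nbrs:
            graph[u].discard(v)
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill
```

On a star graph, eliminating the hub first connects every pair of leaves, whereas the minimum degree order removes the leaves first and creates no fill. Production codes avoid forming the cliques explicitly, which is part of what makes efficient implementations intricate.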
Symmetrical group theory for mathematical complexity reduction of digital holograms

NASA Astrophysics Data System (ADS)

Perez-Ramirez, A.; Guerrero-Juk, J.; Sanchez-Lara, R.; Perez-Ramirez, M.; Rodriguez-Blanco, M. A.; May-Alarcon, M.

2017-10-01

This work presents the use of mathematical group theory through an algorithm to reduce the multiplicative computational complexity in the process of creating digital holograms. An object is considered as a set of point sources using mathematical symmetry properties of both the core in the Fresnel integral and the image, where the image is modeled using group theory. This algorithm has multiplicative complexity equal to zero and an additive complexity ( k - 1) × N for the case of sparse matrices and binary images, where k is the number of pixels other than zero and N is the total points in the image.
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods for unstructured mesh computations are developed and tested. The approximate system which arises from the Newton-linearization of the nonlinear evolution operator is solved by using the preconditioned generalized minimum residual technique. Three different preconditioners are investigated: the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over-relaxation (SSOR). The preconditioners have been optimized to have good vectorization properties. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also investigated. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
The effects of dication symmetry on ionic liquid electrolytes in supercapacitors.

PubMed

Li, Song; Zhu, Mengyang; Feng, Guang

2016-11-23

The effects of dication symmetry on the structure and capacitance of the electrical double layers (EDLs) of dicationic ionic liquids (DILs) near graphene electrodes were investigated by molecular dynamics (MD) simulation in this work. Symmetrical 1-hexyl-3-dimethylimidazolium di[bis(trifluoromethyl)imide]([C6(mim)2](Tf2N)2) and asymmetrical 1-(1-trimethylammonium-yl-hexyl)-3-methylimidazolium di[bis(trifluoro-methanesulfonyl)-imide] ([C6(tma)(mim)](Tf2N)2) were both employed. Radial distribution function (RDF) analysis of the two DILs revealed a shorter distance between the cation-anion pairs in symmetrical [C6(mim)2](Tf2N)2), which was attributed to the closely packed imidazolium ring-anion pairs. In contrast, the trimethylammonium head groups and anions exhibit a relatively longer distance, but a stronger correlation in asymmetrical [C6(tma)(mim)](Tf2N)2. In addition, it was illustrated that more symmetrical DIL ions in EDLs are distributed near graphite electrodes and exhibit closer distances to the electrode, which is most probably due to the parallel orientation of imidazolium rings, reducing the distance between the cation and the graphene. In contrast, asymmetrical DILs, with one trimethylammonium head group and one imidazolium ring in the dications, are loosely packed due to their tilting orientation near graphene surfaces. However, the capacitance-potential (C-V) curves of the two DILs are almost the same, regardless of the opposite sign of potential of zero charge (PZC), indicating the insignificant influence of dication symmetry on the capacitance of DIL-based supercapacitors.
The effects of dication symmetry on ionic liquid electrolytes in supercapacitors

NASA Astrophysics Data System (ADS)

Li, Song; Zhu, Mengyang; Feng, Guang

2016-11-01

The effects of dication symmetry on the structure and capacitance of the electrical double layers (EDLs) of dicationic ionic liquids (DILs) near graphene electrodes were investigated by molecular dynamics (MD) simulation in this work. Symmetrical 1-hexyl-3-dimethylimidazolium di[bis(trifluoromethyl)imide]([C6(mim)2](Tf2N)2) and asymmetrical 1-(1-trimethylammonium-yl-hexyl)-3-methylimidazolium di[bis(trifluoro-methanesulfonyl)-imide] ([C6(tma)(mim)](Tf2N)2) were both employed. Radial distribution function (RDF) analysis of the two DILs revealed a shorter distance between the cation-anion pairs in symmetrical [C6(mim)2](Tf2N)2), which was attributed to the closely packed imidazolium ring-anion pairs. In contrast, the trimethylammonium head groups and anions exhibit a relatively longer distance, but a stronger correlation in asymmetrical [C6(tma)(mim)](Tf2N)2. In addition, it was illustrated that more symmetrical DIL ions in EDLs are distributed near graphite electrodes and exhibit closer distances to the electrode, which is most probably due to the parallel orientation of imidazolium rings, reducing the distance between the cation and the graphene. In contrast, asymmetrical DILs, with one trimethylammonium head group and one imidazolium ring in the dications, are loosely packed due to their tilting orientation near graphene surfaces. However, the capacitance-potential (C-V) curves of the two DILs are almost the same, regardless of the opposite sign of potential of zero charge (PZC), indicating the insignificant influence of dication symmetry on the capacitance of DIL-based supercapacitors.
Solid-state NMR/NQR and first-principles study of two niobium halide cluster compounds.

PubMed

Perić, Berislav; Gautier, Régis; Pickard, Chris J; Bosiočić, Marko; Grbić, Mihael S; Požek, Miroslav

2014-01-01

Two hexanuclear niobium halide cluster compounds with a [Nb6X12]^2+ (X=Cl, Br) diamagnetic cluster core have been studied by a combination of experimental solid-state NMR/NQR techniques and PAW/GIPAW calculations. For niobium sites the NMR parameters were determined by using variable B0 field static broadband NMR measurements and additional NQR measurements. It was found that they possess large positive chemical shifts, contrary to the majority of niobium compounds studied so far by solid-state NMR, but in accordance with chemical shifts of ^95Mo nuclei in structurally related compounds containing [Mo6Br8]^4+ cluster cores. Experimentally determined δiso(^93Nb) values are in the range from 2,400 to 3,000 ppm. A detailed analysis of geometrical relations between computed electric field gradient (EFG) and chemical shift (CS) tensors with respect to structural features of cluster units was carried out. These tensors on niobium sites are almost axially symmetric with parallel orientation of the largest EFG and the smallest CS principal axes (Vzz and δ33) coinciding with the molecular four-fold axis of the [Nb6X12]^2+ unit. Bridging halogen sites are characterized by large asymmetry of EFG and CS tensors: the largest EFG principal axis (Vzz) is perpendicular to the X-Nb bonds, while the intermediate EFG principal axis (Vyy) and the largest CS principal axis (δ11) are oriented in the radial direction with respect to the center of the cluster unit. For the more symmetrical bromide compound the PAW predictions for EFG parameters are in better correspondence with the NMR/NQR measurements than in the less symmetrical chlorine compound. Theoretically predicted NMR parameters of bridging halogen sites were checked by ^79/81Br NQR and ^35Cl solid-state NMR measurements.
Architecture of interstitial nodal spaces in the rodent renal inner medulla.

PubMed

Gilbert, Rebecca L; Pannabecker, Thomas L

2013-09-01

Every collecting duct (CD) of the rat inner medulla is uniformly surrounded by about four abutting ascending vasa recta (AVR) running parallel to it. One or two ascending thin limbs (ATLs) lie between and parallel to each abutting AVR pair, opposite the CD. These structures form boundaries of axially running interstitial compartments. Viewed in transverse sections, these compartments appear as four interstitial nodal spaces (INSs) positioned symmetrically around each CD. The axially running compartments are segmented by interstitial cells spaced at regular intervals. The pairing of ATLs and CDs bounded by an abundant supply of AVR carrying reabsorbed water, NaCl, and urea make a strong argument that the mixing of NaCl and urea within the INSs and countercurrent flows play a critical role in generating the inner medullary osmotic gradient. The results of this study fully support that hypothesis. We quantified interactions of all structures comprising INSs along the corticopapillary axis for two rodent species, the Munich-Wistar rat and the kangaroo rat. The results showed remarkable similarities in the configurations of INSs, suggesting that the structural arrangement of INSs is a highly conserved architecture that plays a fundamental role in renal function. The number density of INSs along the corticopapillary axis directly correlated with a loop population that declines exponentially with distance below the outer medullary-inner medullary boundary. The axial configurations were consistent with discrete association between near-bend loop segments and INSs and with upper loop segments lying distant from INSs.
Architecture of interstitial nodal spaces in the rodent renal inner medulla

PubMed Central

Gilbert, Rebecca L.

2013-01-01

Every collecting duct (CD) of the rat inner medulla is uniformly surrounded by about four abutting ascending vasa recta (AVR) running parallel to it. One or two ascending thin limbs (ATLs) lie between and parallel to each abutting AVR pair, opposite the CD. These structures form boundaries of axially running interstitial compartments. Viewed in transverse sections, these compartments appear as four interstitial nodal spaces (INSs) positioned symmetrically around each CD. The axially running compartments are segmented by interstitial cells spaced at regular intervals. The pairing of ATLs and CDs bounded by an abundant supply of AVR carrying reabsorbed water, NaCl, and urea make a strong argument that the mixing of NaCl and urea within the INSs and countercurrent flows play a critical role in generating the inner medullary osmotic gradient. The results of this study fully support that hypothesis. We quantified interactions of all structures comprising INSs along the corticopapillary axis for two rodent species, the Munich-Wistar rat and the kangaroo rat. The results showed remarkable similarities in the configurations of INSs, suggesting that the structural arrangement of INSs is a highly conserved architecture that plays a fundamental role in renal function. The number density of INSs along the corticopapillary axis directly correlated with a loop population that declines exponentially with distance below the outer medullary-inner medullary boundary. The axial configurations were consistent with discrete association between near-bend loop segments and INSs and with upper loop segments lying distant from INSs. PMID:23825077
Electrical contact structures for solid oxide electrolyte fuel cell

DOEpatents

Isenberg, Arnold O.

1984-01-01

An improved electrical output connection means is provided for a high temperature solid oxide electrolyte type fuel cell generator. The electrical connection of the fuel cell electrodes to the electrical output bus, which is brought through the generator housing to be connected to an electrical load line, maintains a highly uniform temperature distribution. The electrical connection means includes an electrode bus which is spaced parallel to the output bus with a plurality of symmetrically spaced transversely extending conductors extending between the electrode bus and the output bus, with thermal insulation means provided about the transverse conductors between the spaced apart buses. Single or plural stages of the insulated transversely extending conductors can be provided within the high-temperature regions of the fuel cell generator to provide highly homogeneous temperature distribution over the contacting surfaces.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Xiuguo; Ma, Zhichao; Xu, Zhimou

Mueller matrix ellipsometry (MME) is applied to detect foot-like asymmetry encountered in nanoimprint lithography (NIL) processes. We present both theoretical and experimental results which show that MME has good sensitivity to both the magnitude and direction of asymmetric profiles. The physics behind the use of MME for asymmetry detection is the breaking of the electromagnetic reciprocity theorem for the zeroth-order diffraction of asymmetric gratings. We demonstrate that accurate characterization of asymmetric nanoimprinted gratings can be achieved by performing MME measurements in a conical mounting with the plane of incidence parallel to grating lines and meanwhile incorporating depolarization effects into the optical model. The comparison of the MME-extracted asymmetric profile with the measurement by cross-sectional scanning electron microscopy also reveals the strong potential of this technique for in-line monitoring of NIL processes, where symmetric structures are desired.
A Choice Reaction Time Index of Callosal Anatomical Homotopy

ERIC Educational Resources Information Center

Desjardins, Sameul; Braun, Claude M. J.; Achim, Andre; Roberge, Carl

2009-01-01

Tachistoscopically presented bilateral stimulus pairs not parallel to the meridian produced significantly longer RTs on a task requiring discrimination of shapes (Go/no-Go) than pairs emplaced symmetrically on each side of the meridian in Desjardins and Braun [Desjardins, S., & Braun, C. M. J. (2006). Homotopy and heterotopy and the bilateral…
Interaction between a laminar starting immersed micro-jet and a parallel wall

NASA Astrophysics Data System (ADS)

Cabaleiro, Juan Martin; Laborde, Cecilia; Artana, Guillermo

2015-01-01

In the present work, we study the starting transient of an immersed micro-jet in close vicinity to a solid wall parallel to its axis. The experiments concern laminar jets (Re < 200) issuing from a 100 μm internal tip diameter glass micro-pipette. The effect of the confinement was studied by placing the micro-pipette at different distances from the wall. The characterization of the jet was carried out by visualizations on which the morphology of the vortex head and trajectories was analyzed. Numerical simulations were used as a complementary tool for the analysis. The jet remains stable for very long distances away from the tip, allowing for a similarity analysis. The self-similar behavior of the starting jet has been studied in terms of the frontline position with time. A symmetric and a wall-dominated regime could be identified. The starting jet in the wall-dominated regime, and in the symmetric regime as well, develops a self-similar behavior that has a relatively rapid loss of memory of the preceding condition of the flow. The scalings for both regimes are those that correspond to viscous dominated flows.
High quality NMR structures: a new force field with implicit water and membrane solvation for Xplor-NIH.

PubMed

Tian, Ye; Schwieters, Charles D; Opella, Stanley J; Marassi, Francesca M

2017-01-01

Structure determination of proteins by NMR is unique in its ability to measure restraints, very accurately, in environments and under conditions that closely mimic those encountered in vivo. For example, advances in solid-state NMR methods enable structure determination of membrane proteins in detergent-free lipid bilayers, and of large soluble proteins prepared by sedimentation, while parallel advances in solution NMR methods and optimization of detergent-free lipid nanodiscs are rapidly pushing the envelope of the size limit for both soluble and membrane proteins. These experimental advantages, however, are partially squandered during structure calculation, because the commonly used force fields are purely repulsive and neglect solvation, Van der Waals forces and electrostatic energy. Here we describe a new force field, and updated energy functions, for protein structure calculations with EEFx implicit solvation, electrostatics, and Van der Waals Lennard-Jones forces, in the widely used program Xplor-NIH. The new force field is based primarily on CHARMM22, facilitating calculations with a wider range of biomolecules. The new EEFx energy function has been rewritten to enable OpenMP parallelism, and optimized to enhance computation efficiency. It implements solvation, electrostatics, and Van der Waals energy terms together, thus ensuring more consistent and efficient computation of the complete nonbonded energy lists. Updates in the related Python module allow detailed analysis of the interaction energies and associated parameters. The new force field and energy function work with both soluble proteins and membrane proteins, including those with cofactors or engineered tags, and are very effective in situations where there are sparse experimental restraints. Results obtained for NMR-restrained calculations with a set of five soluble proteins and five membrane proteins show that structures calculated with EEFx have significant improvements in accuracy, precision, and conformation, and that structure refinement can be obtained by short relaxation with EEFx to obtain improvements in these key metrics. These developments broaden the range of biomolecular structures that can be calculated with high fidelity from NMR restraints.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It exhibits a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OS-GCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows the use of larger timesteps and results in CPU savings of 20-35%.
Performance improvements of symmetry-breaking reflector structures in nonimaging devices

DOEpatents

Winston, Roland

2004-01-13

A structure and method for providing a broken symmetry reflector structure for a solar concentrator device. The component of the optical direction vector along the symmetry axis is conserved for all rays propagated through a translationally symmetric optical device. This quantity, referred to as the translational skew invariant, is conserved in rotationally symmetric optical systems. Performance limits for translationally symmetric nonimaging optical devices are derived from the distributions of the translational skew invariant for the optical source and for the target to which flux is to be transferred. A numerically optimized non-tracking solar concentrator utilizing symmetry-breaking reflector structures can overcome the performance limits associated with translational symmetry.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics.

PubMed

Gilles, Luc; Ellerbroek, Brent L; Vogel, Curtis R

2003-09-10

Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
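The combination of conjugate gradients with a symmetric Gauss-Seidel sweep can be sketched as below. This is a pointwise, dense NumPy illustration under stated assumptions, not the block (layer-oriented) sparse reconstructor of the paper; the function names and the relative-residual stopping rule are hypothetical.

```python
import numpy as np


def sgs_apply(a, r):
    """Solve M z = r with M = (D + L) D^{-1} (D + U), where A = L + D + U."""
    n = len(r)
    d = np.diag(a)
    y = np.zeros(n)
    for i in range(n):                      # forward sweep: (D + L) y = r
        y[i] = (r[i] - a[i, :i] @ y[:i]) / d[i]
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):          # backward sweep: (D + U) z = D y
        z[i] = (d[i] * y[i] - a[i, i + 1:] @ z[i + 1:]) / d[i]
    return z


def pcg(a, b, apply_preconditioner, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = np.zeros(len(b))
    r = b - a @ x
    z = apply_preconditioner(a, r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        ap = a @ p
        alpha = rz / (p @ ap)
        x = x + alpha * p
        r = r - alpha * ap
        z = apply_preconditioner(a, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

For symmetric positive definite A the sweep defines a symmetric positive definite M, which is what conjugate gradients requires of a preconditioner. In the block variant described in the abstract, the scalar divisions by d[i] would become solves with pre-factored diagonal blocks.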
Extending the eigCG algorithm to nonsymmetric Lanczos for linear systems with multiple right-hand sides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abdel-Rehim, A M; Stathopoulos, Andreas; Orginos, Kostas

2014-08-01

The technique that was used to build the EigCG algorithm for sparse symmetric linear systems is extended to the nonsymmetric case using the BiCG algorithm. We show that, similarly to the symmetric case, we can build an algorithm that is capable of computing a few smallest magnitude eigenvalues and their corresponding left and right eigenvectors of a nonsymmetric matrix using only a small window of the BiCG residuals while simultaneously solving a linear system with that matrix. For a system with multiple right-hand sides, we give an algorithm that computes incrementally more eigenvalues while solving the first few systems and then uses the computed eigenvectors to deflate BiCGStab for the remaining systems. Our experiments on various test problems, including Lattice QCD, show the remarkable ability of EigBiCG to compute spectral approximations with accuracy comparable to that of the unrestarted, nonsymmetric Lanczos. Furthermore, our incremental EigBiCG followed by appropriately restarted and deflated BiCGStab provides a competitive method for systems with multiple right-hand sides.
A General Sparse Tensor Framework for Electronic Structure Theory

DOE PAGES

Manzer, Samuel; Epifanovsky, Evgeny; Krylov, Anna I.; ...

2017-01-24

Linear-scaling algorithms must be developed in order to extend the domain of applicability of electronic structure theory to molecules of any desired size. But the increasing complexity of modern linear-scaling methods makes code development and maintenance a significant challenge. A major contributor to this difficulty is the lack of robust software abstractions for handling block-sparse tensor operations. We therefore report the development of a highly efficient symbolic block-sparse tensor library in order to provide access to high-level software constructs to treat such problems. Our implementation supports arbitrary multi-dimensional sparsity in all input and output tensors. We then avoid cumbersome machine-generated code by implementing all functionality as a high-level symbolic C++ language library and demonstrate that our implementation attains very high performance for linear-scaling sparse tensor contractions.
Nonlocal sparse model with adaptive structural clustering for feature extraction of aero-engine bearings

NASA Astrophysics Data System (ADS)

Zhang, Han; Chen, Xuefeng; Du, Zhaohui; Li, Xiang; Yan, Ruqiang

2016-04-01

Fault information of aero-engine bearings presents two particular phenomena, i.e., waveform distortion and impulsive feature frequency band dispersion, which pose a challenging problem for current techniques of bearing fault diagnosis. Moreover, although much progress has been made in applying sparse representation theory to feature extraction of fault information, the theory also confronts inevitable performance degradation because relatively weak fault information does not have sufficiently prominent and sparse representations. Therefore, a novel nonlocal sparse model (coined NLSM) and its algorithm framework are proposed in this paper, which go beyond simple sparsity by introducing more intrinsic structures of feature information. This work exploits the underlying prior information that feature information exhibits nonlocal self-similarity by clustering similar signal fragments and stacking them together into groups. Within this framework, the prior information is transformed into a regularization term, and a sparse optimization problem, which can be solved through the block coordinate descent (BCD) method, is formulated. Additionally, the adaptive structural clustering sparse dictionary learning technique, which utilizes k-Nearest-Neighbor (kNN) clustering and principal component analysis (PCA) learning, is adopted to further enable sufficient sparsity of feature information. Moreover, the selection rule of the regularization parameter and the computational complexity are described in detail. The performance of the proposed framework is evaluated through numerical experiments, and its superiority with respect to the state-of-the-art method in the field is demonstrated through the vibration signals of an experimental rig of aircraft engine bearings.
Chebyshev polynomial filtered subspace iteration in the discontinuous Galerkin method for large-scale electronic structure calculations

DOE PAGES

Banerjee, Amartya S.; Lin, Lin; Hu, Wei; ...

2016-10-21

The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALB functions requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale two-dimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. In conclusion, employing 55 296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8586 atoms is 90 s, and the time for a graphene sheet containing 11 520 atoms is 75 s.
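The general idea of Chebyshev polynomial filtered subspace iteration can be sketched as follows. This is a serial, dense NumPy illustration under stated assumptions, not the DG-CheFSI implementation: the function names `chebyshev_filter` and `chefsi` are hypothetical, the upper spectral bound is taken crudely from the matrix 1-norm, and the lower edge of the damped interval is simply the largest Ritz value of the previous iteration.

```python
import numpy as np

def chebyshev_filter(H, X, degree, lower_cut, upper_bound):
    """Apply a degree-`degree` Chebyshev polynomial of H to the block X.

    Eigencomponents with eigenvalues inside [lower_cut, upper_bound] are
    damped (|T_m| <= 1); those below lower_cut are amplified.
    """
    e = (upper_bound - lower_cut) / 2.0   # half-width of the damped interval
    c = (upper_bound + lower_cut) / 2.0   # centre of the damped interval
    Y_prev = X
    Y = (H @ X - c * X) / e
    for _ in range(2, degree + 1):
        # Three-term Chebyshev recurrence on the shifted, scaled operator.
        Y_next = 2.0 * (H @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y

def chefsi(H, n_states, degree=8, n_iter=30, seed=0):
    """Approximate the lowest `n_states` eigenpairs of a symmetric matrix H."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    upper_bound = np.linalg.norm(H, 1)    # crude bound on the largest eigenvalue
    X, _ = np.linalg.qr(rng.standard_normal((n, n_states)))
    ritz_vals = np.linalg.eigvalsh(X.T @ H @ X)
    for _ in range(n_iter):
        X = chebyshev_filter(H, X, degree, ritz_vals[-1], upper_bound)
        X, _ = np.linalg.qr(X)                         # re-orthonormalize
        ritz_vals, vecs = np.linalg.eigh(X.T @ H @ X)  # Rayleigh-Ritz step
        X = X @ vecs
    return ritz_vals, X
```

The highest states in the block converge slowest, so in practice a few extra buffer states are usually carried beyond the ones actually needed.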
SPARSKIT: A basic tool kit for sparse matrix computations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1990-01-01

Presented here are the main features of a tool package for manipulating and working with sparse matrices. One of the goals of the package is to provide basic tools to facilitate the exchange of software and data between researchers in sparse matrix computations. The starting point is the Harwell/Boeing collection of matrices for which the authors provide a number of tools. Among other things, the package provides programs for converting data structures, printing simple statistics on a matrix, plotting a matrix profile, and performing linear algebra operations with sparse matrices.
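To make "converting data structures" concrete, here is a minimal sketch of one typical conversion, from coordinate (COO) triplets to compressed sparse row (CSR) storage, followed by a CSR matrix-vector product. It is written in Python with 0-based indices for illustration only; the names `coo_to_csr` and `csr_matvec` are hypothetical and are not SPARSKIT routines or signatures.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (rows, cols, vals) to CSR arrays.

    Returns (row_ptr, col_idx, data), where the entries of row i occupy
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and data.
    """
    # Count the entries in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    # Prefix sum gives the start of each row.
    row_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    # Scatter each entry into the next free slot of its row.
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = row_ptr[:-1].copy()
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        data[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data

def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A x for a matrix stored in CSR form."""
    return [
        sum(data[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]
```

Entries within a row keep the order in which they appear in the input triplets, so column indices are not necessarily sorted.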
Harnessing data structure for recovery of randomly missing structural vibration responses time history: Sparse representation versus low-rank structure

NASA Astrophysics Data System (ADS)

Yang, Yongchao; Nagarajaiah, Satish

2016-06-01

Randomly missing data in structural vibration response time histories often occur in structural dynamics and health monitoring. For example, structural vibration responses are often corrupted by outliers or erroneous measurements due to sensor malfunction; in wireless sensing platforms, data loss during wireless communication is a common issue. Besides, to alleviate the wireless data sampling or communication burden, certain amounts of data are often discarded during sampling or before transmission. In these and other applications, recovery of the randomly missing structural vibration responses from the available, incomplete data is essential for system identification and structural health monitoring; it is an ill-posed inverse problem, however. This paper explicitly harnesses the data structure itself (of the structural vibration responses) to address this inverse problem. What is relevant is an empirical, but often practically true, observation: typically only a few modes are active in the structural vibration responses; hence there is a sparse representation (in the frequency domain) of the single-channel data vector, or a low-rank structure (by singular value decomposition) of the multi-channel data matrix. Exploiting such prior knowledge of data structure (intra-channel sparse or inter-channel low-rank), the new theories of ℓ1-minimization sparse recovery and nuclear-norm-minimization low-rank matrix completion enable recovery of the randomly missing or corrupted structural vibration response data. The performance of these two alternatives, in terms of recovery accuracy and computational time under different data missing rates, is investigated on a few structural vibration response data sets: the seismic responses of the super high-rise Canton Tower and the structural health monitoring accelerations of a real large-scale cable-stayed bridge. Encouraging results are obtained and the applicability and limitation of the presented methods are discussed.
A critical analysis of computational protein design with sparse residue interaction graphs

PubMed Central

Georgiev, Ivelin S.

2017-01-01

Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
Three-dimensional midwater camouflage from a novel two-component photonic structure in hatchetfish skin.

PubMed

Rosenthal, Eric I; Holt, Amanda L; Sweeney, Alison M

2017-05-01

The largest habitat by volume on Earth is the oceanic midwater, which is also one of the least understood in terms of animal ecology. The organisms here exhibit a spectacular array of optical adaptations for living in a visual void that have only barely begun to be described. We describe a complex pattern of broadband scattering from the skin of Argyropelecus sp., a hatchetfish found in the mesopelagic zone of the world's oceans. Hatchetfish skin superficially resembles the unpolished side of aluminium foil, but on closer inspection contains a complex composite array of subwavelength-scale dielectric structures. The superficial layer of this array contains dielectric stacks that are rectangular in cross-section, while the deeper layer contains dielectric bundles that are elliptical in cross-section; the cells in both layers have their longest dimension running parallel to the dorsal-ventral axis of the fish. Using the finite-difference time-domain approach and photographic radiometry, we explored the structural origins of this scattering behaviour and its environmental consequences. When the fish's flank is illuminated from an arbitrary incident angle, a portion of the scattered light exits in an arc parallel to the fish's anterior-posterior axis. Simultaneously, some incident light is also scattered downwards through the complex birefringent skin structure and exits from the ventral photophores. We show that this complex scattering pattern will provide camouflage simultaneously against the horizontal radially symmetric solar radiance in this habitat, and the predatory bioluminescent searchlights that are common here. The structure also directs light incident on the flank of the fish into the downwelling, silhouette-hiding counter-illumination of the ventral photophores. © 2017 The Authors.
Three-dimensional midwater camouflage from a novel two-component photonic structure in hatchetfish skin

PubMed Central

Rosenthal, Eric I.; Holt, Amanda L.

2017-01-01

The largest habitat by volume on Earth is the oceanic midwater, which is also one of the least understood in terms of animal ecology. The organisms here exhibit a spectacular array of optical adaptations for living in a visual void that have only barely begun to be described. We describe a complex pattern of broadband scattering from the skin of Argyropelecus sp., a hatchetfish found in the mesopelagic zone of the world's oceans. Hatchetfish skin superficially resembles the unpolished side of aluminium foil, but on closer inspection contains a complex composite array of subwavelength-scale dielectric structures. The superficial layer of this array contains dielectric stacks that are rectangular in cross-section, while the deeper layer contains dielectric bundles that are elliptical in cross-section; the cells in both layers have their longest dimension running parallel to the dorsal–ventral axis of the fish. Using the finite-difference time-domain approach and photographic radiometry, we explored the structural origins of this scattering behaviour and its environmental consequences. When the fish's flank is illuminated from an arbitrary incident angle, a portion of the scattered light exits in an arc parallel to the fish's anterior–posterior axis. Simultaneously, some incident light is also scattered downwards through the complex birefringent skin structure and exits from the ventral photophores. We show that this complex scattering pattern will provide camouflage simultaneously against the horizontal radially symmetric solar radiance in this habitat, and the predatory bioluminescent searchlights that are common here. The structure also directs light incident on the flank of the fish into the downwelling, silhouette-hiding counter-illumination of the ventral photophores. PMID:28468923
Reconstructing three-dimensional protein crystal intensities from sparse unoriented two-axis X-ray diffraction patterns

PubMed Central

Lan, Ti-Yen; Wierman, Jennifer L.; Tate, Mark W.; Philipp, Hugh T.; Elser, Veit

2017-01-01

Recently, there has been a growing interest in adapting serial microcrystallography (SMX) experiments to existing storage ring (SR) sources. For very small crystals, however, radiation damage occurs before sufficient numbers of photons are diffracted to determine the orientation of the crystal. The challenge is to merge data from a large number of such ‘sparse’ frames in order to measure the full reciprocal space intensity. To simulate sparse frames, a dataset was collected from a large lysozyme crystal illuminated by a dim X-ray source. The crystal was continuously rotated about two orthogonal axes to sample a subset of the rotation space. With the EMC algorithm [expand–maximize–compress; Loh & Elser (2009). Phys. Rev. E, 80, 026705], it is shown that the diffracted intensity of the crystal can still be reconstructed even without knowledge of the orientation of the crystal in any sparse frame. Moreover, parallel computation implementations were designed to considerably improve the time and memory scaling of the algorithm. The results show that EMC-based SMX experiments should be feasible at SR sources. PMID:28808431
Backward and forward plasmons in symmetric structures

NASA Astrophysics Data System (ADS)

Davidovich, Mikhael V.

2018-04-01

The electric and magnetic surface plasmons in symmetric structures of metallic and dielectric layers are considered. The existence of backward and forward waves, and of slow and fast plasmon-polaritons, is established. It is shown that anomalous negative dispersion in structures with dissipation does not necessarily indicate backward surface plasmons.
Natural convection in symmetrically heated vertical parallel plates with discrete heat sources

DOE Office of Scientific and Technical Information (OSTI.GOV)

Manca, O.; Nardini, S.; Naso, V.

Laminar air natural convection in a symmetrically heated vertical channel with uniform flush-mounted discrete heat sources has been experimentally investigated. The effects of heated strips location and of their number are pointed out in terms of the maximum wall temperatures. A flow visualization in the entrance region of the channel was carried out and air temperatures and velocities in two cross sections have been measured. Dimensionless local heat transfer coefficients have been evaluated and monomial correlations among relevant parameters have been derived in the local Rayleigh number range 10 to 10^6. Channel Nusselt number has been correlated in a polynomial form in terms of channel Rayleigh number.
A Fast parallel tridiagonal algorithm for a class of CFD applications

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Sun, Xian-He

1996-01-01

The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.
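For reference, the conventional sequential tridiagonal solver that such parallel algorithms are compared against is commonly the Thomas algorithm. The sketch below is that sequential baseline applied to a symmetric Toeplitz tridiagonal (STT) system, not the PDD or reduced PDD algorithm; the function names `thomas_solve` and `solve_stt` are hypothetical, and no pivoting is performed, so a diagonally dominant system is assumed.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by the Thomas algorithm (no pivoting).

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def solve_stt(off_diag, main_diag, rhs):
    """Solve a symmetric Toeplitz tridiagonal system with constant entries."""
    n = len(rhs)
    return thomas_solve([off_diag] * n, [main_diag] * n, [off_diag] * n, rhs)
```

For example, `solve_stt(1.0, 4.0, rhs)` solves the system whose matrix has 4 on the diagonal and 1 on both off-diagonals.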

Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed, the data traffic associated with factoring an n x n dense matrix using n^α (α ≤ 2) processors is Ω(n^(2 + α/2)). For n x n sparse matrices representing a √n x √n regular grid graph, the data traffic is shown to be Ω(n^(1 + α/2)), α ≤ 1. Partitioning schemes that are variations of the block assignment scheme are described, and it is shown that the data traffic generated by these schemes is asymptotically optimal. The schemes allow efficient use of up to O(n^2) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum value of O(n^3) and O(n^(3/2)), respectively. It is shown that the block based partitioning schemes allow a better utilization of the data accessed from shared memory and thus reduce the data traffic more than those based on column-wise wrap around assignment schemes.
Methods for design and evaluation of integrated hardware-software systems for concurrent computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PISCES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, T.

Progress in parallel/vector computers has driven us to develop suitable numerical integrators that utilize their computational power to the full extent while being independent of the size of the system to be integrated. Unfortunately, parallel versions of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. Meanwhile, the vector-mode usage of the Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead to an acceleration factor of the order of 1000 for smooth problems such as planetary/satellite orbit integration. The success of the multiple-correction PECE mode of the time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to shed new light on Miranker's so-called "pipelined predictor corrector method", which is expected to lead to an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Tunable elastic parity-time symmetric structure based on the shunted piezoelectric materials

NASA Astrophysics Data System (ADS)

Hou, Zhilin; Assouar, Badreddine

2018-02-01

We theoretically and numerically report on a tunable elastic Parity-Time (PT) symmetric structure based on shunted piezoelectric units. We show that elastic loss and gain can be achieved in piezoelectric materials when they are shunted by external circuits containing positive and negative resistances. We present and discuss, as an example, the strongly dependent relationship between the exceptional points of a three-layered system and the impedance of their external shunted circuit. The results show that PT symmetric structures based on this proposed concept can be actively tuned without any change of their geometric configurations.
Rotating columns: Relating structure-from-motion, accretion/deletion, and figure/ground

PubMed Central

Froyen, Vicky; Feldman, Jacob; Singh, Manish

2013-01-01

We present a novel phenomenon involving an interaction between accretion deletion, figure-ground interpretation, and structure-from-motion. Our displays contain alternating light and dark vertical regions in which random-dot textures moved horizontally at constant speed but in opposite directions in alternating regions. This motion is consistent with all the light regions in front, with the dark regions completing amodally into a single large surface moving in the background, or vice versa. Surprisingly, the regions that are perceived as figural are also perceived as 3-D volumes rotating in depth (like rotating columns)—despite the fact that dot motion is not consistent with 3-D rotation. In a series of experiments, we found we could manipulate which set of regions is perceived as rotating volumes simply by varying known geometric cues to figure ground, including convexity, parallelism, symmetry, and relative area. Subjects indicated which colored regions they perceived as rotating. For our displays we found convexity to be a stronger cue than either symmetry or parallelism. We furthermore found a smooth monotonic decay of the proportion by which subjects perceive symmetric regions as figural, as a function of their relative area. Our results reveal an intriguing new interaction between accretion-deletion, figure-ground, and 3-D motion that is not captured by existing models. They also provide an effective tool for measuring figure-ground perception. PMID:23946432
Rotating columns: relating structure-from-motion, accretion/deletion, and figure/ground.

PubMed

Froyen, Vicky; Feldman, Jacob; Singh, Manish

2013-08-14

We present a novel phenomenon involving an interaction between accretion-deletion, figure-ground interpretation, and structure-from-motion. Our displays contain alternating light and dark vertical regions in which random-dot textures moved horizontally at constant speed but in opposite directions in alternating regions. This motion is consistent with all the light regions in front, with the dark regions completing amodally into a single large surface moving in the background, or vice versa. Surprisingly, the regions that are perceived as figural are also perceived as 3-D volumes rotating in depth (like rotating columns), despite the fact that dot motion is not consistent with 3-D rotation. In a series of experiments, we found we could manipulate which set of regions is perceived as rotating volumes simply by varying known geometric cues to figure-ground, including convexity, parallelism, symmetry, and relative area. Subjects indicated which colored regions they perceived as rotating. For our displays we found convexity to be a stronger cue than either symmetry or parallelism. We furthermore found a smooth monotonic decay of the proportion by which subjects perceive symmetric regions as figural, as a function of their relative area. Our results reveal an intriguing new interaction between accretion-deletion, figure-ground, and 3-D motion that is not captured by existing models. They also provide an effective tool for measuring figure-ground perception.
Multiscale implementation of infinite-swap replica exchange molecular dynamics.

PubMed

Yu, Tang-Qing; Lu, Jianfeng; Abrams, Cameron F; Vanden-Eijnden, Eric

2016-10-18

Replica exchange molecular dynamics (REMD) is a popular method to accelerate conformational sampling of complex molecular systems. The idea is to run several replicas of the system in parallel at different temperatures that are swapped periodically. These swaps are typically attempted every few MD steps and accepted or rejected according to a Metropolis-Hastings criterion. This guarantees that the joint distribution of the composite system of replicas is the normalized sum of the symmetrized product of the canonical distributions of these replicas at the different temperatures. Here we propose a different implementation of REMD in which (i) the swaps obey a continuous-time Markov jump process implemented via Gillespie's stochastic simulation algorithm (SSA), which also samples exactly the aforementioned joint distribution and has the advantage of being rejection free, and (ii) this REMD-SSA is combined with the heterogeneous multiscale method to accelerate the rate of the swaps and reach the so-called infinite-swap limit that is known to optimize sampling efficiency. The method is easy to implement and can be trivially parallelized. Here we illustrate its accuracy and efficiency on the examples of alanine dipeptide in vacuum and C-terminal β-hairpin of protein G in explicit solvent. In this latter example, our results indicate that the landscape of the protein is a triple funnel with two folded structures and one misfolded structure that are stabilized by H-bonds.
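The conventional Metropolis-Hastings swap test mentioned in this abstract can be sketched as below. This is the standard REMD acceptance rule, not the rejection-free REMD-SSA scheme the authors propose; the function names and the choice of energy units (kcal/mol, with the matching Boltzmann constant) are assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); assumed unit system

def swap_acceptance(energy_i, energy_j, temp_i, temp_j, k_b=K_B):
    """Metropolis probability of swapping configurations between two replicas.

    Replica i runs at temp_i with potential energy energy_i, replica j at
    temp_j with energy_j. The acceptance probability is
    min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random.random):
    """Return True if the swap is accepted, drawing one uniform random number."""
    return rng() < swap_acceptance(energy_i, energy_j, temp_i, temp_j)
```

A swap that hands the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted only with the Boltzmann-weighted probability.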
Exploring Hypersonic, Unstructured-Grid Issues through Structured Grids

NASA Technical Reports Server (NTRS)

Mazaheri, Ali R.; Kleb, Bill

2007-01-01

Pure-tetrahedral unstructured grids have been shown to produce asymmetric heat transfer rates for symmetric problems. Meanwhile, two-dimensional structured grids produce symmetric solutions and, as documented here, introducing a spanwise degree of freedom to these structured grids also yields symmetric solutions. The effects of grid skewness and other perturbations of structured grids are investigated to uncover possible mechanisms behind the unstructured-grid solution asymmetries. By using controlled experiments around a known, good solution, the effects of particular grid pathologies are uncovered. These structured-grid experiments reveal that similar solution degradation occurs as for unstructured grids, especially for heat transfer rates. Non-smooth grids within the boundary layer are also shown to produce large local errors in heat flux but do not affect surface pressures.
Load-Dependent Soft-Switching Method of Half-Bridge Current Doubler for High-Voltage Point-of-Load Converter in Data Center Power Supplies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cui, Yutian; Yang, Fei; Tolbert, Leon M.

With the increased cloud computing and digital information storage, the energy requirement of data centers keeps increasing. A high-voltage point of load (HV POL) with an input series output parallel structure is proposed to convert 400 to 1 VDC within a single stage to increase the power conversion efficiency. The symmetrical controlled half-bridge current doubler is selected as the converter topology in the HV POL. A load-dependent soft-switching method has been proposed with an auxiliary circuit that includes inductor, diode, and MOSFETs so that the hard-switching issue of typical symmetrical controlled half-bridge converters is resolved. The operation principles of the proposed soft-switching half-bridge current doubler have been analyzed in detail. Then, the necessity of adjusting the timing with the loading in the proposed method is analyzed based on losses, and a controller is designed to realize the load-dependent operation. A lossless RCD current sensing method is used to sense the output inductor current value in the proposed load-dependent operation. In conclusion, experimental efficiency of a hardware prototype is provided to show that the proposed method can increase the converter's efficiency in both heavy- and light-load conditions.
Load-Dependent Soft-Switching Method of Half-Bridge Current Doubler for High-Voltage Point-of-Load Converter in Data Center Power Supplies

DOE PAGES

Cui, Yutian; Yang, Fei; Tolbert, Leon M.; ...

2016-06-14

With the increased cloud computing and digital information storage, the energy requirement of data centers keeps increasing. A high-voltage point of load (HV POL) with an input series output parallel structure is proposed to convert 400 to 1 VDC within a single stage to increase the power conversion efficiency. The symmetrical controlled half-bridge current doubler is selected as the converter topology in the HV POL. A load-dependent soft-switching method has been proposed with an auxiliary circuit that includes inductor, diode, and MOSFETs so that the hard-switching issue of typical symmetrical controlled half-bridge converters is resolved. The operation principles of the proposed soft-switching half-bridge current doubler have been analyzed in detail. Then, the necessity of adjusting the timing with the loading in the proposed method is analyzed based on losses, and a controller is designed to realize the load-dependent operation. A lossless RCD current sensing method is used to sense the output inductor current value in the proposed load-dependent operation. In conclusion, experimental efficiency of a hardware prototype is provided to show that the proposed method can increase the converter's efficiency in both heavy- and light-load conditions.
Magnetization distribution and spin transport of graphene/h-BN/graphene nanoribbon-based magnetic tunnel junction

NASA Astrophysics Data System (ADS)

Zhang, Y.; Yan, X. H.; Guo, Y. D.; Xiao, Y.

2017-09-01

Motivated by recent electronic transport measurement of boron nitride-graphene hybrid atomic layers, we studied magnetization distribution, transmission and current-bias relation of graphene/h-BN/graphene (C/BN/C) nanoribbon-based magnetic tunnel junctions (MTJ) based on density functional theory and non-equilibrium Green's function methods. Three types of MTJs, i.e. asymmetric, symmetric (S) and symmetric (SS), and two types of lead magnetization alignment, i.e. parallel (PC) and antiparallel (APC), are considered. The results show that the magnetization distribution is closely related to the interface structure. Especially for asymmetric MTJ, the B/N atoms at the C/BN interface are spin-polarized and give finite magnetic moments. More interestingly, it is found that the APC transmission of asymmetric MTJ with the thinnest barrier dominates over the PC one. By analyzing the projected density of states, one finds that the unusually higher APC transmission than PC is due to the coupling of electronic states of left ZGNR and right ZGNR. By integrating transmission, we calculate the current-bias voltage relation and find that the APC current is larger than PC current at small bias voltage and therefore reproduces a negative tunnel magnetoresistance. The results reported here will be useful and important for the design of C/BN/C-based MTJ.
Impact of repeated uniaxial mechanical strain on flexible a-IGZO thin film transistors with symmetric and asymmetric structures

NASA Astrophysics Data System (ADS)

Liao, Po-Yung; Chang, Ting-Chang; Su, Wan-Ching; Chen, Bo-Wei; Chen, Li-Hui; Hsieh, Tien-Yu; Yang, Chung-Yi; Chang, Kuan-Chang; Zhang, Sheng-Dong; Huang, Yen-Yu; Chang, Hsi-Ming; Chiang, Shin-Chuan

2017-06-01

This letter investigates repeated uniaxial mechanical stress-induced degradation behavior in flexible amorphous In-Ga-Zn-O thin-film transistors (TFTs) of different geometric structures. Two types of via-contact structure TFTs are investigated: symmetrical and UI structure (TFTs with I- and U-shaped asymmetric electrodes). After repeated mechanical stress, I-V curves for the symmetrical structure show a significant negative threshold voltage (VT) shift, due to mechanical stress-induced oxygen vacancy generation. However, degradation in the UI structure TFTs after stress is a negative VT shift along with the parasitic transistor characteristic in the forward-operation mode, with this hump not evident in the reverse-operation mode. This asymmetrical degradation is clarified by the mechanical strain simulation of the UI TFTs.
Inter-trabecular angle: A parameter of trabecular bone architecture in the human proximal femur that reveals underlying topological motifs.

PubMed

Reznikov, Natalie; Chase, Hila; Ben Zvi, Yehonatan; Tarle, Victoria; Singer, Matthew; Brumfeld, Vlad; Shahar, Ron; Weiner, Steve

2016-10-15

Trabecular bone is an intricate 3D network of struts and plates. Although the structure-function relations in trabecular bone have been studied since the time of Julius Wolff, controversy still exists regarding the architectural parameters responsible for its stability and resilience. We present a parameter that measures the angle between two connected trabeculae - the Inter-Trabecular Angle (ITA). We studied the ITA values derived from μCT scans of different regions of the proximal femora of 5 individuals of different age and sex. We show that the ITA angle distribution of nodes with 3 connecting trabeculae has a mean close to 120°, nodes with 4 connecting trabeculae has a mean close to 109° and nodes of higher connectivity have mean ITA values around 100°. This tendency to spread the ITAs around geometrically symmetrical motifs is highly conserved. The implication is that the ITAs are optimized such that the smallest amount of material spans the maximal 3D volume, and possibly by so doing trabecular bone might be better adapted to multidirectional loading. We also draw a parallel between trabecular bone and tensegrity structures - where lightweight, resilient and stable tetrahedron-based shapes contribute to strain redistribution amongst all the elements and to collective impact dampening. The Inter-Trabecular Angle (ITA) is a new topological parameter of trabecular bone. The ITA characterizes the way trabeculae connect with each other at nodes, regardless of their thickness and shape. The mean ITA value of nodes with 3 trabeculae is close to 120°, of nodes with 4 trabeculae is just below 109°, and the mean ITA of nodes with 5 and more trabeculae is around 100°. Thus the connections of trabeculae trend towards adopting symmetrical shapes. This implies that trabeculae can maximally span 3D space using the minimal amount of material. 
We draw a parallel between this motif and the concept of tensegrity - an engineering premise to which many living creatures conform at multiple levels of organization. Copyright © 2016 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
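The symmetric motifs named in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' ITA pipeline: the function names and the two idealized strut sets (three coplanar struts, four tetrahedral struts) are assumptions chosen to reproduce the 120° and roughly 109° reference angles.

```python
import math
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def mean_inter_strut_angle(struts):
    """Mean angle over all pairs of struts meeting at one node."""
    angles = [angle_deg(u, v) for u, v in combinations(struts, 2)]
    return sum(angles) / len(angles)


# Hypothetical idealized nodes: struts are direction vectors leaving the node.
planar3 = [(1.0, 0.0, 0.0),
           (-0.5, math.sqrt(3) / 2, 0.0),
           (-0.5, -math.sqrt(3) / 2, 0.0)]
tetra4 = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

print(mean_inter_strut_angle(planar3))  # planar three-strut motif
print(mean_inter_strut_angle(tetra4))   # tetrahedral four-strut motif, arccos(-1/3)
```

Measured nodes would supply strut directions extracted from a skeletonized scan in place of these idealized vectors.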
InGaAs/InAlAs Double Quantum Wells as Starting Structures for Quantum Logic Gates

NASA Astrophysics Data System (ADS)

Marchewka, M.; Sheregii, E. M.

2011-12-01

The detection of both symmetric and anti-symmetric electron states in DQWs by an optical method is described in this paper. Values of the symmetric and anti-symmetric splitting (SAS-gap) determined in this way are used for interpretation of the beating effect in the SdH oscillations observed at low temperatures in the external magnetic field. SAS-splitting of electron states in DQWs clearly exists at room temperature and electrons in symmetric and anti-symmetric states have different statistics so these states can be identified in electron transport.
A novel design for a hybrid space manipulator

NASA Technical Reports Server (NTRS)

Shahinpoor, MO

1991-01-01

Described are the structural design, kinematics, and characteristics of a robot manipulator for space applications and use as an articulate and powerful space shuttle manipulator. Hybrid manipulators are parallel-serial connection robots that give rise to a multitude of highly precise robot manipulators. These manipulators are modular and can be extended by additional modules over large distances. Every module has a hemispherical work space and collective modules give rise to highly dexterous symmetrical work space. Some basic designs and kinematic structures of these robot manipulators are discussed, the associated direct and inverse kinematics formulations are presented, and solutions to the inverse kinematic problem are obtained explicitly and elaborated upon. These robot manipulators are shown to have a strength-to-weight ratio that is many times larger than the value that is currently available with industrial or research manipulators. This is due to the fact that these hybrid manipulators are stress-compensated and have an ultralight weight, yet, they are extremely stiff due to the fact that force distribution in their structure is mostly axial. Actuation is prismatic and can be provided by ball screws for maximum precision.
Numerical methods on some structured matrix algebra problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1996-06-01

This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was to translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.
The SENSE-Isomorphism Theoretical Image Voxel Estimation (SENSE-ITIVE) Model for Reconstruction and Observing Statistical Properties of Reconstruction Operators

PubMed Central

Bruce, Iain P.; Karaman, M. Muge; Rowe, Daniel B.

2012-01-01

The acquisition of sub-sampled data from an array of receiver coils has become a common means of reducing data acquisition time in MRI. Of the various techniques used in parallel MRI, SENSitivity Encoding (SENSE) is one of the most common, making use of a complex-valued weighted least squares estimation to unfold the aliased images. It was recently shown in Bruce et al. [Magn. Reson. Imag. 29(2011):1267–1287] that when the SENSE model is represented in terms of a real-valued isomorphism, it assumes a skew-symmetric covariance between receiver coils, as well as an identity covariance structure between voxels. In this manuscript, we show that not only is the skew-symmetric coil covariance unlike that of real data, but the estimated covariance structure between voxels over a time series of experimental data is not an identity matrix. As such, a new model, entitled SENSE-ITIVE, is described with both revised coil and voxel covariance structures. Both the SENSE and SENSE-ITIVE models are represented in terms of real-valued isomorphisms, allowing a statistical analysis of the reconstructed voxel means, variances, and correlations that result from the different coil and voxel covariance structures used in the reconstruction processes. It is shown through both theoretical and experimental illustrations that the misspecification of the coil and voxel covariance structures in the SENSE model results in a lower standard deviation in each voxel of the reconstructed images, and thus an artificial increase in SNR, compared to the standard deviation and SNR of the SENSE-ITIVE model where both the coil and voxel covariances are appropriately accounted for. It is also shown that there are differences in the correlations induced by the reconstruction operations of both models, and consequently there are differences in the correlations estimated throughout the course of reconstructed time series. 
These differences in correlations could result in meaningful differences in interpretation of results. PMID:22617147
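The "real-valued isomorphism" mentioned in this abstract can be illustrated with the standard block construction that maps a complex matrix A to [[Re A, -Im A], [Im A, Re A]]. The sketch below is a minimal illustration under that assumption; the abstract does not state the ordering of real and imaginary parts the authors use, and the matrix and vector values here are invented.

```python
def real_isomorphism(A):
    """Map a complex m x n matrix (list of rows) to the 2m x 2n real
    block matrix [[Re A, -Im A], [Im A, Re A]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in A]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in A]
    return top + bottom


def stack_real_imag(v):
    """Stack real parts above imaginary parts of a complex vector."""
    return [z.real for z in v] + [z.imag for z in v]


def matvec(M, x):
    """Plain matrix-vector product on lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical 3-coil, 2-voxel sensitivity matrix and complex voxel values.
A = [[1 + 2j, 0.5 - 1j],
     [-1j, 2 + 0j],
     [0.3 + 0.3j, -1 + 1j]]
v = [0.7 - 0.2j, -1.1 + 0.4j]

complex_route = stack_real_imag(matvec(A, v))               # multiply, then split
real_route = matvec(real_isomorphism(A), stack_real_imag(v))  # split, then multiply
print(complex_route)
print(real_route)
```

Both routes give the same stacked vector, which is what lets a complex-valued least squares problem be analysed with real-valued covariance matrices.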
Spontaneous symmetry breaking in coupled parametrically driven waveguides.

PubMed

Dror, Nir; Malomed, Boris A

2009-01-01

We introduce a system of linearly coupled parametrically driven damped nonlinear Schrödinger equations, which models a laser based on a nonlinear dual-core waveguide with parametric amplification symmetrically applied to both cores. The model may also be realized in terms of parallel ferromagnetic films, in which the parametric gain is provided by an external field. We analyze spontaneous symmetry breaking (SSB) of fundamental and multiple solitons in this system, which was not studied systematically before in linearly coupled dissipative systems with intrinsic nonlinearity. For fundamental solitons, the analysis reveals three distinct SSB scenarios. Unlike the standard dual-core-fiber model, the present system gives rise to a vast bistability region, which may be relevant to applications. Other noteworthy findings are restabilization of the symmetric soliton after it was destabilized by the SSB bifurcation, and the existence of a generic situation with all solitons unstable in the single-component (decoupled) model, while both symmetric and asymmetric solitons may be stable in the coupled system. The stability of the asymmetric solitons is identified via direct simulations, while for symmetric and antisymmetric ones the stability is verified too through the computation of stability eigenvalues, families of antisymmetric solitons being entirely unstable. In this way, full stability maps for the symmetric solitons are produced. We also investigate the SSB bifurcation of two-soliton bound states (it breaks the symmetry between the two components, while the two peaks in the shape of the soliton remain mutually symmetric). The family of the asymmetric double-peak states may decouple from its symmetric counterpart, being no longer connected to it by the bifurcation, with a large portion of the asymmetric family remaining stable.
Parallel Symmetric Eigenvalue Problem Solvers

DTIC Science & Technology

2015-05-01

...get research, tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. ... 4.6.2 Choice of the Ritz shifts ... 4.7 Relationship between ... pencil. I will conclude with a discussion of the relationship between Trace-Min and simultaneous iteration. If both methods solve the linear systems...
Solution of matrix equations using sparse techniques

NASA Technical Reports Server (NTRS)

Baddourah, Majdi

1994-01-01

The solution of large systems of matrix equations is key to the solution of a large number of scientific and engineering problems. This talk describes the sparse matrix solver developed at Langley which can routinely solve in excess of 263,000 equations in 40 seconds on one Cray C-90 processor. It appears that for large scale structural analysis applications, sparse matrix methods have a significant performance advantage over other methods.

Machine Learning Toolkit for Extreme Scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-03-31

Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which includes commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage including adaptive and aggressive elimination of samples for faster convergence, and sparse format representation of data samples. Several heuristics for earliest possible to lazy elimination of non-contributing samples are considered in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.
An Optimization Code for Nonlinear Transient Problems of a Large Scale Multidisciplinary Mathematical Model

NASA Astrophysics Data System (ADS)

Takasaki, Koichi

This paper presents a program for the multidisciplinary optimization and identification problem of the nonlinear model of large aerospace vehicle structures. The program constructs the global matrix of the dynamic system in the time direction by the p-version finite element method (pFEM), and the basic matrix for each pFEM node in the time direction is described by a sparse matrix similarly to the static finite element problem. The algorithm used by the program does not require the Hessian matrix of the objective function and so has low memory requirements. It also has a relatively low computational cost, and is suited to parallel computation. The program was integrated as a solver module of the multidisciplinary analysis system CUMuLOUS (Computational Utility for Multidisciplinary Large scale Optimization of Undense System) which is under development by the Aerospace Research and Development Directorate (ARD) of the Japan Aerospace Exploration Agency (JAXA).
Accelerating Full Configuration Interaction Calculations for Nuclear Structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Chao; Sternberg, Philip; Maris, Pieter

2008-04-14

One of the emerging computational approaches in nuclear physics is the full configuration interaction (FCI) method for solving the many-body nuclear Hamiltonian in a sufficiently large single-particle basis space to obtain exact answers - either directly or by extrapolation. The lowest eigenvalues and corresponding eigenvectors for very large, sparse and unstructured nuclear Hamiltonian matrices are obtained and used to evaluate additional experimental quantities. These matrices pose a significant challenge to the design and implementation of efficient and scalable algorithms for obtaining solutions on massively parallel computer systems. In this paper, we describe the computational strategies employed in a state-of-the-art FCI code MFDn (Many Fermion Dynamics - nuclear) as well as techniques we recently developed to enhance the computational efficiency of MFDn. We will demonstrate the current capability of MFDn and report the latest performance improvement we have achieved. We will also outline our future research directions.
A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

PubMed

Zhang, L; Liu, X J

2016-06-03

With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
Performance Models for the Spike Banded Linear System Solver

DOE PAGES

Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...

2011-01-01

With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. 
All of our results are validated on diverse heterogeneous multiclusters – platforms for which performance prediction is particularly challenging. Finally, we predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.
Description of strong M1 transitions between 4^+ states at N=52 within the sdg-IBM-2

NASA Astrophysics Data System (ADS)

Casperson, R. J.; Werner, V.; Heinze, S.

2009-10-01

The interplay between collective and single-particle degrees of freedom for nuclei near the N=50 shell closure has recently been under investigation. In Molybdenum and Ruthenium nuclei, collective symmetric and mixed-symmetric structures have been identified, while in Zirconium, underlying shell-structure plays an enhanced role. The one-phonon 2^+ mixed-symmetry state was identified from its strong M1 transition to the 2^+_1 state. Similar transitions were observed between 4^+ states in ^94Mo and ^92Zr, and shell model calculations indicate that hexadecapole excitations play a role. These phenomena will be investigated within the sdg-Interacting Boson Model-2 in order to gain a better understanding about the structure of the states involved, and to which extent the hexadecapole degree of freedom is important at relatively low energies. First calculations within this model, using an F-spin conserving Hamiltonian to disentangle symmetric and mixed-symmetric structures, will be presented and compared to data.
Quantum theory of atoms in molecules/charge-charge flux-dipole flux models for fundamental vibrational intensity changes on H-bond formation of water and hydrogen fluoride

DOE Office of Scientific and Technical Information (OSTI.GOV)

Silva, Arnaldo F.; Richter, Wagner E.; Bruns, Roy E.

The Quantum Theory of Atoms In Molecules/Charge-Charge Flux-Dipole Flux (QTAIM/CCFDF) model has been used to investigate the electronic structure variations associated with intensity changes on dimerization for the vibrations of the water and hydrogen fluoride dimers as well as in the water-hydrogen fluoride complex. QCISD/cc-pVTZ wave functions applied in the QTAIM/CCFDF model accurately provide the fundamental band intensities of water and its dimer predicting symmetric and antisymmetric stretching intensity increases for the donor unit of 159 and 47 km mol⁻¹ on H-bond formation compared with the experimental values of 141 and 53 km mol⁻¹. The symmetric stretching of the proton donor water in the dimer has intensity contributions parallel and perpendicular to its C2v axis. The largest calculated increase of 107 km mol⁻¹ is perpendicular to this axis and owes to equilibrium atomic charge displacements on vibration. Charge flux decreases occurring parallel and perpendicular to this axis result in 42 and 40 km mol⁻¹ total intensity increases for the symmetric and antisymmetric stretches, respectively. These decreases in charge flux result in intensity enhancements because of the interaction contributions to the intensities between charge flux and the other quantities. Even though dipole flux contributions are much smaller than the charge and charge flux ones in both monomer and dimer water they are important for calculating the total intensity values for their stretching vibrations since the charge-charge flux interaction term cancels the charge and charge flux contributions. The QTAIM/CCFDF hydrogen-bonded stretching intensity strengthening of 321 km mol⁻¹ on HF dimerization and 592 km mol⁻¹ on HF:H2O complexation can essentially be explained by charge, charge flux and their interaction cross term. Atomic contributions to the intensities are also calculated. 
The bridge hydrogen atomic contributions alone explain 145, 237, and 574 km mol⁻¹ of the H-bond stretching intensity enhancements for the water and HF dimers and their heterodimer compared with total increments of 149, 321, and 592 km mol⁻¹, respectively.
Asymmetric cell division requires specific mechanisms for adjusting global transcription.

PubMed

Mena, Adriana; Medina, Daniel A; García-Martínez, José; Begley, Victoria; Singh, Abhyudai; Chávez, Sebastián; Muñoz-Centeno, Mari C; Pérez-Ortín, José E

2017-12-01

Most cells divide symmetrically into two approximately identical cells. There are many examples, however, of asymmetric cell division that can generate sibling cell size differences. Whereas physical asymmetric division mechanisms and cell fate consequences have been investigated, the specific problem caused by asymmetric division at the transcription level has not yet been addressed. In symmetrically dividing cells the nascent transcription rate increases in parallel to cell volume to compensate it by keeping the actual mRNA synthesis rate constant. This cannot apply to the yeast Saccharomyces cerevisiae, where this mechanism would provoke a never-ending increasing mRNA synthesis rate in smaller daughter cells. We show here that, contrary to other eukaryotes with symmetric division, budding yeast keeps the nascent transcription rates of its RNA polymerases constant and increases mRNA stability. This control on RNA pol II-dependent transcription rate is obtained by controlling the cellular concentration of this enzyme. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Advancing MODFLOW Applying the Derived Vector Space Method

NASA Astrophysics Data System (ADS)

Herrera, G. S.; Herrera, I.; Lemus-García, M.; Hernandez-Garcia, G. D.

2015-12-01

The most effective domain decomposition methods (DDM) are non-overlapping DDMs. Recently a new approach, the DVS-framework, based on an innovative discretization method that uses a non-overlapping system of nodes (the derived-nodes), was introduced and developed by I. Herrera et al. [1, 2]. Using the DVS-approach a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm (i.e. the solution of global problems is obtained by resolution of local problems exclusively) has been derived. Such procedures are applicable to any boundary-value problem, or system of such equations, for which a standard discretization method is available and then software with a high degree of parallelization can be constructed. In a parallel talk, in this AGU Fall Meeting, Ismael Herrera will introduce the general DVS methodology. The application of the DVS-algorithms has been demonstrated in the solution of several boundary value problems of interest in Geophysics. Numerical examples for a single equation, for the cases of symmetric, non-symmetric and indefinite problems, were demonstrated before [1,2]. For these problems DVS-algorithms exhibited significantly improved numerical performance with respect to standard versions of DDM algorithms. In view of these results our research group is in the process of applying the DVS method to a widely used simulator for the first time; here we present the advances of the application of this method for the parallelization of MODFLOW. Efficiency results for a group of tests will be presented. References [1] I. Herrera, L.M. de la Cruz and A. Rosas-Medina. Non overlapping discretization methods for partial differential equations, Numer Meth Part D E, (2013). [2] Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
Convergence Speed of a Dynamical System for Sparse Recovery

NASA Astrophysics Data System (ADS)

Balavoine, Aurele; Rozell, Christopher J.; Romberg, Justin

2013-09-01

This paper studies the convergence rate of a continuous-time dynamical system for L1-minimization, known as the Locally Competitive Algorithm (LCA). Solving L1-minimization problems efficiently and rapidly is of great interest to the signal processing community, as these programs have been shown to recover sparse solutions to underdetermined systems of linear equations and come with strong performance guarantees. The LCA under study differs from the typical L1 solver in that it operates in continuous time: instead of being specified by discrete iterations, it evolves according to a system of nonlinear ordinary differential equations. The LCA is constructed from simple components, giving it the potential to be implemented as a large-scale analog circuit. The goal of this paper is to give guarantees on the convergence time of the LCA system. To do so, we analyze how the LCA evolves as it is recovering a sparse signal from underdetermined measurements. We show that under appropriate conditions on the measurement matrix and the problem parameters, the path the LCA follows can be described as a sequence of linear differential equations, each with a small number of active variables. This allows us to relate the convergence time of the system to the restricted isometry constant of the matrix. Interesting parallels to sparse-recovery digital solvers emerge from this study. Our analysis covers both the noisy and noiseless settings and is supported by simulation results.
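To make the continuous-time dynamics concrete, the sketch below integrates the commonly stated LCA equations, du/dt = -u + Φᵀy - (ΦᵀΦ - I)a with a = soft-threshold(u, λ), using a plain forward Euler step. It is an illustration under stated assumptions rather than the authors' code: the time constant is absorbed into the step size, the columns of Φ are assumed to have unit norm, and the small matrix, measurement vector and parameter values are invented.

```python
import math


def soft_threshold(u, lam):
    """Soft-thresholding activation used by the LCA."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lca(Phi, y, lam, dt=0.05, steps=4000):
    """Forward-Euler integration of the LCA ODE (unit-norm columns assumed)."""
    m, n = len(Phi), len(Phi[0])
    # Driving input b = Phi^T y and Gram matrix G = Phi^T Phi.
    b = [sum(Phi[i][j] * y[i] for i in range(m)) for j in range(n)]
    G = [[sum(Phi[i][j] * Phi[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    u = [0.0] * n
    for _ in range(steps):
        a = [soft_threshold(x, lam) for x in u]
        # Skipping k == j implements (G - I) a when every G[j][j] equals 1.
        du = [-u[j] + b[j] - sum(G[j][k] * a[k] for k in range(n) if k != j)
              for j in range(n)]
        u = [u[j] + dt * du[j] for j in range(n)]
    return [soft_threshold(x, lam) for x in u]


# Hypothetical underdetermined system: 2 measurements, 3 unit-norm atoms.
s = 1.0 / math.sqrt(2.0)
Phi = [[1.0, 0.0, s],
       [0.0, 1.0, s]]
y = [2.0 * s, 2.0 * s]          # y equals twice the third atom
a_hat = lca(Phi, y, lam=0.1)
print(a_hat)
```

In this toy problem the first two coefficients switch on briefly and then shut off, leaving only the third atom active, which mirrors the abstract's picture of a path made of linear regimes with few active variables.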
Alternatively Constrained Dictionary Learning For Image Superresolution.

PubMed

Lu, Xiaoqiang; Yuan, Yuan; Yan, Pingkun

2014-03-01

Dictionaries are crucial in sparse coding-based algorithms for image superresolution. Sparse coding is a typical unsupervised learning method to study the relationship between the patches of high- and low-resolution images. However, most of the sparse coding methods for image superresolution fail to simultaneously consider the geometrical structure of the dictionary and the corresponding coefficients, which may result in noticeable superresolution reconstruction artifacts. In other words, when a low-resolution image and its corresponding high-resolution image are represented in their feature spaces, the two sets of dictionaries and the obtained coefficients have intrinsic links, which have not yet been well studied. Motivated by the development of nonlocal self-similarity and manifold learning, a novel sparse coding method is reported to preserve the geometrical structure of the dictionary and the sparse coefficients of the data. Moreover, the proposed method can preserve the incoherence of dictionary entries and provide the sparse coefficients and learned dictionary from a new perspective, which have both reconstruction and discrimination properties to enhance the learning performance. Furthermore, to utilize the model of the proposed method more effectively for single-image superresolution, this paper also proposes a novel dictionary-pair learning method, named two-stage dictionary training. Extensive experiments are carried out on a large set of images in comparison with other popular algorithms for the same purpose, and the results clearly demonstrate the effectiveness of the proposed sparse representation model and the corresponding dictionary learning algorithm.
A Relaxation Method for Nonlocal and Non-Hermitian Operators

NASA Astrophysics Data System (ADS)

Lagaris, I. E.; Papageorgiou, D. G.; Braun, M.; Sofianos, S. A.

1996-06-01

We present a grid method to solve the time dependent Schrödinger equation (TDSE). It uses the Crank-Nicholson scheme to propagate the wavefunction forward in time and finite differences to approximate the derivative operators. The resulting sparse linear system is solved by the symmetric successive overrelaxation iterative technique. The method handles local and nonlocal interactions and Hamiltonians that correspond to either Hermitian or to non-Hermitian matrices with real eigenvalues. We test the method by solving the TDSE in the imaginary time domain, thus converting the time propagation to asymptotic relaxation. Benchmark problems solved are both in one and two dimensions, with local, nonlocal, Hermitian and non-Hermitian Hamiltonians.
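The iterative solver named in this abstract, symmetric successive overrelaxation (SSOR), can be sketched in a few lines. The example below is a minimal illustration on a toy real symmetric tridiagonal system (a 1D finite-difference Laplacian), not the authors' code; the relaxation factor `omega`, the iteration count, and the dense list-of-lists storage are assumptions made for brevity.

```python
def ssor_solve(A, b, omega=1.2, iters=200):
    """Solve A x = b with SSOR: one forward and one backward SOR sweep per iteration."""
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # Standard SOR update of unknown i using the latest values of the others.
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]

    for _ in range(iters):
        for i in range(n):            # forward sweep
            relax(i)
        for i in reversed(range(n)):  # backward sweep makes the iteration symmetric
            relax(i)
    return x

# Toy system: tridiag(-1, 2, -1), with b chosen so the exact solution is all ones.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 1.0]
x = ssor_solve(A, b)
```

A production solver would store only the nonzero bands and stop on a residual tolerance rather than a fixed iteration count.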
Linkage mechanisms in the vertebrate skull: Structure and function of three-dimensional, parallel transmission systems.

PubMed

Olsen, Aaron M; Westneat, Mark W

2016-12-01

Many musculoskeletal systems, including the skulls of birds, fishes, and some lizards consist of interconnected chains of mobile skeletal elements, analogous to linkage mechanisms used in engineering. Biomechanical studies have applied linkage models to a diversity of musculoskeletal systems, with previous applications primarily focusing on two-dimensional linkage geometries, bilaterally symmetrical pairs of planar linkages, or single four-bar linkages. Here, we present new, three-dimensional (3D), parallel linkage models of the skulls of birds and fishes and use these models (available as free kinematic simulation software), to investigate structure-function relationships in these systems. This new computational framework provides an accessible and integrated workflow for exploring the evolution of structure and function in complex musculoskeletal systems. Linkage simulations show that kinematic transmission, although a suitable functional metric for linkages with single rotating input and output links, can give misleading results when applied to linkages with substantial translational components or multiple output links. To take into account both linear and rotational displacement we define force mechanical advantage for a linkage (analogous to lever mechanical advantage) and apply this metric to measure transmission efficiency in the bird cranial mechanism. For linkages with multiple, expanding output points we propose a new functional metric, expansion advantage, to measure expansion amplification and apply this metric to the buccal expansion mechanism in fishes. Using the bird cranial linkage model, we quantify the inaccuracies that result from simplifying a 3D geometry into two dimensions. We also show that by combining single-chain linkages into parallel linkages, more links can be simulated while decreasing or maintaining the same number of input parameters. 
This generalized framework for linkage simulation and analysis can accommodate linkages of differing geometries and configurations, enabling novel interpretations of the mechanics of force transmission across a diversity of vertebrate feeding mechanisms and enhancing our understanding of musculoskeletal function and evolution. J. Morphol. 277:1570-1583, 2016. © 2016 Wiley Periodicals, Inc.
Computational design of a self-assembling symmetrical β-propeller protein.

PubMed

Voet, Arnout R D; Noguchi, Hiroki; Addy, Christine; Simoncini, David; Terada, Daiki; Unzai, Satoru; Park, Sam-Yong; Zhang, Kam Y J; Tame, Jeremy R H

2014-10-21

The modular structure of many protein families, such as β-propeller proteins, strongly implies that duplication played an important role in their evolution, leading to highly symmetrical intermediate forms. Previous attempts to create perfectly symmetrical propeller proteins have failed, however. We have therefore developed a new and rapid computational approach to design such proteins. As a test case, we have created a sixfold symmetrical β-propeller protein and experimentally validated the structure using X-ray crystallography. Each blade consists of 42 residues. Proteins carrying 2-10 identical blades were also expressed and purified. Two or three tandem blades assemble to recreate the highly stable sixfold symmetrical architecture, consistent with the duplication and fusion theory. The other proteins produce different monodisperse complexes, up to 42 blades (180 kDa) in size, which self-assemble according to simple symmetry rules. Our procedure is suitable for creating nano-building blocks from different protein templates of desired symmetry.
Simulation of Devices with Molecular Potentials

DTIC Science & Technology

2013-12-22

10] W. R. Frensley, Wigner-function model of a resonant-tunneling semiconductor device, Phys. Rev. B, 36 (1987), pp. 1570–1580. 6 [11] M. J...develop the principal investigator’s Wigner-Poisson code and extend that code to deal with longer devices and more complex barrier profiles. Over...Research Triangle Park, NC 27709-2211 Molecular Confirmation, Sparse Interpolation, Wigner-Poisson Equation, Parallel Algorithms REPORT DOCUMENTATION PAGE 11
spammpack, Version 2013-06-18

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-01-17

This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type and an approximate matrix product, which exhibits linear scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or for parallel execution on shared memory systems with an OpenMP-capable compiler.
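The idea of an approximate product with a tunable tolerance can be sketched as a recursive block multiply that skips a sub-product whenever the product of the two blocks' Frobenius norms falls below the tolerance. The code below is a hedged illustration of that general technique on small dense lists, assuming a power-of-two matrix size; it is not spammpack's API, and the names `spamm` and `fro` are hypothetical.

```python
import math

def fro(M, r0, c0, size):
    """Frobenius norm of the size x size block of M starting at (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(size) for j in range(size)))

def spamm(A, B, tol):
    """Approximate C = A * B for n x n matrices, n a power of two (assumption)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    def rec(ar, ac, br, bc, size):
        # Skip the whole sub-product if its norm bound is below the tolerance.
        if fro(A, ar, ac, size) * fro(B, br, bc, size) < tol:
            return
        if size == 1:
            C[ar][bc] += A[ar][ac] * B[br][bc]
            return
        h = size // 2
        for i in (0, h):          # block row of A and C
            for j in (0, h):      # block column of B and C
                for k in (0, h):  # inner block index
                    rec(ar + i, ac + k, br + k, bc + j, h)

    rec(0, 0, 0, 0, n)
    return C

exact = spamm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], tol=0.0)

# Matrix with exponential decay away from the diagonal.
D = [[0.1 ** abs(i - j) for j in range(4)] for i in range(4)]
approx = spamm(D, D, tol=1e-3)
reference = spamm(D, D, tol=0.0)
max_err = max(abs(approx[i][j] - reference[i][j])
              for i in range(4) for j in range(4))
```

A real implementation would cache block norms in a quadtree instead of recomputing them at every level, which is what makes the skipped work translate into actual savings.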
Advancing Underwater Acoustic Communication for Autonomous Distributed Networks via Sparse Channel Sensing, Coding, and Navigation Support

DTIC Science & Technology

2011-09-30

channel interference mitigation for underwater acoustic MIMO-OFDM. 3) Turbo equalization for OFDM modulated physical layer network coding. 4) Blind CFO...Underwater Acoustic MIMO-OFDM. MIMO-OFDM has been actively studied for high data rate communications over the bandwidth-limited underwater acoustic...with the cochannel interference (CCI) due to parallel transmissions in MIMO-OFDM. Our proposed receiver has the following components: 1
Effect of dilution in asymmetric recurrent neural networks.

PubMed

Folli, Viola; Gosti, Giorgio; Leonetti, Marco; Ruocco, Giancarlo

2018-04-16

We study with numerical simulation the possible limit behaviors of synchronous discrete-time deterministic recurrent neural networks composed of N binary neurons as a function of a network's level of dilution and asymmetry. The network dilution measures the fraction of neuron couples that are connected, and the network asymmetry measures to what extent the underlying connectivity matrix is asymmetric. For each given neural network, we study the dynamical evolution of all the different initial conditions, thus characterizing the full dynamical landscape without imposing any learning rule. Because of the deterministic dynamics, each trajectory converges to an attractor, which can be either a fixed point or a limit cycle. These attractors form the set of all the possible limit behaviors of the neural network. For each network we then determine the convergence times, the limit cycles' length, the number of attractors, and the sizes of the attractors' basins. We show that there are two network structures that maximize the number of possible limit behaviors. The first optimal network structure is fully connected and symmetric. In contrast, the second optimal network structure is highly sparse and asymmetric. The latter optimal structure is similar to what is observed in different biological neuronal circuits. These observations lead us to hypothesize that, independently of any given learning model, an efficient and effective biological network that stores a number of limit behaviors close to its maximum capacity tends to develop a connectivity structure similar to one of the optimal networks we found. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.
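The exhaustive procedure described here (follow every initial condition until it falls onto a fixed point or limit cycle, then count attractors and basin sizes) can be sketched for very small N. The code below assumes neurons take values in {-1, +1} and update synchronously by the sign of their input, with ties resolved to +1; these conventions and the function names are assumptions for illustration and may differ from the paper's model.

```python
from itertools import product

def step(J, s):
    """One synchronous update: every neuron takes the sign of its total input."""
    n = len(s)
    return tuple(1 if sum(J[i][j] * s[j] for j in range(n)) >= 0 else -1
                 for i in range(n))

def attractors(J):
    """Map each attractor (as a canonical cycle of states) to its basin size."""
    n = len(J)
    basins = {}
    for s in product((-1, 1), repeat=n):   # all 2^N initial conditions
        seen, path = {}, []
        while s not in seen:               # deterministic, so a repeat must occur
            seen[s] = len(path)
            path.append(s)
            s = step(J, s)
        cycle = path[seen[s]:]             # states from first repeat onward
        # Canonical rotation so the same cycle is counted once.
        key = min(tuple(cycle[k:] + cycle[:k]) for k in range(len(cycle)))
        basins[key] = basins.get(key, 0) + 1
    return basins

symmetric = attractors([[0, 1], [1, 0]])
asymmetric = attractors([[0, 1], [-1, 0]])
```

For this two-neuron toy case the symmetric coupling yields two fixed points and one cycle of length 2, while the antisymmetric coupling yields a single cycle of length 4. The enumeration cost grows as 2^N, so this approach is only practical for small networks.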
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry

1989-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry

1990-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
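The two phases described in this abstract, a preprocessing pass that assigns local numbers and precomputes what each processor must receive, followed by an executor that uses that schedule, can be sketched in a single process. The code below simulates message passing with plain dictionaries; the names `inspector`, `executor`, `owner`, and `stores` are hypothetical, and the loop body (doubling a referenced value) is a placeholder.

```python
def inspector(owner, idx, my_iters, p):
    """Build local numbering and a receive schedule for processor p.

    owner[g] is the processor owning global index g; idx[i] is the global
    index referenced by loop iteration i (an indirection array).
    """
    owned = [g for g, o in enumerate(owner) if o == p]
    local = {g: l for l, g in enumerate(owned)}   # global -> local number
    recv = {}                                     # source proc -> globals to fetch
    for i in my_iters:
        g = idx[i]
        if g not in local:
            local[g] = len(local)                 # append a ghost slot
            recv.setdefault(owner[g], []).append(g)
    return local, recv

def executor(stores, p, idx, my_iters, local, recv):
    """Gather data according to the schedule, then run the loop on local numbers."""
    buf = [0.0] * len(local)
    for g, v in stores[p].items():                # owned data is already local
        buf[local[g]] = v
    for src, wanted in recv.items():              # one simulated message per source
        for g in wanted:
            buf[local[g]] = stores[src][g]
    return [2.0 * buf[local[idx[i]]] for i in my_iters]

owner = [0, 0, 1, 1]
stores = {0: {0: 10.0, 1: 20.0}, 1: {2: 30.0, 3: 40.0}}
idx = [3, 0, 2, 1]
local, recv = inspector(owner, idx, my_iters=[0, 1], p=0)
result = executor(stores, 0, idx, [0, 1], local, recv)
```

Because the schedule depends only on the indirection array, it can be computed once and reused across many executions of the same loop, which is where the preprocessing cost is recovered.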

A symmetry measure for damage detection with mode shapes

NASA Astrophysics Data System (ADS)

Chen, Justin G.; Büyüköztürk, Oral

2017-11-01

This paper introduces a feature for detecting damage or changes in structures, the continuous symmetry measure, which can quantify the amount of a particular rotational, mirror, or translational symmetry in a mode shape of a structure. Many structures in the built environment have geometries that are either symmetric or almost symmetric; however, damage typically occurs in a local manner, causing asymmetric changes in the structure's geometry or material properties and altering its mode shapes. The continuous symmetry measure can quantify these changes in symmetry as a novel indicator of damage for data-based structural health monitoring approaches. This paper describes the concept as a basis for detecting changes in mode shapes and detecting structural damage. Application of the method is demonstrated in various structures with different symmetrical properties: a pipe cross-section with a finite element model and experimental study, the NASA 8-bay truss model, and the simulated IASC-ASCE structural health monitoring benchmark structure. The applicability and limitations of the feature when applied to structures of varying geometries are discussed.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

PubMed Central

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-01-01

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
Scanned-probe field-emission studies of vertically aligned carbon nanofibers

NASA Astrophysics Data System (ADS)

Merkulov, Vladimir I.; Lowndes, Douglas H.; Baylor, Larry R.

2001-02-01

Field emission properties of dense and sparse "forests" of randomly placed, vertically aligned carbon nanofibers (VACNFs) were studied using a scanned probe with a small tip diameter of ~1 μm. The probe was scanned in directions perpendicular and parallel to the sample plane, which allowed for measuring not only the emission turn-on field at fixed locations but also the emission site density over large surface areas. The results show that dense forests of VACNFs are not good field emitters as they require high extracting (turn-on) fields. This is attributed to the screening of the local electric field by the neighboring VACNFs. In contrast, sparse forests of VACNFs exhibit moderate-to-low turn-on fields as well as high emission site and current densities, and long emission lifetime, which makes them very promising for various field emission applications.
Realization of preconditioned Lanczos and conjugate gradient algorithms on optical linear algebra processors.

PubMed

Ghosh, A

1988-08-01

Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods

PubMed Central

Luo, Guangchun; Qin, Ke

2014-01-01

Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are often realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). PMID:24719565
How the Sequence of a Gene Specifies Structural Symmetry in Proteins

PubMed Central

Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

2015-01-01

Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
Contemporary ultrasonic signal processing approaches for nondestructive evaluation of multilayered structures

NASA Astrophysics Data System (ADS)

Zhang, Guang-Ming; Harvey, David M.

2012-03-01

Various signal processing techniques have been used for the enhancement of defect detection and defect characterisation. Cross-correlation, filtering, autoregressive analysis, deconvolution, neural network, wavelet transform and sparse signal representations have all been applied in attempts to analyse ultrasonic signals. In ultrasonic nondestructive evaluation (NDE) applications, a large number of materials have multilayered structures. NDE of multilayered structures leads to some specific problems, such as penetration, echo overlap, high attenuation and low signal-to-noise ratio. The signals recorded from a multilayered structure are a class of very special signals comprised of limited echoes. Such signals can be assumed to have a sparse representation in a proper signal dictionary. Recently, a number of digital signal processing techniques have been developed by exploiting the sparse constraint. This paper presents a review of research to date, showing the up-to-date developments of signal processing techniques made in ultrasonic NDE. A few typical ultrasonic signal processing techniques used for NDE of multilayered structures are elaborated. The practical applications and limitations of different signal processing methods in ultrasonic NDE of multilayered structures are analysed.
A new wavelet transform to sparsely represent cortical current densities for EEG/MEG inverse problems.

PubMed

Liao, Ke; Zhu, Min; Ding, Lei

2013-08-01

The present study investigated the use of transform sparseness of cortical current density on human brain surface to improve electroencephalography/magnetoencephalography (EEG/MEG) inverse solutions. Transform sparseness was assessed by evaluating compressibility of cortical current densities in transform domains. To do that, a structure compression method from computer graphics was first adopted to compress cortical surface structure, either regular or irregular, into hierarchical multi-resolution meshes. Then, a new face-based wavelet method based on generated multi-resolution meshes was proposed to compress current density functions defined on cortical surfaces. Twelve cortical surface models were built by three EEG/MEG software packages and their structural compressibility was evaluated and compared by the proposed method. Monte Carlo simulations were implemented to evaluate the performance of the proposed wavelet method in compressing various cortical current density distributions as compared to two other available vertex-based wavelet methods. The present results indicate that the face-based wavelet method can achieve higher transform sparseness than vertex-based wavelet methods. Furthermore, basis functions from the face-based wavelet method have lower coherence against typical EEG and MEG measurement systems than vertex-based wavelet methods. Both high transform sparseness and low coherent measurements suggest that the proposed face-based wavelet method can improve the performance of L1-norm regularized EEG/MEG inverse solutions, which was further demonstrated in simulations and experimental setups using MEG data. Thus, this new transform on complicated cortical structure is promising to significantly advance EEG/MEG inverse source imaging technologies. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Dealing with Activism in Canada: An Ideal Cultural Fit for the Two-Way Symmetrical Public Relations Model.

ERIC Educational Resources Information Center

Guiniven, John E.

2002-01-01

Notes that disputes are seen as much less confrontational, much less zero-sum games in Canada than in the United States. Interviews 15 communications and public relations practitioners and professors with experiences on both sides of the 49th parallel and reviews relevant literature. Concludes that the greater acceptance of two-way symmetrical…
The effect of shear wall location in resisting earthquake

NASA Astrophysics Data System (ADS)

Tarigan, J.; Manggala, J.; Sitorus, T.

2018-02-01

Shear walls are a commonly used lateral load-resisting structure. A shear wall gives the structure high stiffness, which keeps it stable. Applying shear walls can effectively reduce the displacement and story drift of the structure, which in turn reduces the damage caused by lateral loads such as earthquakes. Earlier studies showed that a shear wall performs differently depending on its position in the structure. In this paper, seismic analysis has been performed using the response spectrum method for different structural models: the open frame, the shear wall placed symmetrically at the core, the shear wall placed symmetrically at the periphery, and the shear wall placed asymmetrically at the periphery. The results are compared in terms of displacement and story drift. Based on the analysis, placing the shear wall symmetrically at the core of the structure gives the best performance in reducing displacement and story drift. It can reduce the displacement by up to 61.16% (X-dir) and 70.60% (Y-dir). Placing the shear wall symmetrically at the periphery reduces the displacement by up to 53.85% (X-dir) and 47.87% (Y-dir), while placing it asymmetrically at the periphery reduces the displacement by up to 59.42% (X-dir) and 66.99% (Y-dir).
Compressed sensing for high-resolution nonlipid suppressed 1H FID MRSI of the human brain at 9.4T.

PubMed

Nassirpour, Sahar; Chang, Paul; Avdievitch, Nikolai; Henning, Anke

2018-04-29

The aim of this study was to apply compressed sensing to accelerate the acquisition of high resolution metabolite maps of the human brain using a nonlipid suppressed ultra-short TR and TE 1H FID MRSI sequence at 9.4T. X-t sparse compressed sensing reconstruction was optimized for nonlipid suppressed 1H FID MRSI data. Coil-by-coil x-t sparse reconstruction was compared with SENSE x-t sparse and low rank reconstruction. The effect of matrix size and spatial resolution on the achievable acceleration factor was studied. Finally, in vivo metabolite maps with different acceleration factors of 2, 4, 5, and 10 were acquired and compared. Coil-by-coil x-t sparse compressed sensing reconstruction was not able to reliably recover the nonlipid suppressed data; rather, a combination of parallel and sparse reconstruction was necessary (SENSE x-t sparse). For acceleration factors of up to 5, both the low-rank and the compressed sensing methods were able to reconstruct the data comparably well (root mean squared errors [RMSEs] ≤ 10.5% for Cre). However, the reconstruction time of the low rank algorithm was drastically longer than that of compressed sensing. Using the optimized compressed sensing reconstruction, acceleration factors of 4 or 5 could be reached for the MRSI data with a matrix size of 64 × 64. For lower spatial resolutions, an acceleration factor of up to R∼4 was successfully achieved. By tailoring the reconstruction scheme to the nonlipid suppressed data through parameter optimization and performance evaluation, we present high resolution (97 µL voxel size) accelerated in vivo metabolite maps of the human brain acquired at 9.4T within scan times of 3 to 3.75 min. © 2018 International Society for Magnetic Resonance in Medicine.
Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation

PubMed Central

Zhang, Jie; Fan, Shangang; Xiong, Jian; Cheng, Xiefeng; Sari, Hikmet; Adachi, Fumiyuki

2017-01-01

Both L1/2 and L2/3 are two typical non-convex regularizations of Lp (0
Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation.

PubMed

Li, Yunyi; Zhang, Jie; Fan, Shangang; Yang, Jie; Xiong, Jian; Cheng, Xiefeng; Sari, Hikmet; Adachi, Fumiyuki; Gui, Guan

2017-12-15

Both L 1/2 and L 2/3 are two typical non-convex regularizations of L p (0
BinTree Seeking: A Novel Approach to Mine Both Bi-Sparse and Cohesive Modules in Protein Interaction Networks

PubMed Central

Shen, Hong-Bin

2011-01-01

Modern science of networks has brought significant advances to our understanding of complex systems biology. As a representative model of systems biology, Protein Interaction Networks (PINs) are characterized by remarkable modular structures, reflecting functional associations between their components. Many methods were proposed to capture cohesive modules so that there is a higher density of edges within modules than across them. Recent studies reveal that cohesively interacting modules of proteins are not a universal organizing principle in PINs, which has opened up new avenues for revisiting functional modules in PINs. In this paper, functional clusters in PINs are found to be able to form unorthodox structures defined as bi-sparse modules. In contrast to the traditional cohesive module, the nodes in a bi-sparse module are sparsely connected internally and densely connected with other bi-sparse or cohesive modules. We present a novel protocol called BinTree Seeking (BTS) for mining both bi-sparse and cohesive modules in PINs based on Edge Density of Module (EDM) and matrix theory. BTS detects modules by depicting links and nodes rather than nodes alone, and its derivation procedure is performed entirely on the adjacency matrix of the network. The number of modules in a PIN can be automatically determined in the proposed BTS approach. BTS is tested on three real PINs and the results demonstrate that functional modules in PINs are not dominantly cohesive but can be sparse. BTS software and the supporting information are available at: www.csbio.sjtu.edu.cn/bioinf/BTS/. PMID:22140454
Simultaneous analysis of large INTEGRAL/SPI datasets: Optimizing the computation of the solution and its variance using sparse matrix algorithms

NASA Astrophysics Data System (ADS)

Bouchet, L.; Amestoy, P.; Buttari, A.; Rouet, F.-H.; Chauvin, M.

2013-02-01

Nowadays, analyzing and reducing the ever larger astronomical datasets is becoming a crucial challenge, especially for long cumulated observation times. The INTEGRAL/SPI X/γ-ray spectrometer is an instrument for which it is essential to process many exposures at the same time in order to increase the low signal-to-noise ratio of the weakest sources. In this context, the conventional methods for data reduction are inefficient and sometimes not feasible at all. Processing several years of data simultaneously requires computing not only the solution of a large system of equations, but also the associated uncertainties. We aim at reducing the computation time and the memory usage. Since the SPI transfer function is sparse, we have used some popular methods for the solution of large sparse linear systems; we briefly review these methods. We use the Multifrontal Massively Parallel Solver (MUMPS) to compute the solution of the system of equations. We also need to compute the variance of the solution, which amounts to computing selected entries of the inverse of the sparse matrix corresponding to our linear system. This can be achieved through one of the latest features of the MUMPS software that has been partly motivated by this work. In this paper we provide a brief presentation of this feature and evaluate its effectiveness on astrophysical problems requiring the processing of large datasets simultaneously, such as the study of the entire emission of the Galaxy. We used these algorithms to solve the large sparse systems arising from SPI data processing and to obtain both their solutions and the associated variances. In conclusion, thanks to these newly developed tools, processing large datasets arising from SPI is now feasible with both a reasonable execution time and a low memory usage.
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pinski, Peter; Riplinger, Christoph; Neese, Frank, E-mail: evaleev@vt.edu, E-mail: frank.neese@cec.mpg.de

2015-07-21

In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
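
The elementary sparse-map operations named in this abstract (inversion, chaining, intersection) can be illustrated with a minimal sketch. The representation below, a dictionary from an index to a set of indices, and the toy orbital/atom/basis data are assumptions made for illustration; they are not taken from the authors' code library.

```python
from collections import defaultdict


def invert(m):
    """Inversion: turn a map i -> {j} into the map j -> {i}."""
    inv = defaultdict(set)
    for i, js in m.items():
        for j in js:
            inv[j].add(i)
    return dict(inv)


def chain(m1, m2):
    """Chaining: compose i -> {j} with j -> {k} to obtain i -> {k}."""
    return {i: set().union(*(m2.get(j, set()) for j in js)) for i, js in m1.items()}


def intersect(m1, m2):
    """Intersection: keep only the pairs (i, j) present in both maps."""
    out = {i: m1[i] & m2[i] for i in m1.keys() & m2.keys()}
    return {i: js for i, js in out.items() if js}


# Hypothetical toy data: atoms near each orbital, basis functions on each atom.
orbital_to_atoms = {0: {0, 1}, 1: {1, 2}}
atom_to_basis = {0: {0, 1}, 1: {2}, 2: {3, 4}}

orbital_to_basis = chain(orbital_to_atoms, atom_to_basis)
atom_to_orbitals = invert(orbital_to_atoms)
```

Each map stores, for every index of the first set, only the indices of the second set it is connected to, which is the same idea as the column-index list of a compressed sparse row matrix without the numerical values.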
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals.

PubMed

Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank

2015-07-21

In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
Parallel momentum input by tangential neutral beam injections in stellarator and heliotron plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, S., E-mail: nishimura.shin@lhd.nifs.ac.jp; Nakamura, Y.; Nishioka, K.

The configuration dependence of parallel momentum inputs to target plasma particle species by tangentially injected neutral beams is investigated in non-axisymmetric stellarator/heliotron model magnetic fields by assuming the existence of magnetic flux-surfaces. In parallel friction integrals of the full Rosenbluth-MacDonald-Judd collision operator in thermal particles' kinetic equations, numerically obtained eigenfunctions are used for excluding trapped fast ions that cannot contribute to the friction integrals. It is found that the momentum inputs to thermal ions strongly depend on magnetic field strength modulations on the flux-surfaces, while the input to electrons is insensitive to the modulation. In future plasma flow studies requiring flow calculations of all particle species in more general non-symmetric toroidal configurations, the eigenfunction method investigated here will be useful.
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Low bias negative differential conductance and reversal of current in coupled quantum dots in different topological configurations

NASA Astrophysics Data System (ADS)

Devi, Sushila; Brogi, B. B.; Ahluwalia, P. K.; Chand, S.

2018-06-01

Electronic transport through an asymmetric parallel coupled quantum dot system hybridized between normal leads has been investigated theoretically in the Coulomb blockade regime using the Non-Equilibrium Green Function formalism. A new decoupling scheme proposed by Rabani and co-workers has been adopted to close the chain of higher-order Green's functions appearing in the equations of motion. For the resonant tunneling case, calculations of current and differential conductance are presented for the transition of the coupled quantum dot system from the series to the symmetric parallel configuration. During this transition, the current and differential conductance of the system increase. Furthermore, clear signatures of negative differential conductance and negative current appear in the series case, both of which disappear when the topology of the system is tuned to the asymmetric parallel configuration.

MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
PsiQuaSP-A library for efficient computation of symmetric open quantum systems.

PubMed

Gegg, Michael; Richter, Marten

2017-11-24

In a recent publication we showed that permutation symmetry reduces the numerical complexity of Lindblad quantum master equations for identical multi-level systems from exponential to polynomial scaling. This is important for open system dynamics including realistic system bath interactions and dephasing in, for instance, the Dicke model, multi-Λ system setups etc. Here we present an object-oriented C++ library that allows one to set up and solve arbitrary quantum optical Lindblad master equations, especially those that are permutationally symmetric in the multi-level systems. PsiQuaSP (Permutation symmetry for identical Quantum Systems Package) uses the PETSc package for sparse linear algebra methods and differential equations as its basis. The aim of PsiQuaSP is to provide flexible, storage-efficient and scalable code while being as user friendly as possible. It is easily applied to many quantum optical or quantum information systems with more than one multi-level system. We first review the basics of the permutation symmetry for multi-level systems in quantum master equations. The application of PsiQuaSP to quantum dynamical problems is illustrated with several typical, simple examples of open quantum optical systems.
Knowledge-Sparse and Knowledge-Rich Learning in Information Retrieval.

ERIC Educational Resources Information Center

Rada, Roy

1987-01-01

Reviews aspects of the relationship between machine learning and information retrieval. Highlights include learning programs that extend from knowledge-sparse learning to knowledge-rich learning; the role of the thesaurus; knowledge bases; artificial intelligence; weighting documents; word frequency; and merging classification structures. (78…
Layout Study and Application of Mobile App Recommendation Approach Based On Spark Streaming Framework

NASA Astrophysics Data System (ADS)

Wang, H. T.; Chen, T. T.; Yan, C.; Pan, H.

2018-05-01

For the recommendation of mobile phone Apps, a weighted Slope One algorithm is combined with an item-based collaborative filtering algorithm to further mitigate the cold start, data matrix sparseness and other problems of traditional collaborative filtering. The recommendation algorithm is parallelized on the Spark platform, and the Spark Streaming real-time computing framework is introduced to improve the timeliness of App recommendations.
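
The weighted Slope One predictor named in this abstract can be sketched in a few lines. This is a single-machine illustration of the textbook weighted Slope One scheme only; the function names and the toy ratings are assumptions, and the sketch includes neither the item-based collaborative filtering combination nor the Spark Streaming parallelization described by the authors.

```python
from collections import defaultdict


def fit_slope_one(ratings):
    """Compute average pairwise rating deviations and co-rating counts.

    ratings: dict mapping user -> dict mapping item -> rating.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[(i, j)] += r_i - r_j
                    count[(i, j)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in count}
    return dev, dict(count)


def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user, or None."""
    num = 0.0
    den = 0
    for j, r_j in user_ratings.items():
        c = count.get((target, j), 0)
        if j != target and c > 0:
            # Each deviation is weighted by the number of users who rated both items.
            num += (dev[(target, j)] + r_j) * c
            den += c
    return num / den if den else None


# Hypothetical toy ratings of three Apps by three users.
ratings = {
    "john": {"app_a": 5, "app_b": 3, "app_c": 2},
    "mark": {"app_a": 3, "app_b": 4},
    "lucy": {"app_b": 2, "app_c": 5},
}
dev, count = fit_slope_one(ratings)
lucy_app_a = predict(ratings["lucy"], "app_a", dev, count)
```

Because the predictor needs only pairwise deviations and counts, it still produces an estimate when the rating matrix is very sparse, as long as the target App shares at least one co-rated App with the user's history.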
Sparse matrix methods research using the CSM testbed software system

NASA Technical Reports Server (NTRS)

Chu, Eleanor; George, J. Alan

1989-01-01

Research is described on sparse matrix techniques for the Computational Structural Mechanics (CSM) Testbed. The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the CSM Testbed. Thus, one of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A suite of subroutines to extract from the data base the relevant structural and numerical information about the matrix equations was written, and all the demonstration problems distributed with the testbed were successfully solved. These codes were documented, and performance studies comparing the SPARSPAK technology to the methods currently in the testbed were completed. In addition, some preliminary studies were done comparing some recently developed out-of-core techniques with the performance of the testbed processor INV.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
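
The statement that the preconditioning matrix can be computed by solving n independent small least squares problems can be made concrete with a short sketch. It minimizes the Frobenius norm of A M - I one column at a time. Restricting column k of M to the nonzero pattern of column k of A is an assumption made here for illustration, dense NumPy arrays are used for readability, and nothing below reflects the CM-5 implementation.

```python
import numpy as np


def approximate_inverse(A):
    """Approximate inverse M minimizing ||A M - I||_F column by column.

    Assumption: column k of M may be nonzero only where column k of A is.
    """
    n = A.shape[0]
    M = np.zeros((n, n))
    for k in range(n):  # the n problems are independent of each other
        J = np.nonzero(A[:, k])[0]  # allowed nonzero rows of column k of M
        I = np.nonzero(np.abs(A[:, J]).sum(axis=1))[0]  # rows touched by columns J
        e_k = (I == k).astype(float)  # unit vector e_k restricted to rows I
        m_k, *_ = np.linalg.lstsq(A[np.ix_(I, J)], e_k, rcond=None)
        M[J, k] = m_k
    return M


# Hypothetical unsymmetric, diagonally dominant tridiagonal test matrix.
n = 30
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-2.0 * np.ones(n - 1), -1))
M = approximate_inverse(A)
residual = np.linalg.norm(A @ M - np.eye(n))
```

Applying M inside an iterative solver then costs one matrix-vector product per iteration, and since no column depends on another, the loop over k could be distributed across processors without communication between the column solves.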
Evacuated optical structure comprising optical bench mounted to sidewall of vacuum chamber in a manner which inhibits deflection and rotation of the optical bench

DOEpatents

Bowers, Joel M.

1994-01-01

An improved evacuated optical structure is disclosed comprising an optical bench mounted in a vacuum vessel in a manner which inhibits transmission of movement of the vacuum vessel to the optical bench, yet provides a compact and economical structure. The vacuum vessel is mounted, through a sidewall thereof, to a support wall at four symmetrically positioned and spaced apart areas, each of which comprises a symmetrically positioned group of mounting structures passing through the sidewall of the vacuum vessel. The optical bench is pivotally secured to the vacuum vessel by four symmetrically spaced apart bolts and spherical bearings, each of which is centrally positioned within one of the four symmetrically positioned groups of vacuum vessel mounting structures. Cover plates and o-ring seals are further provided to seal the vacuum vessel mounting structures from the interior of the vacuum vessel, and venting bores are provided to vent trapped gases in the bores used to secure the cover plates and o-rings to the vacuum vessel. Provision for detecting leaks in the mounting structures from the rear surface of the vacuum vessel sidewall facing the support wall are also provided. Deflection to the optical bench within the vacuum vessel is further minimized by tuning the structure for a resonant frequency of at least 100 Hertz.
Evacuated optical structure comprising optical bench mounted to sidewall of vacuum chamber in a manner which inhibits deflection and rotation of the optical bench

DOEpatents

Bowers, J.M.

1994-04-19

An improved evacuated optical structure is disclosed comprising an optical bench mounted in a vacuum vessel in a manner which inhibits transmission of movement of the vacuum vessel to the optical bench, yet provides a compact and economical structure. The vacuum vessel is mounted, through a sidewall thereof, to a support wall at four symmetrically positioned and spaced apart areas, each of which comprises a symmetrically positioned group of mounting structures passing through the sidewall of the vacuum vessel. The optical bench is pivotally secured to the vacuum vessel by four symmetrically spaced apart bolts and spherical bearings, each of which is centrally positioned within one of the four symmetrically positioned groups of vacuum vessel mounting structures. Cover plates and o-ring seals are further provided to seal the vacuum vessel mounting structures from the interior of the vacuum vessel, and venting bores are provided to vent trapped gases in the bores used to secure the cover plates and o-rings to the vacuum vessel. Provision for detecting leaks in the mounting structures from the rear surface of the vacuum vessel sidewall facing the support wall are also provided. Deflection to the optical bench within the vacuum vessel is further minimized by tuning the structure for a resonant frequency of at least 100 Hertz. 10 figures.
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science.

PubMed

Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H

2014-05-28

Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well.
Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.

PubMed

Trippas, Dries; Thompson, Valerie A; Handley, Simon J

2017-05-01

Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Long waves in parallel flow in Hele-Shaw cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zeybek, M.; Yortsos, Y.C.

During the past several years the flow of immiscible fluids in Hele-Shaw cells and porous media has been investigated extensively. Of particular interest to most studies has been frontal displacement, specifically viscous fingering instabilities and finger growth. The practical ramifications regarding oil recovery, as well as many other industrial processes in porous media, have served as the primary driving force for most of these investigations. By contrast, little attention has been paid to the motion of lateral fluid interfaces, which are parallel to the main flow direction. Parallel flow is an often encountered, although much overlooked regime. The evolution of fluid interfaces in parallel flow in Hele-Shaw cells is studied both theoretically and experimentally in the large capillary number limit. It is shown that such interfaces support wave motion, the amplitude of which for long waves is governed by the KdV equation. Experiments are conducted in a long Hele-Shaw cell that validate the theory in the symmetric case. 35 refs., 16 figs.
Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves

NASA Astrophysics Data System (ADS)

Qiu, Shi; Liu, Kuang; Eliasson, Veronica

2016-10-01

Geometrical shock dynamics (GSD) theory is an appealing method to predict the shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, to solve and optimize large scale configurations, the main bottleneck is the computational cost. Among the existing numerical GSD schemes, there is only one that has been implemented on parallel computers, with the purpose to analyze detonation waves. To extend the computational advantage of the GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to resolve the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
Pressure driven currents near magnetic islands in 3D MHD equilibria: Effects of pressure variation within flux surfaces and of symmetry

NASA Astrophysics Data System (ADS)

Reiman, Allan H.

2016-07-01

In toroidal, magnetically confined plasmas, the heat and particle transport is strongly anisotropic, with transport along the field lines sufficiently strong relative to cross-field transport that the equilibrium pressure can generally be regarded as constant on the flux surfaces in much of the plasma. The regions near small magnetic islands, and those near the X-lines of larger islands, are exceptions, having a significant variation of the pressure within the flux surfaces. It is shown here that the variation of the equilibrium pressure within the flux surfaces in those regions has significant consequences for the pressure driven currents. It is further shown that the consequences are strongly affected by the symmetry of the magnetic field if the field is invariant under combined reflection in the poloidal and toroidal angles. (This symmetry property is called "stellarator symmetry.") In non-stellarator-symmetric equilibria, the pressure-driven currents have logarithmic singularities at the X-lines. In stellarator-symmetric MHD equilibria, the singular components of the pressure-driven currents vanish. These equilibria are to be contrasted with equilibria having B·∇p = 0, where the singular components of the pressure-driven currents vanish regardless of the symmetry. They are also to be contrasted with 3D MHD equilibrium solutions that are constrained to have simply nested flux surfaces, where the pressure-driven current goes like 1/x near rational surfaces, where x is the distance from the rational surface, except in the case of quasi-symmetric flux surfaces. For the purpose of calculating the pressure-driven currents near magnetic islands, we work with a closed subset of the MHD equilibrium equations that involves only perpendicular force balance, and is decoupled from parallel force balance. It is not correct to use the parallel component of the conventional MHD force balance equation, B·∇p = 0, near magnetic islands. Small but nonzero values of B·∇p are important in this region, and small non-MHD contributions to the parallel force balance equation cannot be neglected there. Two approaches are pursued to solve our equations for the pressure driven currents. First, the equilibrium equations are applied to an analytically tractable magnetic field with an island, obtaining explicit expressions for the rotational transform and magnetic coordinates, and for the pressure-driven current and its limiting behavior near the X-line. The second approach utilizes an expansion about the X-line to provide a more general calculation of the pressure-driven current near an X-line and of the rotational transform near a separatrix. The study presented in this paper is motivated, in part, by tokamak experiments with nonaxisymmetric magnetic perturbations, where significant differences are observed between the behavior of stellarator-symmetric and non-stellarator-symmetric configurations with regard to stabilization of edge localized modes by resonant magnetic perturbations. Implications for the coupling between neoclassical tearing modes, and for magnetic island stability calculations, are also discussed.
Peptide Conformation and Supramolecular Organization in Amylin Fibrils: Constraints from Solid State NMR

PubMed Central

Luca, Sorin; Yau, Wai-Ming; Leapman, Richard; Tycko, Robert

2008-01-01

The 37-residue amylin peptide, also known as islet amyloid polypeptide, forms fibrils that are the main peptide or protein component of amyloid that develops in the pancreas of type 2 diabetes patients. Amylin also readily forms amyloid fibrils in vitro that are highly polymorphic under typical experimental conditions. We describe a protocol for the preparation of synthetic amylin fibrils that exhibit a single predominant morphology, which we call a striated ribbon, in electron microscope and atomic force microscope images. Solid state nuclear magnetic resonance (NMR) measurements on a series of isotopically labeled samples indicate a single molecular structure within the striated ribbons. We use scanning transmission electron microscopy and several types of one-dimensional and two-dimensional solid state NMR techniques to obtain constraints on the peptide conformation and supramolecular structure in these amylin fibrils, and derive molecular structural models that are consistent with the experimental data. The basic structural unit in amylin striated ribbons, which we call the protofilament, contains four-layers of parallel β-sheets, formed by two symmetric layers of amylin molecules. The molecular structure of amylin protofilaments in striated ribbons closely resembles the protofilament in amyloid fibrils with similar morphology formed by the 40-residue β-amyloid peptide that is associated with Alzheimer's disease. PMID:17979302
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of the underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.
A novel structured dictionary for fast processing of 3D medical images, with application to computed tomography restoration and denoising

NASA Astrophysics Data System (ADS)

Karimi, Davood; Ward, Rabab K.

2016-03-01

Sparse representation of signals in learned overcomplete dictionaries has proven to be a powerful tool with applications in denoising, restoration, compression, reconstruction, and more. Recent research has shown that learned overcomplete dictionaries can lead to better results than analytical dictionaries such as wavelets in almost all image processing applications. However, a major disadvantage of these dictionaries is that their learning and usage is very computationally intensive. In particular, finding the sparse representation of a signal in these dictionaries requires solving an optimization problem that leads to very long computational times, especially in 3D image processing. Moreover, the sparse representation found by greedy algorithms is usually sub-optimal. In this paper, we propose a novel two-level dictionary structure that improves the performance and the speed of standard greedy sparse coding methods. The first (i.e., the top) level in our dictionary is a fixed orthonormal basis, whereas the second level includes the atoms that are learned from the training data. We explain how such a dictionary can be learned from the training data and how the sparse representation of a new signal in this dictionary can be computed. As an application, we use the proposed dictionary structure for removing the noise and artifacts in 3D computed tomography (CT) images. Our experiments with real CT images show that the proposed method achieves results that are comparable with standard dictionary-based methods while substantially reducing the computational time.
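As a rough illustration of the "standard greedy sparse coding" that the abstract above refers to, the following is a minimal sketch of plain matching pursuit. It is not the authors' two-level method; the function name `matching_pursuit` and the toy dictionary are assumptions made here for illustration, and the atoms are assumed to have unit norm.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding sketch (hypothetical helper, not from the paper).

    signal    : list of floats
    atoms     : list of unit-norm vectors (the dictionary), each a list of floats
    n_nonzero : number of greedy selection steps to perform
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate every atom with the current residual.
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Pick the atom that explains the most of what is left.
        best = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        coeffs[best] += corr[best]
        # Remove that atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy example with an orthonormal dictionary (the standard basis of R^3).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([0.0, 3.0, -2.0], atoms, n_nonzero=2)
```

With an orthonormal dictionary, as in the toy example, two greedy steps recover the two nonzero coefficients and leave a zero residual. With a learned overcomplete dictionary the greedy choice can be sub-optimal, which is the limitation the abstract mentions.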
Polarization independent thermally tunable erbium-doped fiber amplifier gain equalizer using a cascaded Mach-Zehnder coupler.

PubMed

Sahu, P P

2008-02-10

A thermally tunable erbium-doped fiber amplifier (EDFA) gain equalizer filter based on a compact point symmetric cascaded Mach-Zehnder (CMZ) coupler is presented together with its mathematical model. The mathematical analysis shows that the filter is polarization dependent because of stress anisotropy caused by the local heating used for the thermo-optic phase change. A thermo-optic delay line structure with a stress releasing groove is proposed and designed to reduce the polarization dependent characteristics of the high index contrast point symmetric delay line structure of the device. Thermal analysis using an implicit finite difference method shows that the temperature gradient of the proposed structure, which is mainly responsible for releasing the stress anisotropy, is approximately nine times greater than that of the conventional structure. The EDFA gain equalized spectrum obtained with the point symmetric CMZ device based on the proposed structure is also seen to be almost polarization independent.
Non-Random Inversion Landscapes in Prokaryotic Genomes Are Shaped by Heterogeneous Selection Pressures

PubMed Central

Repar, Jelena; Warnecke, Tobias

2017-01-01

Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin–terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus–Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. PMID:28407093
Scalable Static and Dynamic Community Detection Using Grappolo

DOE Office of Scientific and Technical Information (OSTI.GOV)

Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman

Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
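The Louvain method named in the abstract above greedily increases the modularity of a partition. The sketch below only evaluates that objective for a given partition of an unweighted, undirected graph; it does not reproduce Grappolo's parallel heuristics, and the function name `modularity` and the toy graph are assumptions chosen for illustration.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges     : list of (u, v) pairs, each undirected edge listed once
    community : dict mapping each vertex to its community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the edge count, L_c the number of edges inside c,
    and d_c the total degree of the vertices in c.
    """
    m = len(edges)
    intra = {}        # L_c: edges with both endpoints in community c
    degree_sum = {}   # d_c: sum of vertex degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
q_split = modularity(edges, two_groups)   # 5/14: the natural split scores well
q_merged = modularity(edges, one_group)   # 0: a single community scores zero
```

Splitting the toy graph into its two triangles gives a higher modularity than merging everything into one community, which is the kind of "densely connected within, sparsely connected between" division the abstract describes.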

Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Univariate and multivariate molecular spectral analyses of lipid related molecular structural components in relation to nutrient profile in feed and food mixtures

NASA Astrophysics Data System (ADS)

Abeysekara, Saman; Damiran, Daalkhaijav; Yu, Peiqiang

2013-02-01

The objectives of this study were (i) to determine lipid-related molecular structural components (functional groups) in feed combinations of cereal grain (barley, Hordeum vulgare) and wheat (Triticum aestivum) based dried distillers grain solubles (wheat DDGSs) from bioethanol processing at five different combination ratios using univariate and multivariate molecular spectral analyses with infrared Fourier transform molecular spectroscopy, and (ii) to correlate the lipid-related molecular-functional structure spectral profile with nutrient profiles. The spectral intensities of (i) CH3 asymmetric, CH2 asymmetric, CH3 symmetric and CH2 symmetric groups, (ii) the unsaturation (C=C) group, and (iii) the carbonyl ester (C=O) group were determined. Spectral differences of functional groups were detected by hierarchical cluster analysis (HCA) and principal components analysis (PCA). The results showed that the combination treatments significantly modified (P < 0.05) the nutrient profile and the lipid-related molecular spectral intensity (CH2 asymmetric stretching peak height, CH2 symmetric stretching peak height, ratio of CH2 to CH3 symmetric stretching peak intensity, and carbonyl peak area). The ratio of CH2 to CH3 symmetric stretching peak intensity and the carbonyl peak were significantly correlated with nutrient profiles. Both PCA and HCA differentiated the lipid-related spectra. In conclusion, the changes in lipid molecular structure spectral profiles caused by feed combination could be detected using molecular spectroscopy. These changes were associated with nutrient profiles and functionality.
Mesh-free data transfer algorithms for partitioned multiphysics problems: Conservation, accuracy, and parallelism

DOE PAGES

Slattery, Stuart R.

2015-12-02

In this study we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothness and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. Finally, these scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation.
Parallel Symmetric Eigenvalue Problem Solvers

DTIC Science & Technology

2015-05-01

tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. v TABLE OF CONTENTS Page LIST...34 4.6.2 Choice of the Ritz shifts . . . . . . . . . . . . . . . . . . . . 38 4.7 Relationship between TraceMin and...which are determined by the Ritz values of the matrix pencil. We conclude with a discussion of the relationship between TraceMin and simultaneous
The Poisson-Boltzmann theory for the two-plates problem: some exact results.

PubMed

Xing, Xiang-Jun

2011-12-01

The general solution to the nonlinear Poisson-Boltzmann equation for two parallel charged plates, either inside a symmetric electrolyte, or inside a 2q:-q asymmetric electrolyte, is found in terms of Weierstrass elliptic functions. From this we derive some exact asymptotic results for the interaction between charged plates, as well as the exact form of the renormalized surface charge density.
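For orientation, the one-dimensional Poisson-Boltzmann equations for the two electrolyte types named in the abstract above can be written in a common dimensionless form. The notation is an assumption made here and is not taken from the paper: \psi = q e \phi / (k_B T) is the reduced potential, x is the coordinate normal to the plates, and \kappa^{-1} is the Debye length.

```latex
% Symmetric q:-q electrolyte
\frac{d^{2}\psi}{dx^{2}} = \kappa^{2}\,\sinh\psi

% Asymmetric 2q:-q electrolyte
\frac{d^{2}\psi}{dx^{2}} = \frac{\kappa^{2}}{3}\left(e^{\psi} - e^{-2\psi}\right)
```

Both forms reduce to the linearized equation \psi'' = \kappa^{2}\psi for small \psi, which is a quick consistency check on the prefactors.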
Assessment of the biophysical characteristics of rangeland community using scatterometer and optical measurements

NASA Technical Reports Server (NTRS)

Kanemasu, E. T.; Asrar, Ghassem; Myneni, Ranga; Martin, Robert, Jr.; Burnett, R. Bruce

1987-01-01

Research activities for the following study areas are summarized: single scattering of parallel direct and axially symmetric diffuse solar radiation in vegetative canopies; the use of successive orders of scattering approximations (SOSA) for treating multiple scattering in a plant canopy; reflectance of a soybean canopy using the SOSA method; and C-band scatterometer measurements of the Konza tallgrass prairie.
Normalization for sparse encoding of odors by a wide-field interneuron.

PubMed

Papadopoulou, Maria; Cassenaer, Stijn; Nowotny, Thomas; Laurent, Gilles

2011-05-06

Sparse coding presents practical advantages for sensory representations and memory storage. In the insect olfactory system, the representation of general odors is dense in the antennal lobes but sparse in the mushroom bodies, only one synapse downstream. In locusts, this transformation relies on the oscillatory structure of antennal lobe output, feed-forward inhibitory circuits, intrinsic properties of mushroom body neurons, and connectivity between antennal lobe and mushroom bodies. Here we show the existence of a normalizing negative-feedback loop within the mushroom body to maintain sparse output over a wide range of input conditions. This loop consists of an identifiable "giant" nonspiking inhibitory interneuron with ubiquitous connectivity and graded release properties.
Sparse dictionary learning of resting state fMRI networks.

PubMed

Eavani, Harini; Filipovych, Roman; Davatzikos, Christos; Satterthwaite, Theodore D; Gur, Raquel E; Gur, Ruben C

2012-07-02

Research in resting state fMRI (rsfMRI) has revealed the presence of stable, anti-correlated functional subnetworks in the brain. Task-positive networks are active during a cognitive process and are anti-correlated with task-negative networks, which are active during rest. In this paper, based on the assumption that the structure of the resting state functional brain connectivity is sparse, we utilize sparse dictionary modeling to identify distinct functional sub-networks. We propose two ways of formulating the sparse functional network learning problem that characterize the underlying functional connectivity from different perspectives. Our results show that the whole-brain functional connectivity can be concisely represented with highly modular, overlapping task-positive/negative pairs of sub-networks.
BI-sparsity pursuit for robust subspace recovery

DOE PAGES

Bian, Xiao; Krim, Hamid

2015-09-01

Here, the success of sparse models in computer vision and machine learning in many real-world applications may be attributed in large part to the fact that many high dimensional data are distributed in a union of low dimensional subspaces. The underlying structure may, however, be adversely affected by sparse errors, thus inducing additional complexity in recovering it. In this paper, we propose a bi-sparse model as a framework to investigate and analyze this problem, and provide, as a result, a novel algorithm to recover the union of subspaces in the presence of sparse corruptions. We additionally demonstrate the effectiveness of our method by experiments on real-world vision data.
Riemann-Hilbert technique scattering analysis of metamaterial-based asymmetric 2D open resonators

NASA Astrophysics Data System (ADS)

Kamiński, Piotr M.; Ziolkowski, Richard W.; Arslanagić, Samel

2017-12-01

The scattering properties of metamaterial-based asymmetric two-dimensional open resonators excited by an electric line source are investigated analytically. The resonators are, in general, composed of two infinite and concentric cylindrical layers covered with an infinitely thin, perfect conducting shell that has an infinite axial aperture. The line source is oriented parallel to the cylinder axis. An exact analytical solution of this problem is derived. It is based on the dual-series approach and its transformation to the equivalent Riemann-Hilbert problem. Asymmetric metamaterial-based configurations are found to lead simultaneously to large enhancements of the radiated power and to highly steerable Huygens-like directivity patterns; properties not attainable with the corresponding structurally symmetric resonators. The presented open resonator designs are thus interesting candidates for many scientific and engineering applications where enhanced directional near- and far-field responses, tailored with beam shaping and steering capabilities, are highly desired.
Integrally formed radio frequency quadrupole

DOEpatents

Abbott, Steven R.

1989-01-01

An improved radio frequency quadrupole (10) is provided having an elongate housing (11) with an elongate central axis (12) and top, bottom and two side walls (13a-d) symmetrically disposed about the axis, and vanes (14a-d) formed integrally with the walls (13a-d), the vanes (14a-d) each having a cross-section at right angles to the central axis (12) which tapers inwardly toward the axis to form electrode tips (15a-d) spaced from each other by predetermined distances. Each of the four walls (13a-d), and the vanes (14a-d) integral therewith, is a separate structural element having a central lengthwise plane (16) passing through the tip of the vane, the walls (13a-d) having flat mounting surfaces (17, 18) at right angles to and parallel to the central plane (16), respectively, which are butted together to position the walls and vane tips relative to each other.
Dermal Aged and Fetal Fibroblasts Realign in Response to Mechanical Strain

NASA Technical Reports Server (NTRS)

Sawyer, Christine; Grymes, Rose; Alvarez, Teresa (Technical Monitor)

1994-01-01

Integrins specifically recognize and bind extracellular matrix components, providing physical anchor points and functional setpoints. Focal adhesion complexes, containing integrin and cytoskeletal proteins, are potential mechanoreceptors, poised to distribute applied forces through the cytoskeleton. Pursuing the hypothesis that cells both perceive and respond to external force, we applied a stretch/relaxation regimen to normal human fetal and aged dermal fibroblast monolayers cultured on flexible membranes. The frequency and magnitude of the applied force is precisely controlled by the Flexercell Unit(Trademark). A protocol of stretch (20% elongation of the monolayer) at a frequency of 6 cycles/min caused a progressive change from a randomly distributed pattern of cells to a symmetric, radial distribution with cells aligned parallel to the applied force. We have coined the term 'orienteering' as the process of active alignment of cells in response to applied force. Cytochalasin D was added in graded doses to investigate the role of the actin cytoskeleton in force perception and transmission. A clear dose response was found; at high concentrations orienteering was abolished; and the drug's impact was reversible. The two cell strains used were similar in their alignment behavior and in their responses to cytochalasin D. Orienteering was influenced by cell density, and the cell strains studied differed in this respect. Fetal cells, unlike their aged counterparts, failed to orient at high cell density. In both cell strains, mid-density cultures aligned rapidly and sparse cultures lagged. These results indicate that both cell-cell adhesion and cytoskeleton integrity are critical in mediating the orienteering response. Differences between these two cell strains may relate to their expression of extracellular matrix molecules (fibronectin, collagen type 1) integrins and their relative binding affinities.
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
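The abstract above mentions SSOR used both as a preconditioner and as an iterative scheme. The following is a minimal dense-matrix sketch of one SSOR sweep (a forward SOR pass followed by a backward pass), assuming a small system stored as nested lists; the name `ssor_sweep` and the 2x2 test system are illustrative assumptions, and nothing here reflects the vectorized unstructured-mesh implementation of the report.

```python
def ssor_sweep(A, b, x, omega=1.0):
    """Apply one symmetric SOR sweep to x in place and return it.

    A     : square matrix as a list of rows, with nonzero diagonal
    b     : right-hand side
    x     : current iterate (modified in place)
    omega : relaxation factor; omega = 1 gives symmetric Gauss-Seidel
    """
    n = len(b)
    # Forward pass over the unknowns, then the same update in reverse order.
    for order in (range(n), reversed(range(n))):
        for i in order:
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x


# Small symmetric positive definite system used as an assumed test case.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(50):
    ssor_sweep(A, b, x)
# x is now close to the exact solution [1/11, 7/11].
```

Because each pass uses the most recent values of the other unknowns, the ordering of the unknowns changes the iterates, which is one reason ordering affects convergence as the abstract notes.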
Preconditioned conjugate residual methods for the solution of spectral equations

NASA Technical Reports Server (NTRS)

Wong, Y. S.; Zang, T. A.; Hussaini, M. Y.

1986-01-01

Conjugate residual methods for the solution of spectral equations are described. An inexact finite-difference operator is introduced as a preconditioner in the iterative procedures. Application of these techniques is limited to problems for which the symmetric part of the coefficient matrix is positive definite. Although the spectral equation is a very ill-conditioned and full matrix problem, the computational effort of the present iterative methods for solving such a system is comparable to that for the sparse matrix equations obtained from the application of either finite-difference or finite-element methods to the same problems. Numerical experiments are shown for a self-adjoint elliptic partial differential equation with Dirichlet boundary conditions, and comparison with other solution procedures for spectral equations is presented.
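To make the iteration concrete, here is a minimal sketch of the unpreconditioned conjugate residual method for a small symmetric positive definite system. It is a textbook form and an assumption on my part: the report's preconditioned variants for spectral operators are not reproduced, and `conjugate_residual` and the dense list-of-lists storage are chosen only for illustration.

```python
def conjugate_residual(A, b, tol=1e-12, max_iter=100):
    """Solve A x = b for symmetric positive definite A (dense, nested lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # search direction
    Ar = matvec(r)
    Ap = list(Ar)
    rAr = dot(r, Ar)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        # Step length that minimizes the residual norm along p.
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(r)
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        # Update the direction and A times the direction without an extra matvec.
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]
        rAr = rAr_new
    return x


x = conjugate_residual([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
# x is close to the exact solution [1/11, 7/11].
```

Each iteration costs one matrix-vector product, so the overall cost is governed by how cheaply the operator can be applied and by the iteration count, which is where a preconditioner such as the finite-difference operator mentioned in the abstract comes in.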
Accelerating Convolutional Sparse Coding for Curvilinear Structures Segmentation by Refining SCIRD-TS Filter Banks.

PubMed

Annunziata, Roberto; Trucco, Emanuele

2016-11-01

Deep learning has shown great potential for curvilinear structure (e.g., retinal blood vessels and neurites) segmentation as demonstrated by a recent auto-context regression architecture based on filter banks learned by convolutional sparse coding. However, learning such filter banks is very time-consuming, thus limiting the amount of filters employed and the adaptation to other data sets (i.e., slow re-training). We address this limitation by proposing a novel acceleration strategy to speed-up convolutional sparse coding filter learning for curvilinear structure segmentation. Our approach is based on a novel initialisation strategy (warm start), and therefore it is different from recent methods improving the optimisation itself. Our warm-start strategy is based on carefully designed hand-crafted filters (SCIRD-TS), modelling appearance properties of curvilinear structures which are then refined by convolutional sparse coding. Experiments on four diverse data sets, including retinal blood vessels and neurites, suggest that the proposed method reduces significantly the time taken to learn convolutional filter banks (i.e., up to -82%) compared to conventional initialisation strategies. Remarkably, this speed-up does not worsen performance; in fact, filters learned with the proposed strategy often achieve a much lower reconstruction error and match or exceed the segmentation performance of random and DCT-based initialisation, when used as input to a random forest classifier.
Coherent perfect absorption mediated enhancement of transverse spin in a gap plasmon guide

NASA Astrophysics Data System (ADS)

Mukherjee, Samyobrata; Dutta Gupta, Subhasish

2017-01-01

We consider a symmetric gap plasmon guide (a folded Kretschmann configuration) supporting both symmetric and antisymmetric coupled surface plasmons. We calculate the transverse spin under illumination from both the sides like in coherent perfect absorption (CPA), whereby all the incident light can be absorbed to excite one of the modes of the structure. Significant enhancement in the transverse spin is shown to be possible when the CPA dip and the mode excitation are at the same frequency. The enhancement results from CPA-mediated total transfer of the incident light to either of the coupled modes and the associated large local fields. The effect is shown to be robust against small deviations from the symmetric structure. The transverse spin is localized in the structure since in the ambient dielectric there are only incident plane waves lacking any structure.
A finite element formulation preserving symmetric and banded diffusion stiffness matrix characteristics for fractional differential equations

NASA Astrophysics Data System (ADS)

Lin, Zeng; Wang, Dongdong

2017-10-01

Due to the nonlocal property of the fractional derivative, the finite element analysis of fractional diffusion equation often leads to a dense and non-symmetric stiffness matrix, in contrast to the conventional finite element formulation with a particularly desirable symmetric and banded stiffness matrix structure for the typical diffusion equation. This work first proposes a finite element formulation that preserves the symmetry and banded stiffness matrix characteristics for the fractional diffusion equation. The key point of the proposed formulation is the symmetric weak form construction through introducing a fractional weight function. It turns out that the stiffness part of the present formulation is identical to its counterpart of the finite element method for the conventional diffusion equation and thus the stiffness matrix formulation becomes trivial. Meanwhile, the fractional derivative effect in the discrete formulation is completely transferred to the force vector, which is obviously much easier and efficient to compute than the dense fractional derivative stiffness matrix. Subsequently, it is further shown that for the general fractional advection-diffusion-reaction equation, the symmetric and banded structure can also be maintained for the diffusion stiffness matrix, although the total stiffness matrix is not symmetric in this case. More importantly, it is demonstrated that under certain conditions this symmetric diffusion stiffness matrix formulation is capable of producing very favorable numerical solutions in comparison with the conventional non-symmetric diffusion stiffness matrix finite element formulation. The effectiveness of the proposed methodology is illustrated through a series of numerical examples.
Differences in the Visual Perception of Symmetric Patterns in Orangutans (Pongo pygmaeus abelii) and Two Human Cultural Groups: A Comparative Eye-Tracking Study.

PubMed

Mühlenbeck, Cordelia; Liebal, Katja; Pritsch, Carla; Jacobsen, Thomas

2016-01-01

Symmetric structures are of importance in relation to aesthetic preference. To investigate whether the preference for symmetric patterns is unique to humans, independent of their cultural background, we compared two human populations with distinct cultural backgrounds (Namibian hunter-gatherers and German town dwellers) with one species of non-human great apes (Orangutans) in their viewing behavior regarding symmetric and asymmetric patterns in two levels of complexity. In addition, the human participants were asked to give their aesthetic evaluation of a subset of the presented patterns. The results showed that humans of both cultural groups fixated on symmetric patterns for a longer period of time, regardless of the pattern's complexity. On the contrary, Orangutans did not clearly differentiate between symmetric and asymmetric patterns, but were much faster in processing the presented stimuli and scanned the complete screen, while both human groups rested on the symmetric pattern after a short scanning time. The aesthetic evaluation test revealed that the fixation preference for symmetric patterns did not match with the aesthetic evaluation in the Hai//om group, whereas in the German group aesthetic evaluation was in accordance with the fixation preference in 60 percent of the cases. It can be concluded that humans prefer well-ordered structures in visual processing tasks, most likely because of a positive processing bias for symmetry, which Orangutans did not show in this task, and that, in humans, an aesthetic preference does not necessarily accompany the fixation preference.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

PubMed

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-08-01

RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
Uniform Recovery Bounds for Structured Random Matrices in Corrupted Compressed Sensing

NASA Astrophysics Data System (ADS)

Zhang, Peng; Gan, Lu; Ling, Cong; Sun, Sumei

2018-04-01

We study the problem of recovering an $s$-sparse signal $\mathbf{x}^{\star}\in\mathbb{C}^n$ from corrupted measurements $\mathbf{y} = \mathbf{A}\mathbf{x}^{\star}+\mathbf{z}^{\star}+\mathbf{w}$, where $\mathbf{z}^{\star}\in\mathbb{C}^m$ is a $k$-sparse corruption vector whose nonzero entries may be arbitrarily large and $\mathbf{w}\in\mathbb{C}^m$ is a dense noise with bounded energy. The aim is to exactly and stably recover the sparse signal with tractable optimization programs. In this paper, we prove the uniform recovery guarantee of this problem for two classes of structured sensing matrices. The first class can be expressed as the product of a unit-norm tight frame (UTF), a random diagonal matrix and a bounded columnwise orthonormal matrix (e.g., partial random circulant matrix). When the UTF is bounded (i.e. $\mu(\mathbf{U})\sim1/\sqrt{m}$), we prove that with high probability, one can recover an $s$-sparse signal exactly and stably by $l_1$ minimization programs even if the measurements are corrupted by a sparse vector, provided $m = \mathcal{O}(s \log^2 s \log^2 n)$ and the sparsity level $k$ of the corruption is a constant fraction of the total number of measurements. The second class considers randomly sub-sampled orthogonal matrix (e.g., random Fourier matrix). We prove the uniform recovery guarantee provided that the corruption is sparse on certain sparsifying domain. Numerous simulation results are also presented to verify and complement the theoretical results.
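The abstract does not spell out the $l_1$ minimization program. One common formulation for this measurement model, given here as an assumption rather than as the authors' exact program, penalizes the signal and the corruption jointly; the weight $\lambda > 0$ and the noise bound $\varepsilon \ge \|\mathbf{w}\|_2$ are parameters introduced for illustration.

```latex
\min_{\mathbf{x}\in\mathbb{C}^n,\ \mathbf{z}\in\mathbb{C}^m}
  \|\mathbf{x}\|_1 + \lambda\,\|\mathbf{z}\|_1
\quad \text{subject to} \quad
  \|\mathbf{A}\mathbf{x} + \mathbf{z} - \mathbf{y}\|_2 \le \varepsilon
```

In this form the true pair $(\mathbf{x}^{\star}, \mathbf{z}^{\star})$ is feasible whenever $\|\mathbf{w}\|_2 \le \varepsilon$, since $\mathbf{A}\mathbf{x}^{\star} + \mathbf{z}^{\star} - \mathbf{y} = -\mathbf{w}$.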

Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks.

PubMed

Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R; Nguyen, Tuan N; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T

2017-01-01

This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low level; this prevents the network from overfitting and enables it to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that, using the AR feature extractor and the DBN classifier, the classification performance improves to a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94, compared to ANN (sensitivity of 80.8%, specificity of 77.8%, accuracy of 79.3%, AUROC of 0.83) and BNN classifiers (sensitivity of 84.3%, specificity of 83%, accuracy of 83.6%, AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further, with a sensitivity of 93.9%, a specificity of 92.3%, an accuracy of 93.1%, and an AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5 percentage points over the ANN, BNN, and DBN classifiers, respectively.
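The closing accuracy comparison can be checked directly from the figures quoted in the abstract. The short sketch below is only an arithmetic check; the dictionary and function names are illustrative and not taken from the paper.

```python
# Accuracies (%) exactly as reported in the abstract above.
reported_accuracy = {"ANN": 79.3, "BNN": 83.6, "DBN": 90.6, "sparse-DBN": 93.1}


def accuracy_gain(baseline: str, improved: str = "sparse-DBN") -> float:
    """Absolute gain, in percentage points, of `improved` over `baseline`."""
    return round(reported_accuracy[improved] - reported_accuracy[baseline], 1)


# Gain of sparse-DBN over each of the three comparison classifiers.
gains = {name: accuracy_gain(name) for name in ("ANN", "BNN", "DBN")}
print(gains)
```

The gains are absolute differences between accuracies, not relative improvements.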
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks

PubMed Central

Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.

2017-01-01

This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low level; this prevents the network from overfitting and enables it to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that, using the AR feature extractor and the DBN classifier, the classification performance improves to a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94, compared to ANN (sensitivity of 80.8%, specificity of 77.8%, accuracy of 79.3%, AUROC of 0.83) and BNN classifiers (sensitivity of 84.3%, specificity of 83%, accuracy of 83.6%, AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further, with a sensitivity of 93.9%, a specificity of 92.3%, an accuracy of 93.1%, and an AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5 percentage points over the ANN, BNN, and DBN classifiers, respectively. PMID:28326009
Sparse Representation for Infrared Dim Target Detection via a Discriminative Over-Complete Dictionary Learned Online

PubMed Central

Li, Zheng-Zhou; Chen, Jing; Hou, Qian; Fu, Hong-Xia; Dai, Zhen; Jin, Gang; Li, Ru-Zhang; Liu, Chang-Ju

2014-01-01

It is difficult for structural over-complete dictionaries such as the Gabor function, and for discriminative over-complete dictionaries that are learned offline and classified manually, to represent natural images with ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation over a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of the infrared image by the K-singular value decomposition (K-SVD) algorithm. The adaptive morphological over-complete dictionary is then divided automatically into a target over-complete dictionary describing target signals and a background over-complete dictionary embedding background, by the criterion that the atoms in the target over-complete dictionary can be decomposed more sparsely over a Gaussian over-complete dictionary than those in the background over-complete dictionary. This discriminative over-complete dictionary not only captures significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries but cannot be sparsely decomposed over the opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ markedly. Experimental results show that the proposed approach not only improves sparsity more efficiently but also enhances the performance of small target detection more effectively. PMID:24871988
Sparse representation for infrared Dim target detection via a discriminative over-complete dictionary learned online.

PubMed

Li, Zheng-Zhou; Chen, Jing; Hou, Qian; Fu, Hong-Xia; Dai, Zhen; Jin, Gang; Li, Ru-Zhang; Liu, Chang-Ju

2014-05-27

It is difficult for structural over-complete dictionaries such as the Gabor function, and for discriminative over-complete dictionaries that are learned offline and classified manually, to represent natural images with ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation over a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of the infrared image by the K-singular value decomposition (K-SVD) algorithm. The adaptive morphological over-complete dictionary is then divided automatically into a target over-complete dictionary describing target signals and a background over-complete dictionary embedding background, by the criterion that the atoms in the target over-complete dictionary can be decomposed more sparsely over a Gaussian over-complete dictionary than those in the background over-complete dictionary. This discriminative over-complete dictionary not only captures significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries but cannot be sparsely decomposed over the opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ markedly. Experimental results show that the proposed approach not only improves sparsity more efficiently but also enhances the performance of small target detection more effectively.
Cryo-EM structure of the gasdermin A3 membrane pore.

PubMed

Ruan, Jianbin; Xia, Shiyu; Liu, Xing; Lieberman, Judy; Wu, Hao

2018-05-01

Gasdermins mediate inflammatory cell death after cleavage by caspases or other, unknown enzymes. The cleaved N-terminal fragments bind to acidic membrane lipids to form pores, but the mechanism of pore formation remains unresolved. Here we present the cryo-electron microscopy structures of the 27-fold and 28-fold single-ring pores formed by the N-terminal fragment of mouse GSDMA3 (GSDMA3-NT) at 3.8 and 4.2 Å resolutions, and of a double-ring pore at 4.6 Å resolution. In the 27-fold pore, a 108-stranded anti-parallel β-barrel is formed by two β-hairpins from each subunit capped by a globular domain. We identify a positively charged helix that interacts with the acidic lipid cardiolipin. GSDMA3-NT undergoes radical conformational changes upon membrane insertion to form long, membrane-spanning β-strands. We also observe an unexpected additional symmetric ring of GSDMA3-NT subunits that does not insert into the membrane in the double-ring pore, which may represent a pre-pore state of GSDMA3-NT. These structures provide a basis that explains the activities of several mutant gasdermins, including defective mutants that are associated with cancer.
Application of a sparse representation method using K-SVD to data compression of experimental ambient vibration data for SHM

NASA Astrophysics Data System (ADS)

Noh, Hae Young; Kiremidjian, Anne S.

2011-04-01

This paper introduces a data compression method using the K-SVD algorithm and its application to experimental ambient vibration data for structural health monitoring purposes. Because many damage diagnosis algorithms that use system identification require vibration measurements of multiple locations, it is necessary to transmit long threads of data. In wireless sensor networks for structural health monitoring, however, data transmission is often a major source of battery consumption. Therefore, reducing the amount of data to transmit can significantly lengthen the battery life and reduce maintenance cost. The K-SVD algorithm was originally developed in information theory for sparse signal representation. This algorithm creates an optimal over-complete set of bases, referred to as a dictionary, using singular value decomposition (SVD) and represents the data as sparse linear combinations of these bases using the orthogonal matching pursuit (OMP) algorithm. Since ambient vibration data are stationary, we can segment them and represent each segment sparsely. Then only the dictionary and the sparse vectors of the coefficients need to be transmitted wirelessly for restoration of the original data. We applied this method to ambient vibration data measured from a four-story steel moment resisting frame. The results show that the method can compress the data efficiently and restore the data with very little error.
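The transmit-only-the-sparse-code idea can be sketched in a few lines. This is a minimal illustration under loud assumptions: it uses plain matching pursuit (a simpler greedy relative of OMP, swapped in here to keep the example dependency-free) and a fixed, hand-built dictionary of cosine and spike atoms rather than one learned by K-SVD. All names and the toy signal are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding; returns a list of (atom_index, coefficient)."""
    residual = list(signal)
    code = []
    for _ in range(n_atoms):
        # Pick the unit-norm atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        code.append((k, scores[k]))
        # Remove that atom's contribution from the residual.
        residual = [r - scores[k] * a for r, a in zip(residual, dictionary[k])]
    return code


def reconstruct(code, dictionary, length):
    """Receiver side: rebuild the segment from the transmitted sparse code."""
    out = [0.0] * length
    for k, coeff in code:
        for i, a in enumerate(dictionary[k]):
            out[i] += coeff * a
    return out


# Hypothetical over-complete dictionary for 8-sample segments:
# atoms 0..7 are cosines, atoms 8..15 are single-sample spikes.
N = 8
cosines = [normalize([math.cos(math.pi * f * (t + 0.5) / N) for t in range(N)])
           for f in range(N)]
spikes = [[1.0 if t == p else 0.0 for t in range(N)] for p in range(N)]
dictionary = cosines + spikes

# Toy segment: one cosine component plus one spike at sample 5.
signal = [3.0 * a + (1.5 if t == 5 else 0.0) for t, a in enumerate(dictionary[2])]

code = matching_pursuit(signal, dictionary, n_atoms=2)  # what would be transmitted
restored = reconstruct(code, dictionary, N)
error = [s - r for s, r in zip(signal, restored)]
rel_err = math.sqrt(dot(error, error) / dot(signal, signal))
print(code, rel_err)
```

Here two (index, coefficient) pairs stand in for eight samples. OMP differs from this sketch in that it re-fits all selected coefficients by least squares after each atom is added.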
Fast and low-dose computed laminography using compressive sensing based technique

NASA Astrophysics Data System (ADS)

Abbas, Sajid; Park, Miran; Cho, Seungryong

2015-03-01

Computed laminography (CL) is well known for inspecting microstructures in materials, weldments, and soldering defects in high-density packed components or multilayer printed circuit boards. Overloading of the x-ray tube and gross failure of radio-sensitive electronic devices during a scan are among the important issues in CL that need to be addressed. Sparse-view CL can be one of the viable options to overcome such issues. In this work, a numerical aluminum welding phantom was simulated to collect sparsely sampled projection data at only 40 views using a conventional CL scanning scheme, i.e., an oblique scan. A compressive-sensing-inspired total-variation (TV) minimization algorithm was used to reconstruct the images. The images reconstructed from sparse-view data are visually comparable with those reconstructed from the full-scan data set, i.e., 360 views at regular intervals. We have quantitatively confirmed that tiny structures such as copper and tungsten slags and copper flakes in the images reconstructed from sparsely sampled data are comparable with the corresponding structures in the fully sampled case. A blurring effect can be seen near the edges of a few pores at the bottom of the images reconstructed from sparsely sampled data, although the overall image quality is reasonable for fast and low-dose NDT.
Electroluminescence of fluorescent-phosphorescent organic light-emitting diodes with regular, inverted, and symmetrical structures

NASA Astrophysics Data System (ADS)

Yang, Su-Hua; Shih, Po-Jen; Wu, Wen-Jie

2014-11-01

The influence of the device structure on the electroluminescence (EL) properties of fluorescent-phosphorescent organic light-emitting diodes (OLEDs) was demonstrated. Four devices with regular-, inverted-, compensated- and symmetrical-emission layers (EMLs) were prepared. In the regular-EML device, DCJTB emission increased when the phosphorescent sensitized EML was thickened. In the inverted-EML device, the low electron energy barrier at the Bphen/BCzVB interface resulted in weakened blue emission. The compensated-EML device, prepared with a red color-compensated layer, showed a color-tunable broadband white emission. Conversely, the device with a quantum-like symmetrical-EML showed a narrow color-temperature range. Stable EL efficiency was obtained from the regular-, compensated-, and symmetrical-EML devices. In contrast, the EL efficiency of the inverted-EML device rolled off significantly, though it had the highest EL efficiency of 11.4 cd/A.
Coaxial microreactor for particle synthesis

DOEpatents

Bartsch, Michael; Kanouff, Michael P; Ferko, Scott M; Crocker, Robert W; Wally, Karl

2013-10-22

A coaxial fluid flow microreactor system disposed on a microfluidic chip utilizing laminar flow for synthesizing particles from solution. Flow geometries produced by the mixing system make use of hydrodynamic focusing to confine a core flow to a small axially-symmetric, centrally positioned and spatially well-defined portion of a flow channel cross-section to provide highly uniform diffusional mixing between a reactant core and sheath flow streams. The microreactor is fabricated in such a way that a substantially planar two-dimensional arrangement of microfluidic channels will produce a three-dimensional core/sheath flow geometry. The microreactor system can comprise one or more coaxial mixing stages that can be arranged singly, in series, in parallel or nested concentrically in parallel.
A Computing Platform for Parallel Sparse Matrix Computations

DTIC Science & Technology

2016-01-05

[Report documentation form excerpt; no abstract text is recoverable. Personnel named on the form: Ahmed H. Sameh, Alicia Klinvex, Yao Zhu.]
Exploiting Data Sparsity in Parallel Matrix Powers Computations

DTIC Science & Technology

2013-05-03

...matrices of the form A = D + USV^H, where D is sparse and USV^H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial differential equation solvers, and preconditioned iterative methods. If A has this form, our algorithm enables a communication
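The structure A = D + USV^H is what makes repeated products with A cheap: the dense low-rank part never has to be formed. The sketch below illustrates only that serial idea for small real matrices (so V^H is just the transpose of V); it is not the report's communication-avoiding algorithm, and every name and value in it is hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def structured_matvec(D_sparse, U, S, V, x):
    """Compute (D + U S V^T) x without forming the dense n-by-n product.

    D_sparse maps (row, col) -> value for the nonzeros of D.
    U and V are n-by-r, S is r-by-r, with r much smaller than n.
    """
    y = [0.0] * len(x)
    for (i, j), value in D_sparse.items():  # sparse part: one pass over nonzeros
        y[i] += value * x[j]
    t = matvec(transpose(V), x)             # length-r vector
    t = matvec(S, t)                        # still length r
    low_rank = matvec(U, t)                 # back to length n
    return [a + b for a, b in zip(y, low_rank)]


def matrix_powers(D_sparse, U, S, V, x, k):
    """Return [x, A x, ..., A^k x] by repeated structured products."""
    vectors = [list(x)]
    for _ in range(k):
        vectors.append(structured_matvec(D_sparse, U, S, V, vectors[-1]))
    return vectors


# Toy example: n = 4, rank r = 1.
D_sparse = {(0, 0): 2.0, (1, 1): 2.0, (2, 2): 2.0, (3, 3): 2.0, (0, 1): -1.0}
U = [[1.0], [1.0], [1.0], [1.0]]
S = [[0.5]]
V = [[1.0], [0.0], [0.0], [1.0]]
x = [1.0, 2.0, 3.0, 4.0]

powers = matrix_powers(D_sparse, U, S, V, x, k=2)
print(powers[1])  # A x
print(powers[2])  # A^2 x
```

Each product costs on the order of the number of nonzeros in D plus n times r operations, instead of n squared for the dense form.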
Simulation of Locking Space Truss Deployments for a Large Deployable Sparse Aperture Reflector

DTIC Science & Technology

2015-03-01

Dr. Alan Jennings, for his unending patience with my struggles through this entire process. Without his expertise, guidance, and trust I would have...engineer since they are not automatically meshed. Fortunately, the mesh process is quite swift. Figure 13 shows both a linear hexahedral element as well...less than that of the serial process. Therefore, COMSOL’s partially parallelized algorithms will not be sped up as a function of cores added and is
Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering.

PubMed

Sicat, Ronell; Krüger, Jens; Möller, Torsten; Hadwiger, Markus

2014-12-01

This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs.
Asymmetrical external effects on transmission, conductance and giant tunneling magnetoresistance in silicene

NASA Astrophysics Data System (ADS)

Oubram, O.; Navarro, O.; Guzmán, E. J.; Rodríguez-Vargas, I.

2018-01-01

Electron transport in a silicene structure composed of a pair of magnetic gates is studied in ferromagnetic and antiferromagnetic configurations. The transport properties are investigated for asymmetrical external effects, such as an electrostatic potential and a magnetic field, and for an asymmetrical geometric structure. This theoretical study uses the transfer matrix method to calculate the transmission, the conductance for parallel and antiparallel magnetic alignment, and the tunneling magnetoresistance (TMR). In particular, we find that the transmission, conductance and magnetoresistance oscillate as a function of the barrier width. We also find that the best control and high values of the TMR spectrum are achieved by an asymmetrical application of the contact voltage. In addition, we show that the TMR is enhanced by several orders of magnitude when the asymmetrical magnetization effect is combined with an adequate applied electrostatic potential. Asymmetrical external effects therefore play a more important role in improving TMR than symmetrical ones. Finally, the giant TMR can be flexibly modulated by the incident energy and a specific asymmetrical application of the control voltage. These results could be useful for designing filters and digital nanodevices.
Polarized excitons and optical activity in single-wall carbon nanotubes

NASA Astrophysics Data System (ADS)

Chang, Yao-Wen; Jin, Bih-Yaw

2018-05-01

The polarized excitons and optical activity of single-wall carbon nanotubes (SWNTs) are studied theoretically using a π-electron Hamiltonian and helical-rotational symmetry. By taking advantage of the symmetrization, the single-particle energy and properties of a SWNT are characterized with the corresponding helical band structure. The dipole-moment matrix elements, magnetic-moment matrix elements, and the selection rules can also be derived. Based on different selection rules, the optical transitions can be assigned as parallel-polarized, left-handed circularly-polarized, and right-handed circularly-polarized transitions, where the combination of the last two gives the cross-polarized transition. The absorption and circular dichroism (CD) spectra are simulated by exciton calculation. The calculated results compare well with the reported measurements. Building on this foundation, magnetic-field effects on the polarized excitons and optical activity of SWNTs are studied. Dark-bright exciton splitting and an interband Faraday effect in the CD spectrum of SWNTs under an axial magnetic field are predicted. The Faraday rotation dispersion can be analyzed according to the selection rules of circular polarizations and the helical band structure.
Single image super-resolution based on compressive sensing and improved TV minimization sparse recovery

NASA Astrophysics Data System (ADS)

Vishnukumar, S.; Wilscy, M.

2017-12-01

In this paper, we propose a single image Super-Resolution (SR) method based on Compressive Sensing (CS) and Improved Total Variation (TV) Minimization Sparse Recovery. In the CS framework, the low-resolution (LR) image is treated as the compressed version of the high-resolution (HR) image. Dictionary Training and Sparse Recovery are the two phases of the method. The K-Singular Value Decomposition (K-SVD) method is used for dictionary training, and the dictionary represents HR image patches in a sparse manner. Here, only the interpolated version of the LR image is used for training, and thereby the structural self-similarity inherent in the LR image is exploited. In the sparse recovery phase, the sparse representation coefficients of the LR image patches with respect to the trained dictionary are derived using the Improved TV Minimization method. The HR image can be reconstructed by the linear combination of the dictionary and the sparse coefficients. The experimental results show that the proposed method gives better results, quantitatively as well as qualitatively, on both natural and remote sensing images. The reconstructed images have better visual quality since edges and other sharp details are preserved.
Topology-dependent density optima for efficient simultaneous network exploration

NASA Astrophysics Data System (ADS)

Wilson, Daniel B.; Baker, Ruth E.; Woodhouse, Francis G.

2018-06-01

A random search process in a networked environment is governed by the time it takes to visit every node, termed the cover time. Often, a networked process does not proceed in isolation but competes with many instances of itself within the same environment. A key unanswered question is how to optimize this process: How many concurrent searchers can a topology support before the benefits of parallelism are outweighed by competition for space? Here, we introduce the searcher-averaged parallel cover time (APCT) to quantify these economies of scale. We show that the APCT of the networked symmetric exclusion process is optimized at a searcher density that is well predicted by the spectral gap. Furthermore, we find that nonequilibrium processes, realized through the addition of bias, can support significantly increased density optima. Our results suggest alternative hybrid strategies of serial and parallel search for efficient information gathering in social interaction and biological transport networks.
Analysis techniques for diagnosing runaway ion distributions in the reversed field pinch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, J.; Anderson, J. K.; Capecchi, W.

2016-11-15

An advanced neutral particle analyzer (ANPA) on the Madison Symmetric Torus measures deuterium ions of energy ranges 8-45 keV with an energy resolution of 2-4 keV and time resolution of 10 μs. Three different experimental configurations measure distinct portions of the naturally occurring fast ion distributions: fast ions moving parallel, anti-parallel, or perpendicular to the plasma current. On a radial-facing port, fast ions moving perpendicular to the current have the necessary pitch to be measured by the ANPA. With the diagnostic positioned on a tangent line through the plasma core, a chord integration over fast ion density, background neutral density, and local appropriate pitch defines the measured sample. The plasma current can be reversed to measure anti-parallel fast ions in the same configuration. Comparisons of energy distributions for the three configurations show an anisotropic fast ion distribution favoring high pitch ions.
Dynamic Snap-Through of Thermally Buckled Structures by a Reduced Order Method

NASA Technical Reports Server (NTRS)

Przekop, Adam; Rizzi, Stephen A.

2007-01-01

The goal of this investigation is to further develop nonlinear modal numerical simulation methods for application to geometrically nonlinear response of structures exposed to combined high intensity random pressure fluctuations and thermal loadings. The study is conducted on a flat aluminum beam, which permits a comparison of results obtained by a reduced-order analysis with those obtained from a numerically intensive simulation in physical degrees-of-freedom. A uniformly distributed thermal loading is first applied to investigate the dynamic instability associated with thermal buckling. A uniformly distributed random loading is added to investigate the combined thermal-acoustic response. In the latter case, three types of response characteristics are considered, namely: (i) small amplitude vibration around one of the two stable buckling equilibrium positions, (ii) intermittent snap-through response between the two equilibrium positions, and (iii) persistent snap-through response between the two equilibrium positions. For the reduced-order analysis, four categories of modal basis functions are identified including those having symmetric transverse, anti-symmetric transverse, symmetric in-plane, and anti-symmetric in-plane displacements. The effect of basis selection on the quality of results is investigated for the dynamic thermal buckling and combined thermal-acoustic response. It is found that despite symmetric geometry, loading, and boundary conditions, the anti-symmetric transverse and symmetric in-plane modes must be included in the basis as they participate in the snap-through behavior.
Evidence for using Monte Carlo calculated wall attenuation and scatter correction factors for three styles of graphite-walled ion chamber.

PubMed

McCaffrey, J P; Mainegra-Hing, E; Kawrakow, I; Shortt, K R; Rogers, D W O

2004-06-21

The basic equation for establishing a 60Co air-kerma standard based on a cavity ionization chamber includes a wall correction term that corrects for the attenuation and scatter of photons in the chamber wall. For over a decade, the validity of the wall correction terms determined by extrapolation methods (K(w)K(cep)) has been strongly challenged by Monte Carlo (MC) calculation methods (K(wall)). Using the linear extrapolation method with experimental data, K(w)K(cep) was determined in this study for three different styles of primary-standard-grade graphite ionization chamber: cylindrical, spherical and plane-parallel. For measurements taken with the same 60Co source, the air-kerma rates for these three chambers, determined using extrapolated K(w)K(cep) values, differed by up to 2%. The MC code 'EGSnrc' was used to calculate the values of K(wall) for these three chambers. Use of the calculated K(wall) values gave air-kerma rates that agreed within 0.3%. The accuracy of this code was affirmed by its reliability in modelling the complex structure of the response curve obtained by rotation of the non-rotationally symmetric plane-parallel chamber. These results demonstrate that the linear extrapolation technique leads to errors in the determination of air-kerma.
