quality parallel preconditioners: Topics by Science.gov

Sample records for quality parallel preconditioners

The multigrid preconditioned conjugate gradient method

NASA Technical Reports Server (NTRS)

Tatebe, Osamu

1993-01-01

A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
Element-topology-independent preconditioners for parallel finite element computations

NASA Technical Reports Server (NTRS)

Park, K. C.; Alexander, Scott

1992-01-01

A family of preconditioners for the solution of finite element equations are presented, which are element-topology independent and thus can be applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

DTIC Science & Technology

2008-10-01

42 3.4 Residual history of WSO banded preconditioner for problem 2D 54019 HIGHK . . . . . . . . . . . . . . . . . . . . . . . . . . 43...3.5 Residual history of WSO banded preconditioner for problem Appu 43 3.6 Residual history of WSO banded preconditioner for problem ASIC 680k...44 3.7 Residual history of WSO banded preconditioner for problem BUN- DLE1
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics

NASA Astrophysics Data System (ADS)

Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio

2016-12-01

Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
Performance Models for the Spike Banded Linear System Solver

DOE PAGES

Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...

2011-01-01

With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners,more » compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent predication capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters – platforms for which performance prediction is particularly challenging. Finally, we provide predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.« less
Algorithms for parallel and vector computations

NASA Technical Reports Server (NTRS)

Ortega, James M.

1995-01-01

This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March, 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear poisson-type equations; (2) parallel reduced system conjugate gradient method; (3) orderings for conjugate gradient preconditioners, and (4) SOR as a preconditioner.
Newton-Krylov-Schwarz: An implicit solver for CFD

NASA Technical Reports Server (NTRS)

Cai, Xiao-Chuan; Keyes, David E.; Venkatakrishnan, V.

1995-01-01

Newton-Krylov methods and Krylov-Schwarz (domain decomposition) methods have begun to become established in computational fluid dynamics (CFD) over the past decade. The former employ a Krylov method inside of Newton's method in a Jacobian-free manner, through directional differencing. The latter employ an overlapping Schwarz domain decomposition to derive a preconditioner for the Krylov accelerator that relies primarily on local information, for data-parallel concurrency. They may be composed as Newton-Krylov-Schwarz (NKS) methods, which seem particularly well suited for solving nonlinear elliptic systems in high-latency, distributed-memory environments. We give a brief description of this family of algorithms, with an emphasis on domain decomposition iterative aspects. We then describe numerical simulations with Newton-Krylov-Schwarz methods on aerodynamics applications emphasizing comparisons with a standard defect-correction approach, subdomain preconditioner consistency, subdomain preconditioner quality, and the effect of a coarse grid.
Matrix-Free Polynomial-Based Nonlinear Least Squares Optimized Preconditioning and its Application to Discontinuous Galerkin Discretizations of the Euler Equations

DTIC Science & Technology

2015-06-01

cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avron, Haim; Ng, Esmond G.; Toledo, Sivan

2008-03-21

We propose and analyze a new tool to help solve sparse linear least-squares problems min{sub x} {parallel}Ax-b{parallel}{sub 2}. Our method is based on a sparse QR factorization of a low-rank perturbation {cflx A} of A. More precisely, we show that the R factor of {cflx A} is an effective preconditioner for the least-squares problem min{sub x} {parallel}Ax-b{parallel}{sub 2}, when solved using LSQR. We propose applications for the new technique. When A is rank deficient we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When A is sparse except for a few dense rows we canmore » drop these dense rows from A to obtain {cflx A}. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem A, it is a good preconditioner for the updated/downdated problem {cflx A}. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil (A*A,R*R) and analyze the applications.« less
Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

NASA Astrophysics Data System (ADS)

Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

2014-06-01

We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.
Incomplete Sparse Approximate Inverses for Parallel Preconditioning

DOE PAGES

Anzt, Hartwig; Huckle, Thomas K.; Bräckle, Jürgen; ...

2017-10-28

In this study, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods, or as a simplification of the sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverses” (ISAI) is in particular efficient in the solution of sparse triangular linear systems of equations. Those arise, for example, in the context of incomplete factorization preconditioning. ISAI preconditioners can be generated via an algorithm providing fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. Finally, in a study covering a large number of matrices, we identify the ISAI preconditioner as anmore » attractive alternative to exact triangular solves in the context of incomplete factorization preconditioning.« less
Research in parallel computing

NASA Technical Reports Server (NTRS)

Ortega, James M.; Henderson, Charles

1994-01-01

This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Paul T.; Shadid, John N.; Sala, Marzio

In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system ismore » obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10{sup 8} unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.« less
A taxonomy and comparison of parallel block multi-level preconditioners for the incompressible Navier-Stokes equations.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shadid, John Nicolas; Elman, Howard; Shuttleworth, Robert R.

2007-04-01

In recent years, considerable effort has been placed on developing efficient and robust solution algorithms for the incompressible Navier-Stokes equations based on preconditioned Krylov methods. These include physics-based methods, such as SIMPLE, and purely algebraic preconditioners based on the approximation of the Schur complement. All these techniques can be represented as approximate block factorization (ABF) type preconditioners. The goal is to decompose the application of the preconditioner into simplified sub-systems in which scalable multi-level type solvers can be applied. In this paper we develop a taxonomy of these ideas based on an adaptation of a generalized approximate factorization of themore » Navier-Stokes system first presented in [25]. This taxonomy illuminates the similarities and differences among these preconditioners and the central role played by efficient approximation of certain Schur complement operators. We then present a parallel computational study that examines the performance of these methods and compares them to an additive Schwarz domain decomposition (DD) algorithm. Results are presented for two and three-dimensional steady state problems for enclosed domains and inflow/outflow systems on both structured and unstructured meshes. The numerical experiments are performed using MPSalsa, a stabilized finite element code.« less
A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

NASA Astrophysics Data System (ADS)

Ma, Sangback

In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
Domain decomposition preconditioners for the spectral collocation method

NASA Technical Reports Server (NTRS)

Quarteroni, Alfio; Sacchilandriani, Giovanni

1988-01-01

Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE PAGES

Li, Ruipeng; Saad, Yousef

2017-08-01

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Ruipeng; Saad, Yousef

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less
Domain decomposition algorithms and computation fluid dynamics

NASA Technical Reports Server (NTRS)

Chan, Tony F.

1988-01-01

In the past several years, domain decomposition was a very popular topic, partly motivated by the potential of parallelization. While a large body of theory and algorithms were developed for model elliptic problems, they are only recently starting to be tested on realistic applications. The application of some of these methods to two model problems in computational fluid dynamics are investigated. Some examples are two dimensional convection-diffusion problems and the incompressible driven cavity flow problem. The construction and analysis of efficient preconditioners for the interface operator to be used in the iterative solution of the interface solution is described. For the convection-diffusion problems, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven cavity problem, the effectiveness of a class of boundary probe preconditioners is discussed.

Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Widlund, Olof B.

2015-06-09

The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less
Scalable parallel elastic-plastic finite element analysis using a quasi-Newton method with a balancing domain decomposition preconditioner

NASA Astrophysics Data System (ADS)

Yusa, Yasunori; Okada, Hiroshi; Yamada, Tomonori; Yoshimura, Shinobu

2018-04-01

A domain decomposition method for large-scale elastic-plastic problems is proposed. The proposed method is based on a quasi-Newton method in conjunction with a balancing domain decomposition preconditioner. The use of a quasi-Newton method overcomes two problems associated with the conventional domain decomposition method based on the Newton-Raphson method: (1) avoidance of a double-loop iteration algorithm, which generally has large computational complexity, and (2) consideration of the local concentration of nonlinear deformation, which is observed in elastic-plastic problems with stress concentration. Moreover, the application of a balancing domain decomposition preconditioner ensures scalability. Using the conventional and proposed domain decomposition methods, several numerical tests, including weak scaling tests, were performed. The convergence performance of the proposed method is comparable to that of the conventional method. In particular, in elastic-plastic analysis, the proposed method exhibits better convergence performance than the conventional method.
AZTEC: A parallel iterative package for the solving linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1996-12-31

We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

DOE PAGES

Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

2018-05-01

Here, we report multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physicsmore » compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.« less
Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

DOE Office of Scientific and Technical Information (OSTI.GOV)

Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

Here, we report multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physicsmore » compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.« less
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques

NASA Astrophysics Data System (ADS)

Gullerud, Arne Stewart

This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolutes sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
Parallel computing techniques for rotorcraft aerodynamics

NASA Astrophysics Data System (ADS)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures

PubMed Central

Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone

2018-01-01

We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
An accurate, fast, and scalable solver for high-frequency wave propagation

NASA Astrophysics Data System (ADS)

Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.

2017-12-01

In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straight-forward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand-sides.
Condition number estimation of preconditioned matrices.

PubMed

Kushida, Noriyuki

2015-01-01

The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method which is based on the Lanczos connection gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations with information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditionerers or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers. This is because, the preconditioned matrices become dense, even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered to be applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager's method. The feasibility studies are curried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tri-daigonal matrix and Pei's matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei's matrix, and matrices generated with the finite element method.
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, Andrew J.

1990-01-01

While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
A scalable parallel black oil simulator on distributed memory parallel computers

NASA Astrophysics Data System (ADS)

Wang, Kun; Liu, Hui; Chen, Zhangxin

2015-11-01

This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Condition Number Estimation of Preconditioned Matrices

PubMed Central

Kushida, Noriyuki

2015-01-01

The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method which is based on the Lanczos connection gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations with information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditionerers or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers. This is because, the preconditioned matrices become dense, even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered to be applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager’s method. The feasibility studies are curried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tri-daigonal matrix and Pei’s matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei’s matrix, and matrices generated with the finite element method. PMID:25816331
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
Inversion of potential field data using the finite element method on parallel computers

NASA Astrophysics Data System (ADS)

Gross, L.; Altinay, C.; Shaw, S.

2015-11-01

In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.
Use of general purpose graphics processing units with MODFLOW

USGS Publications Warehouse

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
Meros Preconditioner Package

DOE Office of Scientific and Technical Information (OSTI.GOV)

2004-04-01

Meros uses the compositional, aggregation, and overload operator capabilities of TSF to provide an object-oriented package providing segregated/block preconditioners for linear systems related to fully-coupled Navier-Stokes problems. This class of preconditioners exploits the special properties of these problems to segregate the equations and use multi-level preconditioners (through ML) on the matrix sub-blocks. Several preconditioners are provided, including the Fp and BFB preconditioners of Kay & Loghin and Silvester, Elman, Kay & Wathen. The overall performance and scalability of these preconditioners approaches that of multigrid for certain types of problems. Meros also provides more traditional pressure projection methods including SIMPLE andmore » SIMPLEC.« less
A nonrecursive 'Order N' preconditioned conjugate gradient/range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, A. J.; Menon, R.; Sunkel, John

1991-01-01

This paper addresses the requirements of present-day mechanical system simulations of algorithms that induce parallelism on a fine scale and of transient simulation methods which must be automatically load balancing for a wide collection of system topologies and hardware configurations. To this end, a combination range space/preconditioned conjugage gradient formulation of multidegree-of-freedon dynamics is developed, which, by employing regular ordering of the system connectivity graph, makes it possible to derive an extremely efficient preconditioner from the range space metric (as opposed to the system coefficient matrix). Because of the effectiveness of the preconditioner, the method can achieve performance rates that depend linearly on the number of substructures. The method, termed 'Order N' does not require the assembly of system mass or stiffness matrices, and is therefore amenable to implementation on work stations. Using this method, a 13-substructure model of the Space Station was constructed.
Parallel Performance of Linear Solvers and Preconditioners

DTIC Science & Technology

2014-01-01

are produced by a discrete dislocation dynamics ( DDD ) simulation and change with each timestep of the DDD simulation as the dislocation structure...evolves. However, the coefficient—or stiffness matrix— remains constant during the DDD simulation and some expensive matrix factorizations only occur once...discrete dislocation dynamics ( DDD ) simulations. This can be achieved by coupling a DDD simulator for bulk material (Arsenlis et al., 2007) to a
Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes

DOE PAGES

Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

2016-08-18

In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes such as in multigrid solvers/preconditioners, to do solution convergence and verification studies and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and in memory refinement). We also describe a high-order boundary reconstruction capability that can be used tomore » project the new points after refinement using high-order approximations instead of linear projection in order to minimize and provide more control on geometrical errors introduced by curved boundaries.The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.« less

Multilevel filtering elliptic preconditioners

NASA Technical Reports Server (NTRS)

Kuo, C. C. Jay; Chan, Tony F.; Tong, Charles

1989-01-01

A class of preconditioners is presented for elliptic problems built on ideas borrowed from the digital filtering theory and implemented on a multilevel grid structure. They are designed to be both rapidly convergent and highly parallelizable. The digital filtering viewpoint allows the use of filter design techniques for constructing elliptic preconditioners and also provides an alternative framework for understanding several other recently proposed multilevel preconditioners. Numerical results are presented to assess the convergence behavior of the new methods and to compare them with other preconditioners of multilevel type, including the usual multigrid method as preconditioner, the hierarchical basis method and a recent method proposed by Bramble-Pasciak-Xu.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

NASA Technical Reports Server (NTRS)

Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

1996-01-01

We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods

NASA Astrophysics Data System (ADS)

Pazner, Will; Persson, Per-Olof

2018-02-01

In this paper, we develop a new tensor-product based preconditioner for discontinuous Galerkin methods with polynomial degrees higher than those typically employed. This preconditioner uses an automatic, purely algebraic method to approximate the exact block Jacobi preconditioner by Kronecker products of several small, one-dimensional matrices. Traditional matrix-based preconditioners require O (p2d) storage and O (p3d) computational work, where p is the degree of basis polynomials used, and d is the spatial dimension. Our SVD-based tensor-product preconditioner requires O (p d + 1) storage, O (p d + 1) work in two spatial dimensions, and O (p d + 2) work in three spatial dimensions. Combined with a matrix-free Newton-Krylov solver, these preconditioners allow for the solution of DG systems in linear time in p per degree of freedom in 2D, and reduce the computational complexity from O (p9) to O (p5) in 3D. Numerical results are shown in 2D and 3D for the advection, Euler, and Navier-Stokes equations, using polynomials of degree up to p = 30. For many test cases, the preconditioner results in similar iteration counts when compared with the exact block Jacobi preconditioner, and performance is significantly improved for high polynomial degrees p.
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models

PubMed Central

Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.

2016-01-01

Computational techniques for solving systems of equations arising in gastric electrophysiology have not been studied for efficient solution process. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and mesh resolution required to capture geometric and functional features necessitates efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased workload considerably. Preconditioning based on block Jacobi and algebraic multigrid solvers were found to have the best overall solution times and least iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Efficient Preconditioning for the p-Version Finite Element Method in Two Dimensions

DTIC Science & Technology

1989-10-01

paper, we study fast parallel preconditioners for systems of equations arising from the p-version finite element method. The p-version finite element...computations and the solution of a relatively small global auxiliary problem. We study two different methods. In the first (Section 3), the global...20], will be studied in the next section. Problem (3.12) is obviously much more easily solved than the original problem ,nd the procedure is highly
Scalable smoothing strategies for a geometric multigrid method for the immersed boundary equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhalla, Amneet Pal Singh; Knepley, Matthew G.; Adams, Mark F.

2016-12-20

The immersed boundary (IB) method is a widely used approach to simulating fluid-structure interaction (FSI). Although explicit versions of the IB method can suffer from severe time step size restrictions, these methods remain popular because of their simplicity and generality. In prior work (Guy et al., Adv Comput Math, 2015), some of us developed a geometric multigrid preconditioner for a stable semi-implicit IB method under Stokes flow conditions; however, this solver methodology used a Vanka-type smoother that presented limited opportunities for parallelization. This work extends this Stokes-IB solver methodology by developing smoothing techniques that are suitable for parallel implementation. Specifically,more » we demonstrate that an additive version of the Vanka smoother can yield an effective multigrid preconditioner for the Stokes-IB equations, and we introduce an efficient Schur complement-based smoother that is also shown to be effective for the Stokes-IB equations. We investigate the performance of these solvers for a broad range of material stiffnesses, both for Stokes flows and flows at nonzero Reynolds numbers, and for thick and thin structural models. We show here that linear solver performance degrades with increasing Reynolds number and material stiffness, especially for thin interface cases. Nonetheless, the proposed approaches promise to yield effective solution algorithms, especially at lower Reynolds numbers and at modest-to-high elastic stiffnesses.« less
New preconditioning strategy for Jacobian-free solvers for variably saturated flows with Richards’ equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lipnikov, Konstantin; Moulton, David; Svyatskiy, Daniil

2016-04-29

We develop a new approach for solving the nonlinear Richards’ equation arising in variably saturated flow modeling. The growing complexity of geometric models for simulation of subsurface flows leads to the necessity of using unstructured meshes and advanced discretization methods. Typically, a numerical solution is obtained by first discretizing PDEs and then solving the resulting system of nonlinear discrete equations with a Newton-Raphson-type method. Efficiency and robustness of the existing solvers rely on many factors, including an empiric quality control of intermediate iterates, complexity of the employed discretization method and a customized preconditioner. We propose and analyze a new preconditioningmore » strategy that is based on a stable discretization of the continuum Jacobian. We will show with numerical experiments for challenging problems in subsurface hydrology that this new preconditioner improves convergence of the existing Jacobian-free solvers 3-20 times. Furthermore, we show that the Picard method with this preconditioner becomes a more efficient nonlinear solver than a few widely used Jacobian-free solvers.« less
Domain decomposition in time for PDE-constrained optimization

DOE PAGES

Barker, Andrew T.; Stoll, Martin

2015-08-28

Here, PDE-constrained optimization problems have a wide range of applications, but they lead to very large and ill-conditioned linear systems, especially if the problems are time dependent. In this paper we outline an approach for dealing with such problems by decomposing them in time and applying an additive Schwarz preconditioner in time, so that we can take advantage of parallel computers to deal with the very large linear systems. We then illustrate the performance of our method on a variety of problems.
On the scalability of the Albany/FELIX first-order Stokes approximation ice sheet solver for large-scale simulations of the Greenland and Antarctic ice sheets

DOE PAGES

Tezaur, Irina K.; Tuminaro, Raymond S.; Perego, Mauro; ...

2015-01-01

We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditionermore » results in faster linear solve times but the ILU preconditioner exhibits better scalability. In addition, a weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. We show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.« less
On optimal improvements of classical iterative schemes for Z-matrices

NASA Astrophysics Data System (ADS)

Noutsos, D.; Tzoumas, M.

2006-04-01

Many researchers have considered preconditioners, applied to linear systems, whose matrix coefficient is a Z- or an M-matrix, that make the associated Jacobi and Gauss-Seidel methods converge asymptotically faster than the unpreconditioned ones. Such preconditioners are chosen so that they eliminate the off-diagonal elements of the same column or the elements of the first upper diagonal [Milaszewicz, LAA 93 (1987) 161-170], Gunawardena et al. [LAA 154-156 (1991) 123-143]. In this work we generalize the previous preconditioners to obtain optimal methods. "Good" Jacobi and Gauss-Seidel algorithms are given and preconditioners, that eliminate more than one entry per row, are also proposed and analyzed. Moreover, the behavior of the above preconditioners to the Krylov subspace methods is studied.
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE PAGES

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

2016-11-07

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

NASA Astrophysics Data System (ADS)

Marx, Alain; Lütjens, Hinrich

2017-03-01

A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGhee, J.M.; Roberts, R.M.; Morel, J.E.

1997-06-01

A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner formore » scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.« less
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saad, Yousef

2014-01-16

The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction.

PubMed

Fessler, J A; Booth, S D

1999-01-01

Gradient-based iterative methods often converge slowly for tomographic image reconstruction and image restoration problems, but can be accelerated by suitable preconditioners. Diagonal preconditioners offer some improvement in convergence rate, but do not incorporate the structure of the Hessian matrices in imaging problems. Circulant preconditioners can provide remarkable acceleration for inverse problems that are approximately shift-invariant, i.e., for those with approximately block-Toeplitz or block-circulant Hessians. However, in applications with nonuniform noise variance, such as arises from Poisson statistics in emission tomography and in quantum-limited optical imaging, the Hessian of the weighted least-squares objective function is quite shift-variant, and circulant preconditioners perform poorly. Additional shift-variance is caused by edge-preserving regularization methods based on nonquadratic penalty functions. This paper describes new preconditioners that approximate more accurately the Hessian matrices of shift-variant imaging problems. Compared to diagonal or circulant preconditioning, the new preconditioners lead to significantly faster convergence rates for the unconstrained conjugate-gradient (CG) iteration. We also propose a new efficient method for the line-search step required by CG methods. Applications to positron emission tomography (PET) illustrate the method.

Construction, classification and parametrization of complex Hadamard matrices

NASA Astrophysics Data System (ADS)

Szöllősi, Ferenc

To improve the design of nuclear systems, high-fidelity neutron fluxes are required. Leadership-class machines provide platforms on which very large problems can be solved. Computing such fluxes efficiently requires numerical methods with good convergence properties and algorithms that can scale to hundreds of thousands of cores. Many 3-D deterministic transport codes are decomposable in space and angle only, limiting them to tens of thousands of cores. Most codes rely on methods such as Gauss Seidel for fixed source problems and power iteration for eigenvalue problems, which can be slow to converge for challenging problems like those with highly scattering materials or high dominance ratios. Three methods have been added to the 3-D SN transport code Denovo that are designed to improve convergence and enable the full use of cutting-edge computers. The first is a multigroup Krylov solver that converges more quickly than Gauss Seidel and parallelizes the code in energy such that Denovo can use hundreds of thousand of cores effectively. The second is Rayleigh quotient iteration (RQI), an old method applied in a new context. This eigenvalue solver finds the dominant eigenvalue in a mathematically optimal way and should converge in fewer iterations than power iteration. RQI creates energy-block-dense equations that the new Krylov solver treats efficiently. However, RQI can have convergence problems because it creates poorly conditioned systems. This can be overcome with preconditioning. The third method is a multigrid-in-energy preconditioner. The preconditioner takes advantage of the new energy decomposition because the grids are in energy rather than space or angle. The preconditioner greatly reduces iteration count for many problem types and scales well in energy. It also allows RQI to be successful for problems it could not solve otherwise. The methods added to Denovo accomplish the goals of this work. They converge in fewer iterations than traditional methods and enable the use of hundreds of thousands of cores. Each method can be used individually, with the multigroup Krylov solver and multigrid-in-energy preconditioner being particularly successful on their own. The largest benefit, though, comes from using these methods in concert.
Two new modified Gauss-Seidel methods for linear system with M-matrices

NASA Astrophysics Data System (ADS)

Zheng, Bing; Miao, Shu-Xin

2009-12-01

In 2002, H. Kotakemori et al. proposed the modified Gauss-Seidel (MGS) method for solving the linear system with the preconditioner [H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner () J. Comput. Appl. Math. 145 (2002) 373-378]. Since this preconditioner is constructed by only the largest element on each row of the upper triangular part of the coefficient matrix, the preconditioning effect is not observed on the nth row. In the present paper, to deal with this drawback, we propose two new preconditioners. The convergence and comparison theorems of the modified Gauss-Seidel methods with these two preconditioners for solving the linear system are established. The convergence rates of the new proposed preconditioned methods are compared. In addition, numerical experiments are used to show the effectiveness of the new MGS methods.
Two variants of minimum discarded fill ordering

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, E.F.; Forsyth, P.A.; Tang, Wei-Pai

1991-01-01

It is well known that the ordering of the unknowns can have a significant effect on the convergence of Preconditioned Conjugate Gradient (PCG) methods. There has been considerable experimental work on the effects of ordering for regular finite difference problems. In many cases, good results have been obtained with preconditioners based on diagonal, spiral or natural row orderings. However, for finite element problems having unstructured grids or grids generated by a local refinement approach, it is difficult to define many of the orderings for more regular problems. A recently proposed Minimum Discarded Fill (MDF) ordering technique is effective in findingmore » high quality Incomplete LU (ILU) preconditioners, especially for problems arising from unstructured finite element grids. Testing indicates this algorithm can identify a rather complicated physical structure in an anisotropic problem and orders the unknowns in the preferred'' direction. The MDF technique may be viewed as the numerical analogue of the minimum deficiency algorithm in sparse matrix technology. At any stage of the partial elimination, the MDF technique chooses the next pivot node so as to minimize the amount of discarded fill. In this work, two efficient variants of the MDF technique are explored to produce cost-effective high-order ILU preconditioners. The Threshold MDF orderings combine MDF ideas with drop tolerance techniques to identify the sparsity pattern in the ILU preconditioners. These techniques identify an ordering that encourages fast decay of the entries in the ILU factorization. The Minimum Update Matrix (MUM) ordering technique is a simplification of the MDF ordering and is closely related to the minimum degree algorithm. The MUM ordering is especially for large problems arising from Navier-Stokes problems. Some interesting pictures of the orderings are presented using a visualization tool. 22 refs., 4 figs., 7 tabs.« less
Block recursive LU preconditioners for the thermally coupled incompressible inductionless MHD problem

NASA Astrophysics Data System (ADS)

Badia, Santiago; Martín, Alberto F.; Planas, Ramon

2014-10-01

The thermally coupled incompressible inductionless magnetohydrodynamics (MHD) problem models the flow of an electrically charged fluid under the influence of an external electromagnetic field with thermal coupling. This system of partial differential equations is strongly coupled and highly nonlinear for real cases of interest. Therefore, fully implicit time integration schemes are very desirable in order to capture the different physical scales of the problem at hand. However, solving the multiphysics linear systems of equations resulting from such algorithms is a very challenging task which requires efficient and scalable preconditioners. In this work, a new family of recursive block LU preconditioners is designed and tested for solving the thermally coupled inductionless MHD equations. These preconditioners are obtained after splitting the fully coupled matrix into one-physics problems for every variable (velocity, pressure, current density, electric potential and temperature) that can be optimally solved, e.g., using preconditioned domain decomposition algorithms. The main idea is to arrange the original matrix into an (arbitrary) 2 × 2 block matrix, and consider an LU preconditioner obtained by approximating the corresponding Schur complement. For every one of the diagonal blocks in the LU preconditioner, if it involves more than one type of unknowns, we proceed the same way in a recursive fashion. This approach is stated in an abstract way, and can be straightforwardly applied to other multiphysics problems. Further, we precisely explain a flexible and general software design for the code implementation of this type of preconditioners.
A scalable nonlinear fluid-structure interaction solver based on a Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D

NASA Astrophysics Data System (ADS)

Kong, Fande; Cai, Xiao-Chuan

2017-07-01

Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
Shedding New Light on Exploding Stars: Terascale Simulations of Nuetrino-Dreiven Supernovas and Their Nucleosynthesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dennis C. Smolarski, S.J.

Project Abstract This project was a continuation of work begun under a subcontract issued off of TSI-DOE Grant 1528746, awarded to the University of Illinois Urbana-Champaign. Dr. Anthony Mezzacappa is the Principal Investigator on the Illinois award. A separate award was issued to Santa Clara University to continue the collaboration during the time period May 2003 ? 2004. Smolarski continued to work on preconditioner technology and its interface with various iterative methods. He worked primarily with F. Dough Swesty (SUNY-Stony Brook) in continuing software development started in the 2002-03 academic year. Special attention was paid to the development and testingmore » of difference sparse approximate inverse preconditioners and their use in the solution of linear systems arising from radiation transport equations. The target was a high performance platform on which efficient implementation is a critical component of the overall effort. Smolarski also focused on the integration of the adaptive iterative algorithm, Chebycode, developed by Tom Manteuffel and Steve Ashby and adapted by Ryan Szypowski for parallel platforms, into the radiation transport code being developed at SUNY-Stony Brook.« less
Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, P. T.; Shadid, J. N.; Hu, J. J.

Here, we explore the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. Our study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of themore » original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.« less
Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD

DOE PAGES

Lin, P. T.; Shadid, J. N.; Hu, J. J.; ...

2017-11-06

Here, we explore the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. Our study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of themore » original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.« less
A scalable nonlinear fluid–structure interaction solver based on a Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D

DOE PAGES

Kong, Fande; Cai, Xiao-Chuan

2017-03-24

Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexactmore » Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here ''geometry'' includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.« less
Physics-Based Preconditioning of a Compressible Flow Solver for Large-Scale Simulations of Additive Manufacturing Processes

NASA Astrophysics Data System (ADS)

Weston, Brian; Nourgaliev, Robert; Delplanque, Jean-Pierre

2017-11-01

We present a new block-based Schur complement preconditioner for simulating all-speed compressible flow with phase change. The conservation equations are discretized with a reconstructed Discontinuous Galerkin method and integrated in time with fully implicit time discretization schemes. The resulting set of non-linear equations is converged using a robust Newton-Krylov framework. Due to the stiffness of the underlying physics associated with stiff acoustic waves and viscous material strength effects, we solve for the primitive-variables (pressure, velocity, and temperature). To enable convergence of the highly ill-conditioned linearized systems, we develop a physics-based preconditioner, utilizing approximate block factorization techniques to reduce the fully-coupled 3×3 system to a pair of reduced 2×2 systems. We demonstrate that our preconditioned Newton-Krylov framework converges on very stiff multi-physics problems, corresponding to large CFL and Fourier numbers, with excellent algorithmic and parallel scalability. Results are shown for the classic lid-driven cavity flow problem as well as for 3D laser-induced phase change. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
On polynomial preconditioning for indefinite Hermitian matrices

NASA Technical Reports Server (NTRS)

Freund, Roland W.

1989-01-01

The minimal residual method is studied combined with polynomial preconditioning for solving large linear systems (Ax = b) with indefinite Hermitian coefficient matrices (A). The standard approach for choosing the polynomial preconditioners leads to preconditioned systems which are positive definite. Here, a different strategy is studied which leaves the preconditioned coefficient matrix indefinite. More precisely, the polynomial preconditioner is designed to cluster the positive, resp. negative eigenvalues of A around 1, resp. around some negative constant. In particular, it is shown that such indefinite polynomial preconditioners can be obtained as the optimal solutions of a certain two parameter family of Chebyshev approximation problems. Some basic results are established for these approximation problems and a Remez type algorithm is sketched for their numerical solution. The problem of selecting the parameters such that the resulting indefinite polynomial preconditioners speeds up the convergence of minimal residual method optimally is also addressed. An approach is proposed based on the concept of asymptotic convergence factors. Finally, some numerical examples of indefinite polynomial preconditioners are given.
Incomplete augmented Lagrangian preconditioner for steady incompressible Navier-Stokes equations.

PubMed

Tan, Ning-Bo; Huang, Ting-Zhu; Hu, Ze-Jun

2013-01-01

An incomplete augmented Lagrangian preconditioner, for the steady incompressible Navier-Stokes equations discretized by stable finite elements, is proposed. The eigenvalues of the preconditioned matrix are analyzed. Numerical experiments show that the incomplete augmented Lagrangian-based preconditioner proposed is very robust and performs quite well by the Picard linearization or the Newton linearization over a wide range of values of the viscosity on both uniform and stretched grids.
A Newton-Krylov method with an approximate analytical Jacobian for implicit solution of Navier-Stokes equations on staggered overset-curvilinear grids with immersed boundaries.

PubMed

Asgharzadeh, Hafez; Borazjani, Iman

2017-02-15

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42 - 74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal Jacobian when the stretching factor was increased, respectively. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
A Newton–Krylov method with an approximate analytical Jacobian for implicit solution of Navier–Stokes equations on staggered overset-curvilinear grids with immersed boundaries

PubMed Central

Asgharzadeh, Hafez; Borazjani, Iman

2016-01-01

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42 – 74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal Jacobian when the stretching factor was increased, respectively. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80–90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future. PMID:28042172
A Newton-Krylov method with an approximate analytical Jacobian for implicit solution of Navier-Stokes equations on staggered overset-curvilinear grids with immersed boundaries

NASA Astrophysics Data System (ADS)

Asgharzadeh, Hafez; Borazjani, Iman

2017-02-01

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal and full Jacobian, respectivley, when the stretching factor was increased. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
Incomplete Augmented Lagrangian Preconditioner for Steady Incompressible Navier-Stokes Equations

PubMed Central

Tan, Ning-Bo; Huang, Ting-Zhu; Hu, Ze-Jun

2013-01-01

An incomplete augmented Lagrangian preconditioner, for the steady incompressible Navier-Stokes equations discretized by stable finite elements, is proposed. The eigenvalues of the preconditioned matrix are analyzed. Numerical experiments show that the incomplete augmented Lagrangian-based preconditioner proposed is very robust and performs quite well by the Picard linearization or the Newton linearization over a wide range of values of the viscosity on both uniform and stretched grids. PMID:24235888
Fast Multilevel Solvers for a Class of Discrete Fourth Order Parabolic Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zheng, Bin; Chen, Luoping; Hu, Xiaozhe

2016-03-05

In this paper, we study fast iterative solvers for the solution of fourth order parabolic equations discretized by mixed finite element methods. We propose to use consistent mass matrix in the discretization and use lumped mass matrix to construct efficient preconditioners. We provide eigenvalue analysis for the preconditioned system and estimate the convergence rate of the preconditioned GMRes method. Furthermore, we show that these preconditioners only need to be solved inexactly by optimal multigrid algorithms. Our numerical examples indicate that the proposed preconditioners are very efficient and robust with respect to both discretization parameters and diffusion coefficients. We also investigatemore » the performance of multigrid algorithms with either collective smoothers or distributive smoothers when solving the preconditioner systems.« less
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

NASA Astrophysics Data System (ADS)

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-01

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
The role of grape seed extract in the remineralization of demineralized dentine: micromorphological and physical analyses.

PubMed

Tang, Cheng-fang; Fang, Ming; Liu, Rui-rui; Dou, Qi; Chai, Zhi-guo; Xiao, Yu-hong; Chen, Ji-hua

2013-12-01

Grape seed extract (GSE) is known to have a positive effect on the demineralization and/or remineralization of artificial root caries lesions. The present study aimed to investigate whether biomodification of caries-like acid-etched demineralized dentine, using proanthocyanidins-rich GSE, would promote its remineralization potential. Dentine specimens were acid-etched for 30s, then biomodified using proanthocyanidin-based preconditioners (at different concentrations and pH values) for 2min, followed by a 15-day artificial remineralization regimen. They were subsequently subjected to microhardness measurements, micromorphological evaluation and X-ray diffraction analyses. Stability of the preconditioners was also analyzed, spectrophotometrically. A concentration-dependent increase was observed in the microhardness of the specimens that were biomodified using GSE preconditioners, without pH adjustment. Field emission scanning electron microscopy revealed greater mineral deposition on their surfaces, which was further identified mainly as hydroxylapatite. The absorbances of preconditioner dilutions at pH 7.4 and pH 10.0 decreased at the two typical polyphenol bands. Transient GSE biomodification promoted remineralization on the surface of demineralized dentine, and this process was influenced by the concentration and pH value of the preconditioner. GSE preconditioner at a concentration of 15%, without pH adjustment, presented with the best results, and this may be attributed to its high polyphenolic content. Copyright © 2013 Elsevier Ltd. All rights reserved.

Semi-automatic sparse preconditioners for high-order finite element methods on non-uniform meshes

NASA Astrophysics Data System (ADS)

Austin, Travis M.; Brezina, Marian; Jamroz, Ben; Jhurani, Chetan; Manteuffel, Thomas A.; Ruge, John

2012-05-01

High-order finite elements often have a higher accuracy per degree of freedom than the classical low-order finite elements. However, in the context of implicit time-stepping methods, high-order finite elements present challenges to the construction of efficient simulations due to the high cost of inverting the denser finite element matrix. There are many cases where simulations are limited by the memory required to store the matrix and/or the algorithmic components of the linear solver. We are particularly interested in preconditioned Krylov methods for linear systems generated by discretization of elliptic partial differential equations with high-order finite elements. Using a preconditioner like Algebraic Multigrid can be costly in terms of memory due to the need to store matrix information at the various levels. We present a novel method for defining a preconditioner for systems generated by high-order finite elements that is based on a much sparser system than the original high-order finite element system. We investigate the performance for non-uniform meshes on a cube and a cubed sphere mesh, showing that the sparser preconditioner is more efficient and uses significantly less memory. Finally, we explore new methods to construct the sparse preconditioner and examine their effectiveness for non-uniform meshes. We compare results to a direct use of Algebraic Multigrid as a preconditioner and to a two-level additive Schwarz method.
A universal preconditioner for simulating condensed phase materials.

PubMed

Packwood, David; Kermode, James; Mones, Letif; Bernstein, Noam; Woolley, John; Gould, Nicholas; Ortner, Christoph; Csányi, Gábor

2016-04-28

We introduce a universal sparse preconditioner that accelerates geometry optimisation and saddle point search tasks that are common in the atomic scale simulation of materials. Our preconditioner is based on the neighbourhood structure and we demonstrate the gain in computational efficiency in a wide range of materials that include metals, insulators, and molecular solids. The simple structure of the preconditioner means that the gains can be realised in practice not only when using expensive electronic structure models but also for fast empirical potentials. Even for relatively small systems of a few hundred atoms, we observe speedups of a factor of two or more, and the gain grows with system size. An open source Python implementation within the Atomic Simulation Environment is available, offering interfaces to a wide range of atomistic codes.
A universal preconditioner for simulating condensed phase materials

NASA Astrophysics Data System (ADS)

Packwood, David; Kermode, James; Mones, Letif; Bernstein, Noam; Woolley, John; Gould, Nicholas; Ortner, Christoph; Csányi, Gábor

2016-04-01

We introduce a universal sparse preconditioner that accelerates geometry optimisation and saddle point search tasks that are common in the atomic scale simulation of materials. Our preconditioner is based on the neighbourhood structure and we demonstrate the gain in computational efficiency in a wide range of materials that include metals, insulators, and molecular solids. The simple structure of the preconditioner means that the gains can be realised in practice not only when using expensive electronic structure models but also for fast empirical potentials. Even for relatively small systems of a few hundred atoms, we observe speedups of a factor of two or more, and the gain grows with system size. An open source Python implementation within the Atomic Simulation Environment is available, offering interfaces to a wide range of atomistic codes.
Multilevel Preconditioners for Reaction-Diffusion Problems with Discontinuous Coefficients

DOE PAGES

Kolev, Tzanio V.; Xu, Jinchao; Zhu, Yunrong

2015-08-23

In this study, we extend some of the multilevel convergence results obtained by Xu and Zhu, to the case of second order linear reaction-diffusion equations. Specifically, we consider the multilevel preconditioners for solving the linear systems arising from the linear finite element approximation of the problem, where both diffusion and reaction coefficients are piecewise-constant functions. We discuss in detail the influence of both the discontinuous reaction and diffusion coefficients to the performance of the classical BPX and multigrid V-cycle preconditioner.
Incomplete Gröbner basis as a preconditioner for polynomial systems

NASA Astrophysics Data System (ADS)

Sun, Yang; Tao, Yu-Hui; Bai, Feng-Shan

2009-04-01

Precondition plays a critical role in the numerical methods for large and sparse linear systems. It is also true for nonlinear algebraic systems. In this paper incomplete Gröbner basis (IGB) is proposed as a preconditioner of homotopy methods for polynomial systems of equations, which transforms a deficient system into a system with the same finite solutions, but smaller degree. The reduced system can thus be solved faster. Numerical results show the efficiency of the preconditioner.
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f , (e.g. Verlet algorithm) is available to propagate the system from time ti (trajectory positions and velocities xi = (ri; vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0 : : : tM can be transformed into a root finding problem, F(X) = [xi - f (x(i-1)]i=1;M = 0, for the trajectory variables. The root finding problem is solved using amore » variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem are tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl+4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.« less
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

PubMed

Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0[ellipsis (horizontal)]tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution/timeparallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

PubMed

Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

2018-02-08

The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. While joint total variation (JTV) regularized model has been demonstrated as a powerful tool in increasing sampling speed for pMRI, however, the major bottleneck is the inefficiency of the optimization method. While all present state-of-the-art optimizations for the JTV model could only reach a sublinear convergence rate, in this paper, we squeeze the performance by proposing a linear-convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.
Sub-domain decomposition methods and computational controls for multibody dynamical systems. [of spacecraft structures

NASA Technical Reports Server (NTRS)

Menon, R. G.; Kurdila, A. J.

1992-01-01

This paper presents a concurrent methodology to simulate the dynamics of flexible multibody systems with a large number of degrees of freedom. A general class of open-loop structures is treated and a redundant coordinate formulation is adopted. A range space method is used in which the constraint forces are calculated using a preconditioned conjugate gradient method. By using a preconditioner motivated by the regular ordering of the directed graph of the structures, it is shown that the method is order N in the total number of coordinates of the system. The overall formulation has the advantage that it permits fine parallelization and does not rely on system topology to induce concurrency. It can be efficiently implemented on the present generation of parallel computers with a large number of processors. Validation of the method is presented via numerical simulations of space structures incorporating large number of flexible degrees of freedom.
A universal preconditioner for simulating condensed phase materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Packwood, David; Ortner, Christoph, E-mail: c.ortner@warwick.ac.uk; Kermode, James, E-mail: j.r.kermode@warwick.ac.uk

2016-04-28

We introduce a universal sparse preconditioner that accelerates geometry optimisation and saddle point search tasks that are common in the atomic scale simulation of materials. Our preconditioner is based on the neighbourhood structure and we demonstrate the gain in computational efficiency in a wide range of materials that include metals, insulators, and molecular solids. The simple structure of the preconditioner means that the gains can be realised in practice not only when using expensive electronic structure models but also for fast empirical potentials. Even for relatively small systems of a few hundred atoms, we observe speedups of a factor ofmore » two or more, and the gain grows with system size. An open source Python implementation within the Atomic Simulation Environment is available, offering interfaces to a wide range of atomistic codes.« less
Performance of an MPI-only semiconductor device simulator on a quad socket/quad core InfiniBand platform.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shadid, John Nicolas; Lin, Paul Tinphone

2009-01-01

This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a capacity cluster with 272 compute nodes based on a homogeneous multicore node architecture utilizing 16 cores. The inter-node communication backbone for this Tri-Lab Linux Capacity Cluster (TLCC) machine is comprised of an InfiniBand interconnect. The nonuniform memory access (NUMA) nodes consist of 2.2 GHz quad socket/quad core AMD Opteron processors. The performance results for this study are obtained with a FE semiconductor device simulation code (Charon) that is based on a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling andmore » multicore performance results are presented for large-scale problems of 100+ million unknowns on up to 4096 cores. A parallel scaling comparison is also presented with the Cray XT3/4 Red Storm capability platform. The results indicate that an MPI-only programming model for utilizing the multicore nodes is reasonably efficient on all 16 cores per compute node. However, the results also indicated that the multilevel preconditioner, which is critical for large-scale capability type simulations, scales better on the Red Storm machine than the TLCC machine.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Collins, Benjamin S.

The Futility package contains the following: 1) Definition of the size of integers and real numbers; 2) A generic Unit test harness; 3) Definitions for some basic extensions to the Fortran language: arbitrary length strings, a parameter list construct, exception handlers, command line processor, timers; 4) Geometry definitions: point, line, plane, box, cylinder, polyhedron; 5) File wrapper functions: standard Fortran input/output files, Fortran binary files, HDF5 files; 6) Parallel wrapper functions: MPI, and Open MP abstraction layers, partitioning algorithms; 7) Math utilities: BLAS, Matrix and Vector definitions, Linear Solver methods and wrappers for other TPLs (PETSC, MKL, etc), preconditioner classes;more » 8) Misc: random number generator, water saturation properties, sorting algorithms.« less
Graph Embedding Techniques for Bounding Condition Numbers of Incomplete Factor Preconditioning

NASA Technical Reports Server (NTRS)

Guattery, Stephen

1997-01-01

We extend graph embedding techniques for bounding the spectral condition number of preconditioned systems involving symmetric, irreducibly diagonally dominant M-matrices to systems where the preconditioner is not diagonally dominant. In particular, this allows us to bound the spectral condition number when the preconditioner is based on an incomplete factorization. We provide a review of previous techniques, describe our extension, and give examples both of a bound for a model problem, and of ways in which our techniques give intuitive way of looking at incomplete factor preconditioners.
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code.

NASA Astrophysics Data System (ADS)

Chacon, L.; Knoll, D. A.

2004-11-01

We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended primitive-variable MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field, non-dissipative, and stable in the absence of physical dissipation.(L. Chacón , phComput. Phys. Comm.) submitted (2004) PIXIE3D employs fully-implicit Newton-Krylov methods for the time advance. Currently, first and second-order implicit schemes are available, although higher-order temporal implicit schemes can be effortlessly implemented within the Newton-Krylov framework. A successful, scalable, MG physics-based preconditioning strategy, similar in concept to previous 2D MHD efforts,(L. Chacón et al., phJ. Comput. Phys). 178 (1), 15- 36 (2002); phJ. Comput. Phys., 188 (2), 573-592 (2003) has been developed. We are currently in the process of parallelizing the code using the PETSc library, and a Newton-Krylov-Schwarz approach for the parallel treatment of the preconditioner. In this poster, we will report on both the serial and parallel performance of PIXIE3D, focusing primarily on scalability and CPU speedup vs. an explicit approach.
Algorithmically scalable block preconditioner for fully implicit shallow-water equations in CAM-SE

DOE PAGES

Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.

2014-10-19

Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core within themore » Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.« less
Solving Graph Laplacian Systems Through Recursive Bisections and Two-Grid Preconditioning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ponce, Colin; Vassilevski, Panayot S.

2016-02-18

We present a parallelizable direct method for computing the solution to graph Laplacian-based linear systems derived from graphs that can be hierarchically bipartitioned with small edge cuts. For a graph of size n with constant-size edge cuts, our method decomposes a graph Laplacian in time O(n log n), and then uses that decomposition to perform a linear solve in time O(n log n). We then use the developed technique to design a preconditioner for graph Laplacians that do not have this property. Finally, we augment this preconditioner with a two-grid method that accounts for much of the preconditioner's weaknesses. Wemore » present an analysis of this method, as well as a general theorem for the condition number of a general class of two-grid support graph-based preconditioners. Numerical experiments illustrate the performance of the studied methods.« less
A note on the preconditioner Pm=(I+Sm)

NASA Astrophysics Data System (ADS)

Kohno, Toshiyuki; Niki, Hiroshi

2009-03-01

Kotakemori et al. [H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I+Smax), Journal of Computational and Applied Mathematics 145 (2002) 373-378] have reported that the convergence rate of the iterative method with a preconditioner Pm=(I+Sm) was superior to one of the modified Gauss-Seidel method under the condition. These authors derived a theorem comparing the Gauss-Seidel method with the proposed method. However, through application of a counter example, Wen Li [Wen Li, A note on the preconditioned GaussSeidel (GS) method for linear systems, Journal of Computational and Applied Mathematics 182 (2005) 81-91] pointed out that there exists a special matrix that does not satisfy this comparison theorem. In this note, we analyze the reason why such a to counter example may be produced, and propose a preconditioner to overcome this problem.
Local multiplicative Schwarz algorithms for convection-diffusion equations

NASA Technical Reports Server (NTRS)

Cai, Xiao-Chuan; Sarkis, Marcus

1995-01-01

We develop a new class of overlapping Schwarz type algorithms for solving scalar convection-diffusion equations discretized by finite element or finite difference methods. The preconditioners consist of two components, namely, the usual two-level additive Schwarz preconditioner and the sum of some quadratic terms constructed by using products of ordered neighboring subdomain preconditioners. The ordering of the subdomain preconditioners is determined by considering the direction of the flow. We prove that the algorithms are optimal in the sense that the convergence rates are independent of the mesh size, as well as the number of subdomains. We show by numerical examples that the new algorithms are less sensitive to the direction of the flow than either the classical multiplicative Schwarz algorithms, and converge faster than the additive Schwarz algorithms. Thus, the new algorithms are more suitable for fluid flow applications than the classical additive or multiplicative Schwarz algorithms.
Efficiency and flexibility using implicit methods within atmosphere dycores

NASA Astrophysics Data System (ADS)

Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.

2016-12-01

A suite of explicit and implicit methods are evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about the same efficiency as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is speed up for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional based on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core, of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability. We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

NASA Astrophysics Data System (ADS)

Xing, F.; Masson, R.; Lopez, S.

2017-09-01

This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.

Parallel iterative solution for h and p approximations of the shallow water equations

USGS Publications Warehouse

Barragy, E.J.; Walters, R.A.

1998-01-01

A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
A parallel multi-domain solution methodology applied to nonlinear thermal transport problems in nuclear fuel pins

DOE PAGES

Philip, Bobby; Berrill, Mark A.; Allu, Srikanth; ...

2015-01-26

We describe an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian Free Newton Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors are described. The details of verification and validation experiments, and parallel performance analysis in weak and strong scaling studies demonstratingmore » the achieved efficiency of the algorithm are presented. Moreover, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.« less
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t{sub i} (trajectory positions and velocities x{sub i} = (r{sub i}, v{sub i})) to time t{sub i+1} (x{sub i+1}) by x{sub i+1} = f{sub i}(x{sub i}), the dynamics problem spanning an interval from t{sub 0}…t{sub M} can be transformed into a root finding problem, F(X) = [x{sub i} − f(x{sub (i−1})]{sub i} {sub =1,M} = 0, for themore » trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H{sub 2}O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time) ) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H{sub 2}O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.« less
Cosmic Microwave Background Mapmaking with a Messenger Field

NASA Astrophysics Data System (ADS)

Huffenberger, Kevin M.; Næss, Sigurd K.

2018-01-01

We apply a messenger field method to solve the linear minimum-variance mapmaking equation in the context of Cosmic Microwave Background (CMB) observations. In simulations, the method produces sky maps that converge significantly faster than those from a conjugate gradient descent algorithm with a diagonal preconditioner, even though the computational cost per iteration is similar. The messenger method recovers large scales in the map better than conjugate gradient descent, and yields a lower overall χ2. In the single, pencil beam approximation, each iteration of the messenger mapmaking procedure produces an unbiased map, and the iterations become more optimal as they proceed. A variant of the method can handle differential data or perform deconvolution mapmaking. The messenger method requires no preconditioner, but a high-quality solution needs a cooling parameter to control the convergence. We study the convergence properties of this new method and discuss how the algorithm is feasible for the large data sets of current and future CMB experiments.
Un-collided-flux preconditioning for the first order transport equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rigley, M.; Koebbe, J.; Drumm, C.

2013-07-01

Two codes were tested for the first order neutron transport equation using finite element methods. The un-collided-flux solution is used as a preconditioner for each of these methods. These codes include a least squares finite element method and a discontinuous finite element method. The performance of each code is shown on problems in one and two dimensions. The un-collided-flux preconditioner shows good speedup on each of the given methods. The un-collided-flux preconditioner has been used on the second-order equation, and here we extend those results to the first order equation. (authors)
An iterative solver for the 3D Helmholtz equation

NASA Astrophysics Data System (ADS)

Belonosov, Mikhail; Dmitriev, Maxim; Kostin, Victor; Neklyudov, Dmitry; Tcheverda, Vladimir

2017-09-01

We develop a frequency-domain iterative solver for numerical simulation of acoustic waves in 3D heterogeneous media. It is based on the application of a unique preconditioner to the Helmholtz equation that ensures convergence for Krylov subspace iteration methods. Effective inversion of the preconditioner involves the Fast Fourier Transform (FFT) and numerical solution of a series of boundary value problems for ordinary differential equations. Matrix-by-vector multiplication for iterative inversion of the preconditioned matrix involves inversion of the preconditioner and pointwise multiplication of grid functions. Our solver has been verified by benchmarking against exact solutions and a time-domain solver.
Multigrid solvers and multigrid preconditioners for the solution of variational data assimilation problems

NASA Astrophysics Data System (ADS)

Debreu, Laurent; Neveu, Emilie; Simon, Ehouarn; Le Dimet, Francois Xavier; Vidard, Arthur

2014-05-01

In order to lower the computational cost of the variational data assimilation process, we investigate the use of multigrid methods to solve the associated optimal control system. On a linear advection equation, we study the impact of the regularization term on the optimal control and the impact of discretization errors on the efficiency of the coarse grid correction step. We show that even if the optimal control problem leads to the solution of an elliptic system, numerical errors introduced by the discretization can alter the success of the multigrid methods. The view of the multigrid iteration as a preconditioner for a Krylov optimization method leads to a more robust algorithm. A scale dependent weighting of the multigrid preconditioner and the usual background error covariance matrix based preconditioner is proposed and brings significant improvements. [1] Laurent Debreu, Emilie Neveu, Ehouarn Simon, François-Xavier Le Dimet and Arthur Vidard, 2014: Multigrid solvers and multigrid preconditioners for the solution of variational data assimilation problems, submitted to QJRMS, http://hal.inria.fr/hal-00874643 [2] Emilie Neveu, Laurent Debreu and François-Xavier Le Dimet, 2011: Multigrid methods and data assimilation - Convergence study and first experiments on non-linear equations, ARIMA, 14, 63-80, http://intranet.inria.fr/international/arima/014/014005.html
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

NASA Astrophysics Data System (ADS)

Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia

2013-08-01

Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.
Repeated Red-Black ordering

NASA Astrophysics Data System (ADS)

Ciarlet, P.

1994-09-01

Hereafter, we describe and analyze, from both a theoretical and a numerical point of view, an iterative method for efficiently solving symmetric elliptic problems with possibly discontinuous coefficients. In the following, we use the Preconditioned Conjugate Gradient method to solve the symmetric positive definite linear systems which arise from the finite element discretization of the problems. We focus our interest on sparse and efficient preconditioners. In order to define the preconditioners, we perform two steps: first we reorder the unknowns and then we carry out a (modified) incomplete factorization of the original matrix. We study numerically and theoretically two preconditioners, the second preconditioner corresponding to the one investigated by Brand and Heinemann [2]. We prove convergence results about the Poisson equation with either Dirichlet or periodic boundary conditions. For a meshsizeh, Brand proved that the condition number of the preconditioned system is bounded byO(h-1/2) for Dirichlet boundary conditions. By slightly modifying the preconditioning process, we prove that the condition number is bounded byO(h-1/3).
Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.

1990-01-01

The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well known methods of GMRES (Generalized Minimum Residual technique) and Chebyschev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, but has wide applications in parallel computing and unstructured mesh computations.
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems: Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Domain decomposition methods in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Gropp, William D.; Keyes, David E.

1991-01-01

The divide-and-conquer paradigm of iterative domain decomposition, or substructuring, has become a practical tool in computational fluid dynamic applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. These features are illustrated on the classic model problem of flow over a backstep using Newton's method as the nonlinear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately, and they can be combined synergistically. Sample performance results are included from an Intel iPSC/860 hypercube implementation.
Efficient solvers for coupled models in respiratory mechanics.

PubMed

Verdugo, Francesc; Roth, Christian J; Yoshihara, Lena; Wall, Wolfgang A

2017-02-01

We present efficient preconditioners for one of the most physiologically relevant pulmonary models currently available. Our underlying motivation is to enable the efficient simulation of such a lung model on high-performance computing platforms in order to assess mechanical ventilation strategies and contributing to design more protective patient-specific ventilation treatments. The system of linear equations to be solved using the proposed preconditioners is essentially the monolithic system arising in fluid-structure interaction (FSI) extended by additional algebraic constraints. The introduction of these constraints leads to a saddle point problem that cannot be solved with usual FSI preconditioners available in the literature. The key ingredient in this work is to use the idea of the semi-implicit method for pressure-linked equations (SIMPLE) for getting rid of the saddle point structure, resulting in a standard FSI problem that can be treated with available techniques. The numerical examples show that the resulting preconditioners approach the optimal performance of multigrid methods, even though the lung model is a complex multiphysics problem. Moreover, the preconditioners are robust enough to deal with physiologically relevant simulations involving complex real-world patient-specific lung geometries. The same approach is applicable to other challenging biomedical applications where coupling between flow and tissue deformations is modeled with additional algebraic constraints. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Aztec user`s guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements

NASA Astrophysics Data System (ADS)

Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.

2000-11-01

In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

NASA Astrophysics Data System (ADS)

Lee, D.; Gopal, S.; Mohapatra, P.

2012-07-01

We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
A scalable block-preconditioning strategy for divergence-conforming B-spline discretizations of the Stokes problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.

The recently introduced divergence-conforming B-spline discretizations allow the construction of smooth discrete velocity–pressure pairs for viscous incompressible flows that are at the same time inf–sup stable and pointwise divergence-free. When applied to the discretized Stokes problem, these spaces generate a symmetric and indefinite saddle-point linear system. The iterative method of choice to solve such system is the Generalized Minimum Residual Method. This method lacks robustness, and one remedy is to use preconditioners. For linear systems of saddle-point type, a large family of preconditioners can be obtained by using a block factorization of the system. In this paper, we show howmore » the nesting of “black-box” solvers and preconditioners can be put together in a block triangular strategy to build a scalable block preconditioner for the Stokes system discretized by divergence-conforming B-splines. Lastly, besides the known cavity flow problem, we used for benchmark flows defined on complex geometries: an eccentric annulus and hollow torus of an eccentric annular cross-section.« less
Assessment of Preconditioner for a USM3D Hierarchical Adaptive Nonlinear Method (HANIM) (Invited)

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2016-01-01

Enhancements to the previously reported mixed-element USM3D Hierarchical Adaptive Nonlinear Iteration Method (HANIM) framework have been made to further improve robustness, efficiency, and accuracy of computational fluid dynamic simulations. The key enhancements include a multi-color line-implicit preconditioner, a discretely consistent symmetry boundary condition, and a line-mapping method for the turbulence source term discretization. The USM3D iterative convergence for the turbulent flows is assessed on four configurations. The configurations include a two-dimensional (2D) bump-in-channel, the 2D NACA 0012 airfoil, a three-dimensional (3D) bump-in-channel, and a 3D hemisphere cylinder. The Reynolds Averaged Navier Stokes (RANS) solutions have been obtained using a Spalart-Allmaras turbulence model and families of uniformly refined nested grids. Two types of HANIM solutions using line- and point-implicit preconditioners have been computed. Additional solutions using the point-implicit preconditioner alone (PA) method that broadly represents the baseline solver technology have also been computed. The line-implicit HANIM shows superior iterative convergence in most cases with progressively increasing benefits on finer grids.
A scalable block-preconditioning strategy for divergence-conforming B-spline discretizations of the Stokes problem

DOE PAGES

Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.; ...

2016-10-19

The recently introduced divergence-conforming B-spline discretizations allow the construction of smooth discrete velocity–pressure pairs for viscous incompressible flows that are at the same time inf–sup stable and pointwise divergence-free. When applied to the discretized Stokes problem, these spaces generate a symmetric and indefinite saddle-point linear system. The iterative method of choice to solve such system is the Generalized Minimum Residual Method. This method lacks robustness, and one remedy is to use preconditioners. For linear systems of saddle-point type, a large family of preconditioners can be obtained by using a block factorization of the system. In this paper, we show howmore » the nesting of “black-box” solvers and preconditioners can be put together in a block triangular strategy to build a scalable block preconditioner for the Stokes system discretized by divergence-conforming B-splines. Lastly, besides the known cavity flow problem, we used for benchmark flows defined on complex geometries: an eccentric annulus and hollow torus of an eccentric annular cross-section.« less

Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD

NASA Technical Reports Server (NTRS)

Gropp, W. D.; Keyes, D. E.; McInnes, L. C.; Tidriri, M. D.

1998-01-01

Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, "routine" parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (Psi-NKS) algorithmic framework is presented as an answer. We show that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, Psi-NKS can simultaneously deliver: globalized, asymptotically rapid convergence through adaptive pseudo- transient continuation and Newton's method-, reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and high per- processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner. Two discouraging features of Psi-NKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of Psi-NKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.
Keeping it Together: Advanced algorithms and software for magma dynamics (and other coupled multi-physics problems)

NASA Astrophysics Data System (ADS)

Spiegelman, M.; Wilson, C. R.

2011-12-01

A quantitative theory of magma production and transport is essential for understanding the dynamics of magmatic plate boundaries, intra-plate volcanism and the geochemical evolution of the planet. It also provides one of the most challenging computational problems in solid Earth science, as it requires consistent coupling of fluid and solid mechanics together with the thermodynamics of melting and reactive flows. Considerable work on these problems over the past two decades shows that small changes in assumptions of coupling (e.g. the relationship between melt fraction and solid rheology), can have profound changes on the behavior of these systems which in turn affects critical computational choices such as discretizations, solvers and preconditioners. To make progress in exploring and understanding this physically rich system requires a computational framework that allows more flexible, high-level description of multi-physics problems as well as increased flexibility in composing efficient algorithms for solution of the full non-linear coupled system. Fortunately, recent advances in available computational libraries and algorithms provide a platform for implementing such a framework. We present results from a new model building system that leverages functionality from both the FEniCS project (www.fenicsproject.org) and PETSc libraries (www.mcs.anl.gov/petsc) along with a model independent options system and gui, Spud (amcg.ese.ic.ac.uk/Spud). Key features from FEniCS include fully unstructured FEM with a wide range of elements; a high-level language (ufl) and code generation compiler (FFC) for describing the weak forms of residuals and automatic differentiation for calculation of exact and approximate jacobians. The overall strategy is to monitor/calculate residuals and jacobians for the entire non-linear system of equations within a global non-linear solve based on PETSc's SNES routines. PETSc already provides a wide range of solvers and preconditioners, from parallel sparse direct to algebraic multigrid, that can be chosen at runtime. In particular, we make extensive use of PETSc's FieldSplit block preconditioners that allow us to use optimal solvers for subproblems (such as Stokes, or advection/diffusion of temperature) as preconditioners for the full problem. Thus these routines let us reuse effective solving recipes/splittings from previous experience while monitoring the convergence of the global problem. These techniques often yield quadratic (Newton like) convergence for the work of standard Picard schemes. We will illustrate this new framework with examples from the Magma Dynamic Demonstration suite (MADDs) of well understood magma dynamics benchmark problems including stokes flow in ridge geometries, magmatic solitary waves and shear-driven melt bands. While development of this system has been driven by magma dynamics, this framework is much more general and can be used for a wide range of PDE based multi-physics models.
Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
Preconditioner and convergence study for the Quantum Computer Aided Design (QCAD) nonlinear poisson problem posed on the Ottawa Flat 270 design geometry.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kalashnikova, Irina

2012-05-01

A numerical study aimed to evaluate different preconditioners within the Trilinos Ifpack and ML packages for the Quantum Computer Aided Design (QCAD) non-linear Poisson problem implemented within the Albany code base and posed on the Ottawa Flat 270 design geometry is performed. This study led to some new development of Albany that allows the user to select an ML preconditioner with Zoltan repartitioning based on nodal coordinates, which is summarized. Convergence of the numerical solutions computed within the QCAD computational suite with successive mesh refinement is examined in two metrics, the mean value of the solution (an L{sup 1} norm)more » and the field integral of the solution (L{sup 2} norm).« less
Tensor-product preconditioners for higher-order space-time discontinuous Galerkin methods

NASA Astrophysics Data System (ADS)

Diosady, Laslo T.; Murman, Scott M.

2017-02-01

A space-time discontinuous-Galerkin spectral-element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equations. An efficient solution technique based on a matrix-free Newton-Krylov method is developed in order to overcome the stiffness associated with high solution order. The use of tensor-product basis functions is key to maintaining efficiency at high-order. Efficient preconditioning methods are presented which can take advantage of the tensor-product formulation. A diagonalized Alternating-Direction-Implicit (ADI) scheme is extended to the space-time discontinuous Galerkin discretization. A new preconditioner for the compressible Euler/Navier-Stokes equations based on the fast-diagonalization method is also presented. Numerical results demonstrate the effectiveness of these preconditioners for the direct numerical simulation of subsonic turbulent flows.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.

Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core within themore » Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.« less
Tensor-Product Preconditioners for Higher-Order Space-Time Discontinuous Galerkin Methods

NASA Technical Reports Server (NTRS)

Diosady, Laslo T.; Murman, Scott M.

2016-01-01

space-time discontinuous-Galerkin spectral-element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equat ions. An efficient solution technique based on a matrix-free Newton-Krylov method is developed in order to overcome the stiffness associated with high solution order. The use of tensor-product basis functions is key to maintaining efficiency at high order. Efficient preconditioning methods are presented which can take advantage of the tensor-product formulation. A diagonalized Alternating-Direction-Implicit (ADI) scheme is extended to the space-time discontinuous Galerkin discretization. A new preconditioner for the compressible Euler/Navier-Stokes equations based on the fast-diagonalization method is also presented. Numerical results demonstrate the effectiveness of these preconditioners for the direct numerical simulation of subsonic turbulent flows.
Weighted graph based ordering techniques for preconditioned conjugate gradient methods

NASA Technical Reports Server (NTRS)

Clift, Simon S.; Tang, Wei-Pai

1994-01-01

We describe the basis of a matrix ordering heuristic for improving the incomplete factorization used in preconditioned conjugate gradient techniques applied to anisotropic PDE's. Several new matrix ordering techniques, derived from well-known algorithms in combinatorial graph theory, which attempt to implement this heuristic, are described. These ordering techniques are tested against a number of matrices arising from linear anisotropic PDE's, and compared with other matrix ordering techniques. A variation of RCM is shown to generally improve the quality of incomplete factorization preconditioners.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.

1999-10-14

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-staticmore » solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.« less
A hierarchical preconditioner for the electric field integral equation on unstructured meshes based on primal and dual Haar bases

NASA Astrophysics Data System (ADS)

Adrian, S. B.; Andriulli, F. P.; Eibert, T. F.

2017-02-01

A new hierarchical basis preconditioner for the electric field integral equation (EFIE) operator is introduced. In contrast to existing hierarchical basis preconditioners, it works on arbitrary meshes and preconditions both the vector and the scalar potential within the EFIE operator. This is obtained by taking into account that the vector and the scalar potential discretized with loop-star basis functions are related to the hypersingular and the single layer operator (i.e., the well known integral operators from acoustics). For the single layer operator discretized with piecewise constant functions, a hierarchical preconditioner can easily be constructed. Thus the strategy we propose in this work for preconditioning the EFIE is the transformation of the scalar and the vector potential into operators equivalent to the single layer operator and to its inverse. More specifically, when the scalar potential is discretized with star functions as source and testing functions, the resulting matrix is a single layer operator discretized with piecewise constant functions and multiplied left and right with two additional graph Laplacian matrices. By inverting these graph Laplacian matrices, the discretized single layer operator is obtained, which can be preconditioned with the hierarchical basis. Dually, when the vector potential is discretized with loop functions, the resulting matrix can be interpreted as a hypersingular operator discretized with piecewise linear functions. By leveraging on a scalar Calderón identity, we can interpret this operator as spectrally equivalent to the inverse single layer operator. Then we use a linear-in-complexity, closed-form inverse of the dual hierarchical basis to precondition the hypersingular operator. The numerical results show the effectiveness of the proposed preconditioner and the practical impact of theoretical developments in real case scenarios.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

NASA Astrophysics Data System (ADS)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

2018-01-01

We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
Three dimensional modelling of earthquake rupture cycles on frictional faults

NASA Astrophysics Data System (ADS)

Simpson, Guy; May, Dave

2017-04-01

We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedding within a three dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin Finite Element Method on an unstructured pentahedral mesh, which thus permits local spatial refinement in the vicinity of the fault. Friction sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a picard-type non-linear solver with a fully implicit, first order accurate time integrator that utilises an adaptive time step that efficiently evolves the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners and linear algebra from the Portable Extensible Toolkit for Scientific Computing (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and surrounding solid as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.
High-Order Methods for Incompressible Fluid Flow

NASA Astrophysics Data System (ADS)

Deville, M. O.; Fischer, P. F.; Mund, E. H.

2002-08-01

High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular sttention given to enforcement of imcompressibility. Advanced discretizations. implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.
S{sub 2}SA preconditioning for the S{sub n} equations with strictly non negative spatial discretization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bruss, D. E.; Morel, J. E.; Ragusa, J. C.

2013-07-01

Preconditioners based upon sweeps and diffusion-synthetic acceleration have been constructed and applied to the zeroth and first spatial moments of the 1-D S{sub n} transport equation using a strictly non negative nonlinear spatial closure. Linear and nonlinear preconditioners have been analyzed. The effectiveness of various combinations of these preconditioners are compared. In one dimension, nonlinear sweep preconditioning is shown to be superior to linear sweep preconditioning, and DSA preconditioning using nonlinear sweeps in conjunction with a linear diffusion equation is found to be essentially equivalent to nonlinear sweeps in conjunction with a nonlinear diffusion equation. The ability to use amore » linear diffusion equation has important implications for preconditioning the S{sub n} equations with a strictly non negative spatial discretization in multiple dimensions. (authors)« less
Improved preconditioned conjugate gradient algorithm and application in 3D inversion of gravity-gradiometry data

NASA Astrophysics Data System (ADS)

Wang, Tai-Han; Huang, Da-Nian; Ma, Guo-Qing; Meng, Zhao-Hai; Li, Ye

2017-06-01

With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. In the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit process unit (GPU) and preconditioning methods are very important in the data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Choleksy decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implement based on GPU is proposed. The improved method is then applied in the inversion of noisecontaminated synthetic data to prove its adaptability in the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times that of a serial program using a 2.0 GHz Central Processing Unit (CPU). Real airborne gravity-gradiometry data from Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method in fast inversion of 3D FTG data.
Udzawa-type iterative method with parareal preconditioner for a parabolic optimal control problem

NASA Astrophysics Data System (ADS)

Lapin, A.; Romanenko, A.

2016-11-01

The article deals with the optimal control problem with the parabolic equation as state problem. There are point-wise constraints on the state and control functions. The objective functional involves the observation given in the domain at each moment. The conditions for convergence Udzawa's type iterative method are given. The parareal method to inverse preconditioner is given. The results of calculations are presented.
Tensor-product preconditioners for a space-time discontinuous Galerkin method

NASA Astrophysics Data System (ADS)

Diosady, Laslo T.; Murman, Scott M.

2014-10-01

A space-time discontinuous Galerkin spectral element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equations. An efficient solution technique based on a matrix-free Newton-Krylov method is presented. A diagonalized alternating direction implicit preconditioner is extended to a space-time formulation using entropy variables. The effectiveness of this technique is demonstrated for the direct numerical simulation of turbulent flow in a channel.

Preconditioned Alternating Projection Algorithms for Maximum a Posteriori ECT Reconstruction

PubMed Central

Krol, Andrzej; Li, Si; Shen, Lixin; Xu, Yuesheng

2012-01-01

We propose a preconditioned alternating projection algorithm (PAPA) for solving the maximum a posteriori (MAP) emission computed tomography (ECT) reconstruction problem. Specifically, we formulate the reconstruction problem as a constrained convex optimization problem with the total variation (TV) regularization. We then characterize the solution of the constrained convex optimization problem and show that it satisfies a system of fixed-point equations defined in terms of two proximity operators raised from the convex functions that define the TV-norm and the constrain involved in the problem. The characterization (of the solution) via the proximity operators that define two projection operators naturally leads to an alternating projection algorithm for finding the solution. For efficient numerical computation, we introduce to the alternating projection algorithm a preconditioning matrix (the EM-preconditioner) for the dense system matrix involved in the optimization problem. We prove theoretically convergence of the preconditioned alternating projection algorithm. In numerical experiments, performance of our algorithms, with an appropriately selected preconditioning matrix, is compared with performance of the conventional MAP expectation-maximization (MAP-EM) algorithm with TV regularizer (EM-TV) and that of the recently developed nested EM-TV algorithm for ECT reconstruction. Based on the numerical experiments performed in this work, we observe that the alternating projection algorithm with the EM-preconditioner outperforms significantly the EM-TV in all aspects including the convergence speed, the noise in the reconstructed images and the image quality. It also outperforms the nested EM-TV in the convergence speed while providing comparable image quality. PMID:23271835
Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction

NASA Astrophysics Data System (ADS)

Krol, Andrzej; Li, Si; Shen, Lixin; Xu, Yuesheng

2012-11-01

We propose a preconditioned alternating projection algorithm (PAPA) for solving the maximum a posteriori (MAP) emission computed tomography (ECT) reconstruction problem. Specifically, we formulate the reconstruction problem as a constrained convex optimization problem with the total variation (TV) regularization. We then characterize the solution of the constrained convex optimization problem and show that it satisfies a system of fixed-point equations defined in terms of two proximity operators raised from the convex functions that define the TV-norm and the constraint involved in the problem. The characterization (of the solution) via the proximity operators that define two projection operators naturally leads to an alternating projection algorithm for finding the solution. For efficient numerical computation, we introduce to the alternating projection algorithm a preconditioning matrix (the EM-preconditioner) for the dense system matrix involved in the optimization problem. We prove theoretically convergence of the PAPA. In numerical experiments, performance of our algorithms, with an appropriately selected preconditioning matrix, is compared with performance of the conventional MAP expectation-maximization (MAP-EM) algorithm with TV regularizer (EM-TV) and that of the recently developed nested EM-TV algorithm for ECT reconstruction. Based on the numerical experiments performed in this work, we observe that the alternating projection algorithm with the EM-preconditioner outperforms significantly the EM-TV in all aspects including the convergence speed, the noise in the reconstructed images and the image quality. It also outperforms the nested EM-TV in the convergence speed while providing comparable image quality.
Incompressible SPH (ISPH) with fast Poisson solver on a GPU

NASA Astrophysics Data System (ADS)

Chow, Alex D.; Rogers, Benedict D.; Lind, Steven J.; Stansby, Peter K.

2018-05-01

This paper presents a fast incompressible SPH (ISPH) solver implemented to run entirely on a graphics processing unit (GPU) capable of simulating several millions of particles in three dimensions on a single GPU. The ISPH algorithm is implemented by converting the highly optimised open-source weakly-compressible SPH (WCSPH) code DualSPHysics to run ISPH on the GPU, combining it with the open-source linear algebra library ViennaCL for fast solutions of the pressure Poisson equation (PPE). Several challenges are addressed with this research: constructing a PPE matrix every timestep on the GPU for moving particles, optimising the limited GPU memory, and exploiting fast matrix solvers. The ISPH pressure projection algorithm is implemented as 4 separate stages, each with a particle sweep, including an algorithm for the population of the PPE matrix suitable for the GPU, and mixed precision storage methods. An accurate and robust ISPH boundary condition ideal for parallel processing is also established by adapting an existing WCSPH boundary condition for ISPH. A variety of validation cases are presented: an impulsively started plate, incompressible flow around a moving square in a box, and dambreaks (2-D and 3-D) which demonstrate the accuracy, flexibility, and speed of the methodology. Fragmentation of the free surface is shown to influence the performance of matrix preconditioners and therefore the PPE matrix solution time. The Jacobi preconditioner demonstrates robustness and reliability in the presence of fragmented flows. For a dambreak simulation, GPU speed ups demonstrate up to 10-18 times and 1.1-4.5 times compared to single-threaded and 16-threaded CPU run times respectively.
A preconditioner for the finite element computation of incompressible, nonlinear elastic deformations

NASA Astrophysics Data System (ADS)

Whiteley, J. P.

2017-10-01

Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
Preconditioning strategies for nonlinear conjugate gradient methods, based on quasi-Newton updates

NASA Astrophysics Data System (ADS)

Andrea, Caliciotti; Giovanni, Fasano; Massimo, Roma

2016-10-01

This paper reports two proposals of possible preconditioners for the Nonlinear Conjugate Gradient (NCG) method, in large scale unconstrained optimization. On one hand, the common idea of our preconditioners is inspired to L-BFGS quasi-Newton updates, on the other hand we aim at explicitly approximating in some sense the inverse of the Hessian matrix. Since we deal with large scale optimization problems, we propose matrix-free approaches where the preconditioners are built using symmetric low-rank updating formulae. Our distinctive new contributions rely on using information on the objective function collected as by-product of the NCG, at previous iterations. Broadly speaking, our first approach exploits the secant equation, in order to impose interpolation conditions on the objective function. In the second proposal we adopt and ad hoc modified-secant approach, in order to possibly guarantee some additional theoretical properties.
High performance computation of radiative transfer equation using the finite element method

NASA Astrophysics Data System (ADS)

Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.

2018-05-01

This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for spatio-angular discretization of the monochromatic steady-state radiative transfer equation in an anisotropically scattering media. Two very different methods of parallelization, angular and spatial decomposition methods, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
Effective matrix-free preconditioning for the augmented immersed interface method

NASA Astrophysics Data System (ADS)

Xia, Jianlin; Li, Zhilin; Ye, Xin

2015-12-01

We present effective and efficient matrix-free preconditioning techniques for the augmented immersed interface method (AIIM). AIIM has been developed recently and is shown to be very effective for interface problems and problems on irregular domains. GMRES is often used to solve for the augmented variable(s) associated with a Schur complement A in AIIM that is defined along the interface or the irregular boundary. The efficiency of AIIM relies on how quickly the system for A can be solved. For some applications, there are substantial difficulties involved, such as the slow convergence of GMRES (particularly for free boundary and moving interface problems), and the inconvenience in finding a preconditioner (due to the situation that only the products of A and vectors are available). Here, we propose matrix-free structured preconditioning techniques for AIIM via adaptive randomized sampling, using only the products of A and vectors to construct a hierarchically semiseparable matrix approximation to A. Several improvements over existing schemes are shown so as to enhance the efficiency and also avoid potential instability. The significance of the preconditioners includes: (1) they do not require the entries of A or the multiplication of AT with vectors; (2) constructing the preconditioners needs only O (log ⁡ N) matrix-vector products and O (N) storage, where N is the size of A; (3) applying the preconditioners needs only O (N) flops; (4) they are very flexible and do not require any a priori knowledge of the structure of A. The preconditioners are observed to significantly accelerate the convergence of GMRES, with heuristical justifications of the effectiveness. Comprehensive tests on several important applications are provided, such as Navier-Stokes equations on irregular domains with traction boundary conditions, interface problems in incompressible flows, mixed boundary problems, and free boundary problems. The preconditioning techniques are also useful for several other problems and methods.
Toward an optimal solver for time-spectral fluid-dynamic and aeroelastic solutions on unstructured meshes

NASA Astrophysics Data System (ADS)

Mundis, Nathan L.; Mavriplis, Dimitri J.

2017-09-01

The time-spectral method applied to the Euler and coupled aeroelastic equations theoretically offers significant computational savings for purely periodic problems when compared to standard time-implicit methods. However, attaining superior efficiency with time-spectral methods over traditional time-implicit methods hinges on the ability rapidly to solve the large non-linear system resulting from time-spectral discretizations which become larger and stiffer as more time instances are employed or the period of the flow becomes especially short (i.e. the maximum resolvable wave-number increases). In order to increase the efficiency of these solvers, and to improve robustness, particularly for large numbers of time instances, the Generalized Minimal Residual Method (GMRES) is used to solve the implicit linear system over all coupled time instances. The use of GMRES as the linear solver makes time-spectral methods more robust, allows them to be applied to a far greater subset of time-accurate problems, including those with a broad range of harmonic content, and vastly improves the efficiency of time-spectral methods. In previous work, a wave-number independent preconditioner that mitigates the increased stiffness of the time-spectral method when applied to problems with large resolvable wave numbers has been developed. This preconditioner, however, directly inverts a large matrix whose size increases in proportion to the number of time instances. As a result, the computational time of this method scales as the cube of the number of time instances. In the present work, this preconditioner has been reworked to take advantage of an approximate-factorization approach that effectively decouples the spatial and temporal systems. Once decoupled, the time-spectral matrix can be inverted in frequency space, where it has entries only on the main diagonal and therefore can be inverted quite efficiently. This new GMRES/preconditioner combination is shown to be over an order of magnitude more efficient than the previous wave-number independent preconditioner for problems with large numbers of time instances and/or large reduced frequencies.
Efficient Iterative Methods Applied to the Solution of Transonic Flows

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

1996-02-01

We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE PAGES

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...

2017-09-14

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Computational simulations of supersonic magnetohydrodynamic flow control, power and propulsion systems

NASA Astrophysics Data System (ADS)

Wan, Tian

This work is motivated by the lack of fully coupled computational tool that solves successfully the turbulent chemically reacting Navier-Stokes equation, the electron energy conservation equation and the electric current Poisson equation. In the present work, the abovementioned equations are solved in a fully coupled manner using fully implicit parallel GMRES methods. The system of Navier-Stokes equations are solved using a GMRES method with combined Schwarz and ILU(0) preconditioners. The electron energy equation and the electric current Poisson equation are solved using a GMRES method with combined SOR and Jacobi preconditioners. The fully coupled method has also been implemented successfully in an unstructured solver, US3D, and convergence test results were presented. This new method is shown two to five times faster than the original DPLR method. The Poisson solver is validated with analytic test problems. Then, four problems are selected; two of them are computed to explore the possibility of onboard MHD control and power generation, and the other two are simulation of experiments. First, the possibility of onboard reentry shock control by a magnetic field is explored. As part of a previous project, MHD power generation onboard a re-entry vehicle is also simulated. Then, the MHD acceleration experiments conducted at NASA Ames research center are simulated. Lastly, the MHD power generation experiments known as the HVEPS project are simulated. For code validation, the scramjet experiments at University of Queensland are simulated first. The generator section of the HVEPS test facility is computed then. The main conclusion is that the computational tool is accurate for different types of problems and flow conditions, and its accuracy and efficiency are necessary when the flow complexity increases.
Augmenting the one-shot framework by additional constraints

DOE PAGES

Bosse, Torsten

2016-05-12

The (multistep) one-shot method for design optimization problems has been successfully implemented for various applications. To this end, a slowly convergent primal fixed-point iteration of the state equation is augmented by an adjoint iteration and a corresponding preconditioned design update. In this paper we present a modification of the method that allows for additional equality constraints besides the usual state equation. Finally, a retardation analysis and the local convergence of the method in terms of necessary and sufficient conditions are given, which depend on key characteristics of the underlying problem and the quality of the utilized preconditioner.
Augmenting the one-shot framework by additional constraints

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bosse, Torsten

The (multistep) one-shot method for design optimization problems has been successfully implemented for various applications. To this end, a slowly convergent primal fixed-point iteration of the state equation is augmented by an adjoint iteration and a corresponding preconditioned design update. In this paper we present a modification of the method that allows for additional equality constraints besides the usual state equation. Finally, a retardation analysis and the local convergence of the method in terms of necessary and sufficient conditions are given, which depend on key characteristics of the underlying problem and the quality of the utilized preconditioner.
Layout optimization with algebraic multigrid methods

NASA Technical Reports Server (NTRS)

Regler, Hans; Ruede, Ulrich

1993-01-01

Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
A 3D finite-difference BiCG iterative solver with the Fourier-Jacobi preconditioner for the anisotropic EIT/EEG forward problem.

PubMed

Turovets, Sergei; Volkov, Vasily; Zherdetsky, Aleksej; Prakonina, Alena; Malony, Allen D

2014-01-01

The Electrical Impedance Tomography (EIT) and electroencephalography (EEG) forward problems in anisotropic inhomogeneous media like the human head belongs to the class of the three-dimensional boundary value problems for elliptic equations with mixed derivatives. We introduce and explore the performance of several new promising numerical techniques, which seem to be more suitable for solving these problems. The proposed numerical schemes combine the fictitious domain approach together with the finite-difference method and the optimally preconditioned Conjugate Gradient- (CG-) type iterative method for treatment of the discrete model. The numerical scheme includes the standard operations of summation and multiplication of sparse matrices and vector, as well as FFT, making it easy to implement and eligible for the effective parallel implementation. Some typical use cases for the EIT/EEG problems are considered demonstrating high efficiency of the proposed numerical technique.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics

NASA Astrophysics Data System (ADS)

Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.

2003-09-01

Multiconjugate adaptive optics (MCAO) systems with 104-105 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wave-front control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 104 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 103 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
The numerical solution of the Helmholtz equation for wave propagation problems in underwater acoustics

NASA Technical Reports Server (NTRS)

Bayliss, A.; Goldstein, C. I.; Turkel, E.

1984-01-01

The Helmholtz Equation (-delta-K(2)n(2))u=0 with a variable index of refraction, n, and a suitable radiation condition at infinity serves as a model for a wide variety of wave propagation problems. A numerical algorithm was developed and a computer code implemented that can effectively solve this equation in the intermediate frequency range. The equation is discretized using the finite element method, thus allowing for the modeling of complicated geometrices (including interfaces) and complicated boundary conditions. A global radiation boundary condition is imposed at the far field boundary that is exact for an arbitrary number of propagating modes. The resulting large, non-selfadjoint system of linear equations with indefinite symmetric part is solved using the preconditioned conjugate gradient method applied to the normal equations. A new preconditioner is developed based on the multigrid method. This preconditioner is vectorizable and is extremely effective over a wide range of frequencies provided the number of grid levels is reduced for large frequencies. A heuristic argument is given that indicates the superior convergence properties of this preconditioner.
An extended GS method for dense linear systems

NASA Astrophysics Data System (ADS)

Niki, Hiroshi; Kohno, Toshiyuki; Abe, Kuniyoshi

2009-09-01

Davey and Rosindale [K. Davey, I. Rosindale, An iterative solution scheme for systems of boundary element equations, Internat. J. Numer. Methods Engrg. 37 (1994) 1399-1411] derived the GSOR method, which uses an upper triangular matrix [Omega] in order to solve dense linear systems. By applying functional analysis, the authors presented an expression for the optimum [Omega]. Moreover, Davey and Bounds [K. Davey, S. Bounds, A generalized SOR method for dense linear systems of boundary element equations, SIAM J. Comput. 19 (1998) 953-967] also introduced further interesting results. In this note, we employ a matrix analysis approach to investigate these schemes, and derive theorems that compare these schemes with existing preconditioners for dense linear systems. We show that the convergence rate of the Gauss-Seidel method with preconditioner PG is superior to that of the GSOR method. Moreover, we define some splittings associated with the iterative schemes. Some numerical examples are reported to confirm the theoretical analysis. We show that the EGS method with preconditioner produces an extremely small spectral radius in comparison with the other schemes considered.
A Polynomial-Based Nonlinear Least Squares Optimized Preconditioner for Continuous and Discontinuous Element-Based Discretizations of the Euler Equations

DTIC Science & Technology

2014-01-01

system (here using left- preconditioning ) (KÃ)x = Kb̃, (3.1) where K is a low-order polynomial in Ã given by K = s(Ã) = m∑ i=0 kiÃ i, (3.2) and has a... system with a complex spectrum, region E in the complex plane must be some convex form (e.g., an ellipse or polygon) that approximately encloses the...preconditioners with p = 2 and p = 20 on the spectrum of the preconditioned system matrices KÃ and KH̃ for both CG Schur-complement form and DG form cases

Higher Order, Hybrid BEM/FEM Methods Applied to Antenna Modeling

NASA Technical Reports Server (NTRS)

Fink, P. W.; Wilton, D. R.; Dobbins, J. A.

2002-01-01

In this presentation, the authors address topics relevant to higher order modeling using hybrid BEM/FEM formulations. The first of these is the limitation on convergence rates imposed by geometric modeling errors in the analysis of scattering by a dielectric sphere. The second topic is the application of an Incomplete LU Threshold (ILUT) preconditioner to solve the linear system resulting from the BEM/FEM formulation. The final tOpic is the application of the higher order BEM/FEM formulation to antenna modeling problems. The authors have previously presented work on the benefits of higher order modeling. To achieve these benefits, special attention is required in the integration of singular and near-singular terms arising in the surface integral equation. Several methods for handling these terms have been presented. It is also well known that achieving he high rates of convergence afforded by higher order bases may als'o require the employment of higher order geometry models. A number of publications have described the use of quadratic elements to model curved surfaces. The authors have shown in an EFIE formulation, applied to scattering by a PEC .sphere, that quadratic order elements may be insufficient to prevent the domination of modeling errors. In fact, on a PEC sphere with radius r = 0.58 Lambda(sub 0), a quartic order geometry representation was required to obtain a convergence benefi.t from quadratic bases when compared to the convergence rate achieved with linear bases. Initial trials indicate that, for a dielectric sphere of the same radius, - requirements on the geometry model are not as severe as for the PEC sphere. The authors will present convergence results for higher order bases as a function of the geometry model order in the hybrid BEM/FEM formulation applied to dielectric spheres. It is well known that the system matrix resulting from the hybrid BEM/FEM formulation is ill -conditioned. For many real applications, a good preconditioner is required to obtain usable convergence from an iterative solver. The authors have examined the use of an Incomplete LU Threshold (ILUT) preconditioner . to solver linear systems stemming from higher order BEM/FEM formulations in 2D scattering problems. Although the resulting preconditioner provided aD excellent approximation to the system inverse, its size in terms of non-zero entries represented only a modest improvement when compared with the fill-in associated with a sparse direct solver. Furthermore, the fill-in of the preconditioner could not be substantially reduced without the occurrence of instabilities. In addition to the results for these 2D problems, the authors will present iterative solution data from the application of the ILUT preconditioner to 3D problems.
Development of the PARVMEC Code for Rapid Analysis of 3D MHD Equilibrium

NASA Astrophysics Data System (ADS)

Seal, Sudip; Hirshman, Steven; Cianciosa, Mark; Wingen, Andreas; Unterberg, Ezekiel; Wilcox, Robert; ORNL Collaboration

2015-11-01

The VMEC three-dimensional (3D) MHD equilibrium has been used extensively for designing stellarator experiments and analyzing experimental data in such strongly 3D systems. Recent applications of VMEC include 2D systems such as tokamaks (in particular, the D3D experiment), where application of very small (delB/B ~ 10-3) 3D resonant magnetic field perturbations render the underlying assumption of axisymmetry invalid. In order to facilitate the rapid analysis of such equilibria (for example, for reconstruction purposes), we have undertaken the task of parallelizing the VMEC code (PARVMEC) to produce a scalable and temporally rapidly convergent equilibrium code for use on parallel distributed memory platforms. The parallelization task naturally splits into three distinct parts 1) radial surfaces in the fixed-boundary part of the calculation; 2) two 2D angular meshes needed to compute the Green's function integrals over the plasma boundary for the free-boundary part of the code; and 3) block tridiagonal matrix needed to compute the full (3D) pre-conditioner near the final equilibrium state. Preliminary results show that scalability is achieved for tasks 1 and 3, with task 2 still nearing completion. The impact of this work on the rapid reconstruction of D3D plasmas using PARVMEC in the V3FIT code will be discussed. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

NASA Astrophysics Data System (ADS)

Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

2016-12-01

Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics.

PubMed

Gilles, Luc; Ellerbroek, Brent L; Vogel, Curtis R

2003-09-10

Multiconjugate adaptive optics (MCAO) systems with 10(4)-10(5) degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10(4) actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10(-2) Hz, i.e., 4-5 orders of magnitude lower than the typical 10(3) Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
Iterative methods for mixed finite element equations

NASA Technical Reports Server (NTRS)

Nakazawa, S.; Nagtegaal, J. C.; Zienkiewicz, O. C.

1985-01-01

Iterative strategies for the solution of indefinite system of equations arising from the mixed finite element method are investigated in this paper with application to linear and nonlinear problems in solid and structural mechanics. The augmented Hu-Washizu form is derived, which is then utilized to construct a family of iterative algorithms using the displacement method as the preconditioner. Two types of iterative algorithms are implemented. Those are: constant metric iterations which does not involve the update of preconditioner; variable metric iterations, in which the inverse of the preconditioning matrix is updated. A series of numerical experiments is conducted to evaluate the numerical performance with application to linear and nonlinear model problems.
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods for unstructured mesh computations are developed and tested. The approximate system which arises from the Newton-linearization of the nonlinear evolution operator is solved by using the preconditioned generalized minimum residual technique. These different preconditioners are investigated: the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over-relaxation (SSOR). The preconditioners have been optimized to have good vectorization properties. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also investigated. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
Analysis of Preconditioning and Relaxation Operators for the Discontinuous Galerkin Method Applied to Diffusion

NASA Technical Reports Server (NTRS)

Atkins, H. L.; Shu, Chi-Wang

2001-01-01

The explicit stability constraint of the discontinuous Galerkin method applied to the diffusion operator decreases dramatically as the order of the method is increased. Block Jacobi and block Gauss-Seidel preconditioner operators are examined for their effectiveness at accelerating convergence. A Fourier analysis for methods of order 2 through 6 reveals that both preconditioner operators bound the eigenvalues of the discrete spatial operator. Additionally, in one dimension, the eigenvalues are grouped into two or three regions that are invariant with order of the method. Local relaxation methods are constructed that rapidly damp high frequencies for arbitrarily large time step.
A new preconditioner update strategy for the solution of sequences of linear systems in structural mechanics: application to saddle point problems in elasticity

NASA Astrophysics Data System (ADS)

Mercier, Sylvain; Gratton, Serge; Tardieu, Nicolas; Vasseur, Xavier

2017-12-01

Many applications in structural mechanics require the numerical solution of sequences of linear systems typically issued from a finite element discretization of the governing equations on fine meshes. The method of Lagrange multipliers is often used to take into account mechanical constraints. The resulting matrices then exhibit a saddle point structure and the iterative solution of such preconditioned linear systems is considered as challenging. A popular strategy is then to combine preconditioning and deflation to yield an efficient method. We propose an alternative that is applicable to the general case and not only to matrices with a saddle point structure. In this approach, we consider to update an existing algebraic or application-based preconditioner, using specific available information exploiting the knowledge of an approximate invariant subspace or of matrix-vector products. The resulting preconditioner has the form of a limited memory quasi-Newton matrix and requires a small number of linearly independent vectors. Numerical experiments performed on three large-scale applications in elasticity highlight the relevance of the new approach. We show that the proposed method outperforms the deflation method when considering sequences of linear systems with varying matrices.
Computational Challenges of 3D Radiative Transfer in Atmospheric Models

NASA Astrophysics Data System (ADS)

Jakub, Fabian; Bernhard, Mayer

2017-04-01

The computation of radiative heating and cooling rates is one of the most expensive components in todays atmospheric models. The high computational cost stems not only from the laborious integration over a wide range of the electromagnetic spectrum but also from the fact that solving the integro-differential radiative transfer equation for monochromatic light is already rather involved. This lead to the advent of numerous approximations and parameterizations to reduce the cost of the solver. One of the most prominent one is the so called independent pixel approximations (IPA) where horizontal energy transfer is neglected whatsoever and radiation may only propagate in the vertical direction (1D). Recent studies implicate that the IPA introduces significant errors in high resolution simulations and affects the evolution and development of convective systems. However, using fully 3D solvers such as for example MonteCarlo methods is not even on state of the art supercomputers feasible. The parallelization of atmospheric models is often realized by a horizontal domain decomposition, and hence, horizontal transfer of energy necessitates communication. E.g. a cloud's shadow at a low zenith angle will cast a long shadow and potentially needs to communication through a multitude of processors. Especially light in the solar spectral range may travel long distances through the atmosphere. Concerning highly parallel simulations, it is vital that 3D radiative transfer solvers put a special emphasis on parallel scalability. We will present an introduction to intricacies computing 3D radiative heating and cooling rates as well as report on the parallel performance of the TenStream solver. The TenStream is a 3D radiative transfer solver using the PETSc framework to iteratively solve a set of partial differential equation. We investigate two matrix preconditioners, (a) geometric algebraic multigrid preconditioning(MG+GAMG) and (b) block Jacobi incomplete LU (ILU) factorization. The TenStream solver is tested for up to 4096 cores and shows a parallel scaling efficiency of 80-90% on various supercomputers.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
Coupled Modeling of Hydrodynamics and Sound in Coastal Ocean for Renewable Ocean Energy Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, Wen; Jung, Ki Won; Yang, Zhaoqing

An underwater sound model was developed to simulate sound propagation from marine and hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite difference methods were developed to solve the 3D Helmholtz equation for sound propagation in the coastal environment. A 3D sparse matrix solver with complex coefficients was formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method was applied to solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model was then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generatedmore » by human activities, such as construction of OSW turbines or tidal stream turbine operations, in a range-dependent setting. As a proof of concept, initial validation of the solver is presented for two coastal wedge problems. This sound model can be useful for evaluating impacts on marine mammals due to deployment of MHK devices and OSW energy platforms.« less
Multilevel acceleration of scattering-source iterations with application to electron transport

DOE PAGES

Drumm, Clif; Fan, Wesley

2017-08-18

Acceleration/preconditioning strategies available in the SCEPTRE radiation transport code are described. A flexible transport synthetic acceleration (TSA) algorithm that uses a low-order discrete-ordinates (S N) or spherical-harmonics (P N) solve to accelerate convergence of a high-order S N source-iteration (SI) solve is described. Convergence of the low-order solves can be further accelerated by applying off-the-shelf incomplete-factorization or algebraic-multigrid methods. Also available is an algorithm that uses a generalized minimum residual (GMRES) iterative method rather than SI for convergence, using a parallel sweep-based solver to build up a Krylov subspace. TSA has been applied as a preconditioner to accelerate the convergencemore » of the GMRES iterations. The methods are applied to several problems involving electron transport and problems with artificial cross sections with large scattering ratios. These methods were compared and evaluated by considering material discontinuities and scattering anisotropy. Observed accelerations obtained are highly problem dependent, but speedup factors around 10 have been observed in typical applications.« less
Large Scale, High Resolution, Mantle Dynamics Modeling

NASA Astrophysics Data System (ADS)

Geenen, T.; Berg, A. V.; Spakman, W.

2007-12-01

To model the geodynamic evolution of plate convergence, subduction and collision and to allow for a connection to various types of observational data, geophysical, geodetical and geological, we developed a 4D (space-time) numerical mantle convection code. The model is based on a spherical 3D Eulerian fem model, with quadratic elements, on top of which we constructed a 3D Lagrangian particle in cell(PIC) method. We use the PIC method to transport material properties and to incorporate a viscoelastic rheology. Since capturing small scale processes associated with localization phenomena require a high resolution, we spend a considerable effort on implementing solvers suitable to solve for models with over 100 million degrees of freedom. We implemented Additive Schwartz type ILU based methods in combination with a Krylov solver, GMRES. However we found that for problems with over 500 thousend degrees of freedom the convergence of the solver degraded severely. This observation is known from the literature [Saad, 2003] and results from the local character of the ILU preconditioner resulting in a poor approximation of the inverse of A for large A. The size of A for which ILU is no longer usable depends on the condition of A and on the amount of fill in allowed for the ILU preconditioner. We found that for our problems with over 5×105 degrees of freedom convergence became to slow to solve the system within an acceptable amount of walltime, one minute, even when allowing for considerable amount of fill in. We also implemented MUMPS and found good scaling results for problems up to 107 degrees of freedom for up to 32 CPU¡¯s. For problems with over 100 million degrees of freedom we implemented Algebraic Multigrid type methods (AMG) from the ML library [Sala, 2006]. Since multigrid methods are most effective for single parameter problems, we rebuild our model to use the SIMPLE method in the Stokes solver [Patankar, 1980]. We present scaling results from these solvers for 3D spherical models. We also applied the above mentioned method to a high resolution (~ 1 km) 2D mantle convection model with temperature, pressure and phase dependent rheology including several phase transitions. We focus on a model of a subducting lithospheric slab which is subject to strong folding at the bottom of the mantle's D" region which includes the postperovskite phase boundary. For a detailed description of this model we refer to poster [Mantel convection models of the D" region, U17] [Saad, 2003] Saad, Y. (2003). Iterative methods for sparse linear systems. [Sala, 2006] Sala. M (2006) An Object-Oriented Framework for the Development of Scalable Parallel Multilevel Preconditioners. ACM Transactions on Mathematical Software, 32 (3), 2006 [Patankar, 1980] Patankar, S. V.(1980) Numerical Heat Transfer and Fluid Flow, Hemisphere, Washington.
Solving groundwater flow problems by conjugate-gradient methods and the strongly implicit procedure

USGS Publications Warehouse

Hill, Mary C.

1990-01-01

The performance of the preconditioned conjugate-gradient method with three preconditioners is compared with the strongly implicit procedure (SIP) using a scalar computer. The preconditioners considered are the incomplete Cholesky (ICCG) and the modified incomplete Cholesky (MICCG), which require the same computer storage as SIP as programmed for a problem with a symmetric matrix, and a polynomial preconditioner (POLCG), which requires less computer storage than SIP. Although POLCG is usually used on vector computers, it is included here because of its small storage requirements. In this paper, published comparisons of the solvers are evaluated, all four solvers are compared for the first time, and new test cases are presented to provide a more complete basis by which the solvers can be judged for typical groundwater flow problems. Based on nine test cases, the following conclusions are reached: (1) SIP is actually as efficient as ICCG for some of the published, linear, two-dimensional test cases that were reportedly solved much more efficiently by ICCG; (2) SIP is more efficient than other published comparisons would indicate when common convergence criteria are used; and (3) for problems that are three-dimensional, nonlinear, or both, and for which common convergence criteria are used, SIP is often more efficient than ICCG, and is sometimes more efficient than MICCG.
A SEMI-LAGRANGIAN TWO-LEVEL PRECONDITIONED NEWTON-KRYLOV SOLVER FOR CONSTRAINED DIFFEOMORPHIC IMAGE REGISTRATION.

PubMed

Mang, Andreas; Biros, George

2017-01-01

We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20 × speedup for a two dimensional, real world multi-subject medical image registration problem.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

DOE PAGES

Grayver, Alexander V.; Kolev, Tzanio V.

2015-11-01

Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grayver, Alexander V.; Kolev, Tzanio V.

Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Implicit solvers for unstructured meshes

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, Dimitri J.

1991-01-01

Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
Wide-angle full-vector beam propagation method based on an alternating direction implicit preconditioner

NASA Astrophysics Data System (ADS)

Chui, Siu Lit; Lu, Ya Yan

2004-03-01

Wide-angle full-vector beam propagation methods (BPMs) for three-dimensional wave-guiding structures can be derived on the basis of rational approximants of a square root operator or its exponential (i.e., the one-way propagator). While the less accurate BPM based on the slowly varying envelope approximation can be efficiently solved by the alternating direction implicit (ADI) method, the wide-angle variants involve linear systems that are more difficult to handle. We present an efficient solver for these linear systems that is based on a Krylov subspace method with an ADI preconditioner. The resulting wide-angle full-vector BPM is used to simulate the propagation of wave fields in a Y branch and a taper.
Wide-angle full-vector beam propagation method based on an alternating direction implicit preconditioner.

PubMed

Chui, Siu Lit; Lu, Ya Yan

2004-03-01

Wide-angle full-vector beam propagation methods (BPMs) for three-dimensional wave-guiding structures can be derived on the basis of rational approximants of a square root operator or its exponential (i.e., the one-way propagator). While the less accurate BPM based on the slowly varying envelope approximation can be efficiently solved by the alternating direction implicit (ADI) method, the wide-angle variants involve linear systems that are more difficult to handle. We present an efficient solver for these linear systems that is based on a Krylov subspace method with an ADI preconditioner. The resulting wide-angle full-vector BPM is used to simulate the propagation of wave fields in a Y branch and a taper.

Distributed Memory Parallel Computing with SEAWAT

NASA Astrophysics Data System (ADS)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.
Preconditioned conjugate gradient methods for the Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

1994-01-01

A preconditioned Krylov subspace method (GMRES) is used to solve the linear systems of equations formed at each time-integration step of the unsteady, two-dimensional, compressible Navier-Stokes equations of fluid flow. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux-split formulation. Several preconditioning techniques are investigated to enhance the efficiency and convergence rate of the implicit solver based on the GMRES algorithm. The superiority of the new solver is established by comparisons with a conventional implicit solver, namely line Gauss-Seidel relaxation (LGSR). Computational test results for low-speed (incompressible flow over a backward-facing step at Mach 0.1), transonic flow (trailing edge flow in a transonic turbine cascade), and hypersonic flow (shock-on-shock interactions on a cylindrical leading edge at Mach 6.0) are presented. For the Mach 0.1 case, overall speedup factors of up to 17 (in terms of time-steps) and 15 (in terms of CPU time on a CRAY-YMP/8) are found in favor of the preconditioned GMRES solver, when compared with the LGSR solver. The corresponding speedup factors for the transonic flow case are 17 and 23, respectively. The hypersonic flow case shows slightly lower speedup factors of 9 and 13, respectively. The study of preconditioners conducted in this research reveals that a new LUSGS-type preconditioner is much more efficient than a conventional incomplete LU-type preconditioner.
Teko: A block preconditioning capability with concrete example applications in Navier--Stokes and MHD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cyr, Eric C.; Shadid, John N.; Tuminaro, Raymond S.

This study describes the design of Teko, an object-oriented C++ library for implementing advanced block preconditioners. Mathematical design criteria that elucidate the needs of block preconditioning libraries and techniques are explained and shown to motivate the structure of Teko. For instance, a principal design choice was for Teko to strongly reflect the mathematical statement of the preconditioners to reduce development burden and permit focus on the numerics. Additional mechanisms are explained that provide a pathway to developing an optimized production capable block preconditioning capability with Teko. Finally, Teko is demonstrated on fluid flow and magnetohydrodynamics applications. In addition to highlightingmore » the features of the Teko library, these new results illustrate the effectiveness of recent preconditioning developments applied to advanced discretization approaches.« less
The preconditioned Gauss-Seidel method faster than the SOR method

NASA Astrophysics Data System (ADS)

Niki, Hiroshi; Kohno, Toshiyuki; Morimoto, Munenori

2008-09-01

In recent years, a number of preconditioners have been applied to linear systems [A.D. Gunawardena, S.K. Jain, L. Snyder, Modified iterative methods for consistent linear systems, Linear Algebra Appl. 154-156 (1991) 123-143; T. Kohno, H. Kotakemori, H. Niki, M. Usui, Improving modified Gauss-Seidel method for Z-matrices, Linear Algebra Appl. 267 (1997) 113-123; H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I+Smax), J. Comput. Appl. Math. 145 (2002) 373-378; H. Kotakemori, H. Niki, N. Okamoto, Accelerated iteration method for Z-matrices, J. Comput. Appl. Math. 75 (1996) 87-97; M. Usui, H. Niki, T.Kohno, Adaptive Gauss-Seidel method for linear systems, Internat. J. Comput. Math. 51(1994)119-125 [10
Teko: A block preconditioning capability with concrete example applications in Navier--Stokes and MHD

DOE PAGES

Cyr, Eric C.; Shadid, John N.; Tuminaro, Raymond S.

2016-10-27

This study describes the design of Teko, an object-oriented C++ library for implementing advanced block preconditioners. Mathematical design criteria that elucidate the needs of block preconditioning libraries and techniques are explained and shown to motivate the structure of Teko. For instance, a principal design choice was for Teko to strongly reflect the mathematical statement of the preconditioners to reduce development burden and permit focus on the numerics. Additional mechanisms are explained that provide a pathway to developing an optimized production capable block preconditioning capability with Teko. Finally, Teko is demonstrated on fluid flow and magnetohydrodynamics applications. In addition to highlightingmore » the features of the Teko library, these new results illustrate the effectiveness of recent preconditioning developments applied to advanced discretization approaches.« less
Scalable algorithms for three-field mixed finite element coupled poromechanics

NASA Astrophysics Data System (ADS)

Castelletto, Nicola; White, Joshua A.; Ferronato, Massimiliano

2016-12-01

We introduce a class of block preconditioners for accelerating the iterative solution of coupled poromechanics equations based on a three-field formulation. The use of a displacement/velocity/pressure mixed finite-element method combined with a first order backward difference formula for the approximation of time derivatives produces a sequence of linear systems with a 3 × 3 unsymmetric and indefinite block matrix. The preconditioners are obtained by approximating the two-level Schur complement with the aid of physically-based arguments that can be also generalized in a purely algebraic approach. A theoretical and experimental analysis is presented that provides evidence of the robustness, efficiency and scalability of the proposed algorithm. The performance is also assessed for a real-world challenging consolidation experiment of a shallow formation.
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE PAGES

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

2015-12-01

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chacon, Luis; Stanier, Adam John

Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media

NASA Astrophysics Data System (ADS)

Yang, Haijian; Sun, Shuyu; Yang, Chao

2017-03-01

Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
Solving coupled groundwater flow systems using a Jacobian Free Newton Krylov method

NASA Astrophysics Data System (ADS)

Mehl, S.

2012-12-01

Jacobian Free Newton Kyrlov (JFNK) methods can have several advantages for simulating coupled groundwater flow processes versus conventional methods. Conventional methods are defined here as those based on an iterative coupling (rather than a direct coupling) and/or that use Picard iteration rather than Newton iteration. In an iterative coupling, the systems are solved separately, coupling information is updated and exchanged between the systems, and the systems are re-solved, etc., until convergence is achieved. Trusted simulators, such as Modflow, are based on these conventional methods of coupling and work well in many cases. An advantage of the JFNK method is that it only requires calculation of the residual vector of the system of equations and thus can make use of existing simulators regardless of how the equations are formulated. This opens the possibility of coupling different process models via augmentation of a residual vector by each separate process, which often requires substantially fewer changes to the existing source code than if the processes were directly coupled. However, appropriate perturbation sizes need to be determined for accurate approximations of the Frechet derivative, which is not always straightforward. Furthermore, preconditioning is necessary for reasonable convergence of the linear solution required at each Kyrlov iteration. Existing preconditioners can be used and applied separately to each process which maximizes use of existing code and robust preconditioners. In this work, iteratively coupled parent-child local grid refinement models of groundwater flow and groundwater flow models with nonlinear exchanges to streams are used to demonstrate the utility of the JFNK approach for Modflow models. Use of incomplete Cholesky preconditioners with various levels of fill are examined on a suite of nonlinear and linear models to analyze the effect of the preconditioner. Comparisons of convergence and computer simulation time are made using conventional iteratively coupled methods and those based on Picard iteration to those formulated with JFNK to gain insights on the types of nonlinearities and system features that make one approach advantageous. Results indicate that nonlinearities associated with stream/aquifer exchanges are more problematic than those resulting from unconfined flow.
AztecOO user guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heroux, Michael Allen

2004-07-01

The Trilinos{trademark} Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. AztecOO{trademark} is a package within Trilinos that enables the use of the Aztec solver library [19] with Epetra{trademark} [13] objects. AztecOO provides access to Aztec preconditioners and solvers by implementing the Aztec 'matrix-free' interface using Epetra. While Aztec is written in C and procedure-oriented, AztecOO is written in C++ and is object-oriented. In addition to providing access to Aztec capabilities, AztecOO also provides some signficant new functionality. In particular it provides an extensible status testing capability that allows expression of sophisticatedmore » stopping criteria as is needed in production use of iterative solvers. AztecOO also provides mechanisms for using Ifpack [2], ML [20] and AztecOO itself as preconditioners.« less
On adaptive weighted polynomial preconditioning for Hermitian positive definite matrices

NASA Technical Reports Server (NTRS)

Fischer, Bernd; Freund, Roland W.

1992-01-01

The conjugate gradient algorithm for solving Hermitian positive definite linear systems is usually combined with preconditioning in order to speed up convergence. In recent years, there has been a revival of polynomial preconditioning, motivated by the attractive features of the method on modern architectures. Standard techniques for choosing the preconditioning polynomial are based only on bounds for the extreme eigenvalues. Here a different approach is proposed, which aims at adapting the preconditioner to the eigenvalue distribution of the coefficient matrix. The technique is based on the observation that good estimates for the eigenvalue distribution can be derived after only a few steps of the Lanczos process. This information is then used to construct a weight function for a suitable Chebyshev approximation problem. The solution of this problem yields the polynomial preconditioner. In particular, we investigate the use of Bernstein-Szego weights.
Applicability of Kerker preconditioning scheme to the self-consistent density functional theory calculations of inhomogeneous systems

NASA Astrophysics Data System (ADS)

Zhou, Yuzhi; Wang, Han; Liu, Yu; Gao, Xingyu; Song, Haifeng

2018-03-01

The Kerker preconditioner, based on the dielectric function of homogeneous electron gas, is designed to accelerate the self-consistent field (SCF) iteration in the density functional theory calculations. However, a question still remains regarding its applicability to the inhomogeneous systems. We develop a modified Kerker preconditioning scheme which captures the long-range screening behavior of inhomogeneous systems and thus improves the SCF convergence. The effectiveness and efficiency is shown by the tests on long-z slabs of metals, insulators, and metal-insulator contacts. For situations without a priori knowledge of the system, we design the a posteriori indicator to monitor if the preconditioner has suppressed charge sloshing during the iterations. Based on the a posteriori indicator, we demonstrate two schemes of the self-adaptive configuration for the SCF iteration.
A Note on Substructuring Preconditioning for Nonconforming Finite Element Approximations of Second Order Elliptic Problems

NASA Technical Reports Server (NTRS)

Maliassov, Serguei

1996-01-01

In this paper an algebraic substructuring preconditioner is considered for nonconforming finite element approximations of second order elliptic problems in 3D domains with a piecewise constant diffusion coefficient. Using a substructuring idea and a block Gauss elimination, part of the unknowns is eliminated and the Schur complement obtained is preconditioned by a spectrally equivalent very sparse matrix. In the case of quasiuniform tetrahedral mesh an appropriate algebraic multigrid solver can be used to solve the problem with this matrix. Explicit estimates of condition numbers and implementation algorithms are established for the constructed preconditioner. It is shown that the condition number of the preconditioned matrix does not depend on either the mesh step size or the jump of the coefficient. Finally, numerical experiments are presented to illustrate the theory being developed.
Optimal preconditioning of lattice Boltzmann methods

NASA Astrophysics Data System (ADS)

Izquierdo, Salvador; Fueyo, Norberto

2009-09-01

A preconditioning technique to accelerate the simulation of steady-state problems using the single-relaxation-time (SRT) lattice Boltzmann (LB) method was first proposed by Guo et al. [Z. Guo, T. Zhao, Y. Shi, Preconditioned lattice-Boltzmann method for steady flows, Phys. Rev. E 70 (2004) 066706-1]. The key idea in this preconditioner is to modify the equilibrium distribution function in such a way that, by means of a Chapman-Enskog expansion, a time-derivative preconditioner of the Navier-Stokes (NS) equations is obtained. In the present contribution, the optimal values for the free parameter γ of this preconditioner are searched both numerically and theoretically; the later with the aid of linear-stability analysis and with the condition number of the system of NS equations. The influence of the collision operator, single- versus multiple-relaxation-times (MRT), is also studied. Three steady-state laminar test cases are used for validation, namely: the two-dimensional lid-driven cavity, a two-dimensional microchannel and the three-dimensional backward-facing step. Finally, guidelines are suggested for an a priori definition of optimal preconditioning parameters as a function of the Reynolds and Mach numbers. The new optimally preconditioned MRT method derived is shown to improve, simultaneously, the rate of convergence, the stability and the accuracy of the lattice Boltzmann simulations, when compared to the non-preconditioned methods and to the optimally preconditioned SRT one. Additionally, direct time-derivative preconditioning of the LB equation is also studied.
Preconditioned conjugate-gradient methods for low-speed flow calculations

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

1993-01-01

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over-Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss-Seidel Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
Preconditioned Conjugate Gradient methods for low speed flow calculations

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

1993-01-01

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations are integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and the convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the lower-upper (L-U)-successive symmetric over-relaxation iterative scheme is more efficient than a preconditioner based on incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional line Gauss-Seidel relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
Multisource least-squares reverse-time migration with structure-oriented filtering

NASA Astrophysics Data System (ADS)

Fan, Jing-Wen; Li, Zhen-Chun; Zhang, Kai; Zhang, Min; Liu, Xue-Tong

2016-09-01

The technology of simultaneous-source acquisition of seismic data excited by several sources can significantly improve the data collection efficiency. However, direct imaging of simultaneous-source data or blended data may introduce crosstalk noise and affect the imaging quality. To address this problem, we introduce a structure-oriented filtering operator as preconditioner into the multisource least-squares reverse-time migration (LSRTM). The structure-oriented filtering operator is a nonstationary filter along structural trends that suppresses crosstalk noise while maintaining structural information. The proposed method uses the conjugate-gradient method to minimize the mismatch between predicted and observed data, while effectively attenuating the interference noise caused by exciting several sources simultaneously. Numerical experiments using synthetic data suggest that the proposed method can suppress the crosstalk noise and produce highly accurate images.

A Spectral Element Discretisation on Unstructured Triangle / Tetrahedral Meshes for Elastodynamics

NASA Astrophysics Data System (ADS)

May, Dave A.; Gabriel, Alice-A.

2017-04-01

The spectral element method (SEM) defined over quadrilateral and hexahedral element geometries has proven to be a fast, accurate and scalable approach to study wave propagation phenomena. In the context of regional scale seismology and or simulations incorporating finite earthquake sources, the geometric restrictions associated with hexahedral elements can limit the applicability of the classical quad./hex. SEM. Here we describe a continuous Galerkin spectral element discretisation defined over unstructured meshes composed of triangles (2D), or tetrahedra (3D). The method uses a stable, nodal basis constructed from PKD polynomials and thus retains the spectral accuracy and low dispersive properties of the classical SEM, in addition to the geometric versatility provided by unstructured simplex meshes. For the particular basis and quadrature rule we have adopted, the discretisation results in a mass matrix which is not diagonal, thereby mandating linear solvers be utilised. To that end, we have developed efficient solvers and preconditioners which are robust with respect to the polynomial order (p), and possess high arithmetic intensity. Furthermore, we also consider using implicit time integrators, together with a p-multigrid preconditioner to circumvent the CFL condition. Implicit time integrators become particularly relevant when considering solving problems on poor quality meshes, or meshes containing elements with a widely varying range of length scales - both of which frequently arise when meshing non-trivial geometries. We demonstrate the applicability of the new method by examining a number of two- and three-dimensional wave propagation scenarios. These scenarios serve to characterise the accuracy and cost of the new method. Lastly, we will assess the potential benefits of using implicit time integrators for regional scale wave propagation simulations.
A frequency dependent preconditioned wavelet method for atmospheric tomography

NASA Astrophysics Data System (ADS)

Yudytskiy, Mykhaylo; Helin, Tapio; Ramlau, Ronny

2013-12-01

Atmospheric tomography, i.e. the reconstruction of the turbulence in the atmosphere, is a main task for the adaptive optics systems of the next generation telescopes. For extremely large telescopes, such as the European Extremely Large Telescope, this problem becomes overly complex and an efficient algorithm is needed to reduce numerical costs. Recently, a conjugate gradient method based on wavelet parametrization of turbulence layers was introduced [5]. An iterative algorithm can only be numerically efficient when the number of iterations required for a sufficient reconstruction is low. A way to achieve this is to design an efficient preconditioner. In this paper we propose a new frequency-dependent preconditioner for the wavelet method. In the context of a multi conjugate adaptive optics (MCAO) system simulated on the official end-to-end simulation tool OCTOPUS of the European Southern Observatory we demonstrate robustness and speed of the preconditioned algorithm. We show that three iterations are sufficient for a good reconstruction.
A Robust Locally Preconditioned Semi-Coarsening Multigrid Algorithm for the 2-D Navier-Stokes Equations

NASA Technical Reports Server (NTRS)

Cain, Michael D.

1999-01-01

The goal of this thesis is to develop an efficient and robust locally preconditioned semi-coarsening multigrid algorithm for the two-dimensional Navier-Stokes equations. This thesis examines the performance of the multigrid algorithm with local preconditioning for an upwind-discretization of the Navier-Stokes equations. A block Jacobi iterative scheme is used because of its high frequency error mode damping ability. At low Mach numbers, the performance of a flux preconditioner is investigated. The flux preconditioner utilizes a new limiting technique based on local information that was developed by Siu. Full-coarsening and-semi-coarsening are examined as well as the multigrid V-cycle and full multigrid. The numerical tests were performed on a NACA 0012 airfoil at a range of Mach numbers. The tests show that semi-coarsening with flux preconditioning is the most efficient and robust combination of coarsening strategy, and iterative scheme - especially at low Mach numbers.
Novel numerical techniques for magma dynamics

NASA Astrophysics Data System (ADS)

Rhebergen, S.; Katz, R. F.; Wathen, A.; Alisic, L.; Rudge, J. F.; Wells, G.

2013-12-01

We discuss the development of finite element techniques and solvers for magma dynamics computations. These are implemented within the FEniCS framework. This approach allows for user-friendly, expressive, high-level code development, but also provides access to powerful, scalable numerical solvers and a large family of finite element discretisations. With the recent addition of dolfin-adjoint, FeniCS supports automated adjoint and tangent-linear models, enabling the rapid development of Generalised Stability Analysis. The ability to easily scale codes to three dimensions with large meshes, and/or to apply intricate adjoint calculations means that efficiency of the numerical algorithms is vital. We therefore describe our development and analysis of preconditioners designed specifically for finite element discretizations of equations governing magma dynamics. The preconditioners are based on Elman-Silvester-Wathen methods for the Stokes equation, and we extend these to flows with compaction. Our simulations are validated by comparison of results with laboratory experiments on partially molten aggregates.
On conjugate gradient type methods and polynomial preconditioners for a class of complex non-Hermitian matrices

NASA Technical Reports Server (NTRS)

Freund, Roland

1988-01-01

Conjugate gradient type methods are considered for the solution of large linear systems Ax = b with complex coefficient matrices of the type A = T + i(sigma)I where T is Hermitian and sigma, a real scalar. Three different conjugate gradient type approaches with iterates defined by a minimal residual property, a Galerkin type condition, and an Euclidian error minimization, respectively, are investigated. In particular, numerically stable implementations based on the ideas behind Paige and Saunder's SYMMLQ and MINRES for real symmetric matrices are proposed. Error bounds for all three methods are derived. It is shown how the special shift structure of A can be preserved by using polynomial preconditioning. Results on the optimal choice of the polynomial preconditioner are given. Also, some numerical experiments for matrices arising from finite difference approximations to the complex Helmholtz equation are reported.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2016-01-01

Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frinks, Neal T.

2016-01-01

Several improvements to the mixed-elementUSM3Ddiscretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Preconditioner-free Wiener filtering with a dense noise matrix

NASA Astrophysics Data System (ADS)

Huffenberger, Kevin M.

2018-05-01

This work extends the Elsner & Wandelt (2013) iterative method for efficient, preconditioner-free Wiener filtering to cases in which the noise covariance matrix is dense, but can be decomposed into a sum whose parts are sparse in convenient bases. The new method, which uses multiple messenger fields, reproduces Wiener-filter solutions for test problems, and we apply it to a case beyond the reach of the Elsner & Wandelt (2013) method. We compute the Wiener-filter solution for a simulated Cosmic Microwave Background (CMB) map that contains spatially varying, uncorrelated noise, isotropic 1/f noise, and large-scale horizontal stripes (like those caused by atmospheric noise). We discuss simple extensions that can filter contaminated modes or inverse-noise-filter the data. These techniques help to address complications in the noise properties of maps from current and future generations of ground-based Microwave Background experiments, like Advanced ACTPol, Simons Observatory, and CMB-S4.
Low-memory iterative density fitting.

PubMed

Grajciar, Lukáš

2015-07-30

A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. Iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with the linear scaling memory requirements only. Compared with the standard density fitting implementation, up to 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner at a cost of only 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
Eigenvalue Solvers for Modeling Nuclear Reactors on Leadership Class Machines

DOE PAGES

Slaybaugh, R. N.; Ramirez-Zweiger, M.; Pandya, Tara; ...

2018-02-20

In this paper, three complementary methods have been implemented in the code Denovo that accelerate neutral particle transport calculations with methods that use leadership-class computers fully and effectively: a multigroup block (MG) Krylov solver, a Rayleigh quotient iteration (RQI) eigenvalue solver, and a multigrid in energy (MGE) preconditioner. The MG Krylov solver converges more quickly than Gauss Seidel and enables energy decomposition such that Denovo can scale to hundreds of thousands of cores. RQI should converge in fewer iterations than power iteration (PI) for large and challenging problems. RQI creates shifted systems that would not be tractable without the MGmore » Krylov solver. It also creates ill-conditioned matrices. The MGE preconditioner reduces iteration count significantly when used with RQI and takes advantage of the new energy decomposition such that it can scale efficiently. Each individual method has been described before, but this is the first time they have been demonstrated to work together effectively. The combination of solvers enables the RQI eigenvalue solver to work better than the other available solvers for large reactors problems on leadership-class machines. Using these methods together, RQI converged in fewer iterations and in less time than PI for a full pressurized water reactor core. These solvers also performed better than an Arnoldi eigenvalue solver for a reactor benchmark problem when energy decomposition is needed. The MG Krylov, MGE preconditioner, and RQI solver combination also scales well in energy. Finally, this solver set is a strong choice for very large and challenging problems.« less
Eigenvalue Solvers for Modeling Nuclear Reactors on Leadership Class Machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slaybaugh, R. N.; Ramirez-Zweiger, M.; Pandya, Tara

In this paper, three complementary methods have been implemented in the code Denovo that accelerate neutral particle transport calculations with methods that use leadership-class computers fully and effectively: a multigroup block (MG) Krylov solver, a Rayleigh quotient iteration (RQI) eigenvalue solver, and a multigrid in energy (MGE) preconditioner. The MG Krylov solver converges more quickly than Gauss Seidel and enables energy decomposition such that Denovo can scale to hundreds of thousands of cores. RQI should converge in fewer iterations than power iteration (PI) for large and challenging problems. RQI creates shifted systems that would not be tractable without the MGmore » Krylov solver. It also creates ill-conditioned matrices. The MGE preconditioner reduces iteration count significantly when used with RQI and takes advantage of the new energy decomposition such that it can scale efficiently. Each individual method has been described before, but this is the first time they have been demonstrated to work together effectively. The combination of solvers enables the RQI eigenvalue solver to work better than the other available solvers for large reactors problems on leadership-class machines. Using these methods together, RQI converged in fewer iterations and in less time than PI for a full pressurized water reactor core. These solvers also performed better than an Arnoldi eigenvalue solver for a reactor benchmark problem when energy decomposition is needed. The MG Krylov, MGE preconditioner, and RQI solver combination also scales well in energy. Finally, this solver set is a strong choice for very large and challenging problems.« less
Albany/FELIX: A parallel, scalable and robust, finite element, first-order Stokes approximation ice sheet solver built for advanced analysis

DOE PAGES

Tezaur, I. K.; Perego, M.; Salinger, A. G.; ...

2015-04-27

This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, alongmore » with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems is demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.« less
Advancing parabolic operators in thermodynamic MHD models: Explicit super time-stepping versus implicit schemes with Krylov solvers

NASA Astrophysics Data System (ADS)

Caplan, R. M.; Mikić, Z.; Linker, J. A.; Lionello, R.

2017-05-01

We explore the performance and advantages/disadvantages of using unconditionally stable explicit super time-stepping (STS) algorithms versus implicit schemes with Krylov solvers for integrating parabolic operators in thermodynamic MHD models of the solar corona. Specifically, we compare the second-order Runge-Kutta Legendre (RKL2) STS method with the implicit backward Euler scheme computed using the preconditioned conjugate gradient (PCG) solver with both a point-Jacobi and a non-overlapping domain decomposition ILU0 preconditioner. The algorithms are used to integrate anisotropic Spitzer thermal conduction and artificial kinematic viscosity at time-steps much larger than classic explicit stability criteria allow. A key component of the comparison is the use of an established MHD model (MAS) to compute a real-world simulation on a large HPC cluster. Special attention is placed on the parallel scaling of the algorithms. It is shown that, for a specific problem and model, the RKL2 method is comparable or surpasses the implicit method with PCG solvers in performance and scaling, but suffers from some accuracy limitations. These limitations, and the applicability of RKL methods are briefly discussed.
A scalable, fully implicit algorithm for the reduced two-field low-β extended MHD model

DOE PAGES

Chacon, Luis; Stanier, Adam John

2016-12-01

Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
Using Chebyshev polynomials and approximate inverse triangular factorizations for preconditioning the conjugate gradient method

NASA Astrophysics Data System (ADS)

Kaporin, I. E.

2012-02-01

In order to precondition a sparse symmetric positive definite matrix, its approximate inverse is examined, which is represented as the product of two sparse mutually adjoint triangular matrices. In this way, the solution of the corresponding system of linear algebraic equations (SLAE) by applying the preconditioned conjugate gradient method (CGM) is reduced to performing only elementary vector operations and calculating sparse matrix-vector products. A method for constructing the above preconditioner is described and analyzed. The triangular factor has a fixed sparsity pattern and is optimal in the sense that the preconditioned matrix has a minimum K-condition number. The use of polynomial preconditioning based on Chebyshev polynomials makes it possible to considerably reduce the amount of scalar product operations (at the cost of an insignificant increase in the total number of arithmetic operations). The possibility of an efficient massively parallel implementation of the resulting method for solving SLAEs is discussed. For a sequential version of this method, the results obtained by solving 56 test problems from the Florida sparse matrix collection (which are large-scale and ill-conditioned) are presented. These results show that the method is highly reliable and has low computational costs.
An overview of NSPCG: A nonsymmetric preconditioned conjugate gradient package

NASA Astrophysics Data System (ADS)

Oppe, Thomas C.; Joubert, Wayne D.; Kincaid, David R.

1989-05-01

The most recent research-oriented software package developed as part of the ITPACK Project is called "NSPCG" since it contains many nonsymmetric preconditioned conjugate gradient procedures. It is designed to solve large sparse systems of linear algebraic equations by a variety of different iterative methods. One of the main purposes for the development of the package is to provide a common modular structure for research on iterative methods for nonsymmetric matrices. Another purpose for the development of the package is to investigate the suitability of several iterative methods for vector computers. Since the vectorizability of an iterative method depends greatly on the matrix structure, NSPCG allows great flexibility in the operator representation. The coefficient matrix can be passed in one of several different matrix data storage schemes. These sparse data formats allow matrices with a wide range of structures from highly structured ones such as those with all nonzeros along a relatively small number of diagonals to completely unstructured sparse matrices. Alternatively, the package allows the user to call the accelerators directly with user-supplied routines for performing certain matrix operations. In this case, one can use the data format from an application program and not be required to copy the matrix into one of the package formats. This is particularly advantageous when memory space is limited. Some of the basic preconditioners that are available are point methods such as Jacobi, Incomplete LU Decomposition and Symmetric Successive Overrelaxation as well as block and multicolor preconditioners. The user can select from a large collection of accelerators such as Conjugate Gradient (CG), Chebyshev (SI, for semi-iterative), Generalized Minimal Residual (GMRES), Biconjugate Gradient Squared (BCGS) and many others. The package is modular so that almost any accelerator can be used with almost any preconditioner.
Efficient preconditioning of the electronic structure problem in large scale ab initio molecular dynamics simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schiffmann, Florian; VandeVondele, Joost, E-mail: Joost.VandeVondele@mat.ethz.ch

2015-06-28

We present an improved preconditioning scheme for electronic structure calculations based on the orbital transformation method. First, a preconditioner is developed which includes information from the full Kohn-Sham matrix but avoids computationally demanding diagonalisation steps in its construction. This reduces the computational cost of its construction, eliminating a bottleneck in large scale simulations, while maintaining rapid convergence. In addition, a modified form of Hotelling’s iterative inversion is introduced to replace the exact inversion of the preconditioner matrix. This method is highly effective during molecular dynamics (MD), as the solution obtained in earlier MD steps is a suitable initial guess. Filteringmore » small elements during sparse matrix multiplication leads to linear scaling inversion, while retaining robustness, already for relatively small systems. For system sizes ranging from a few hundred to a few thousand atoms, which are typical for many practical applications, the improvements to the algorithm lead to a 2-5 fold speedup per MD step.« less
Analysis of physics-based preconditioning for single-phase subchannel equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hansel, J. E.; Ragusa, J. C.; Allu, S.

2013-07-01

The (single-phase) subchannel approximations are used throughout nuclear engineering to provide an efficient flow simulation because the computational burden is much smaller than for computational fluid dynamics (CFD) simulations, and empirical relations have been developed and validated to provide accurate solutions in appropriate flow regimes. Here, the subchannel equations have been recast in a residual form suitable for a multi-physics framework. The Eigen spectrum of the Jacobian matrix, along with several potential physics-based preconditioning approaches, are evaluated, and the the potential for improved convergence from preconditioning is assessed. The physics-based preconditioner options include several forms of reduced equations that decouplemore » the subchannels by neglecting crossflow, conduction, and/or both turbulent momentum and energy exchange between subchannels. Eigen-scopy analysis shows that preconditioning moves clusters of eigenvalues away from zero and toward one. A test problem is run with and without preconditioning. Without preconditioning, the solution failed to converge using GMRES, but application of any of the preconditioners allowed the solution to converge. (authors)« less
Fully implicit adaptive mesh refinement solver for 2D MHD

NASA Astrophysics Data System (ADS)

Philip, B.; Chacon, L.; Pernice, M.

2008-11-01

Application of implicit adaptive mesh refinement (AMR) to simulate resistive magnetohydrodynamics is described. Solving this challenging multi-scale, multi-physics problem can improve understanding of reconnection in magnetically-confined plasmas. AMR is employed to resolve extremely thin current sheets, essential for an accurate macroscopic description. Implicit time stepping allows us to accurately follow the dynamical time scale of the developing magnetic field, without being restricted by fast Alfven time scales. At each time step, the large-scale system of nonlinear equations is solved by a Jacobian-free Newton-Krylov method together with a physics-based preconditioner. Each block within the preconditioner is solved optimally using the Fast Adaptive Composite grid method, which can be considered as a multiplicative Schwarz method on AMR grids. We will demonstrate the excellent accuracy and efficiency properties of the method with several challenging reduced MHD applications, including tearing, island coalescence, and tilt instabilities. B. Philip, L. Chac'on, M. Pernice, J. Comput. Phys., in press (2008)
Multilevel Preconditioners for Discontinuous Galerkin Approximations of Elliptic Problems with Jump Coefficients

DTIC Science & Technology

2010-12-01

discontinuous coefficients on geometrically nonconforming substructures. Technical Report Serie A 634, Instituto de Matematica Pura e Aplicada, Brazil, 2009...Instituto de Matematica Pura e Aplicada, Brazil, 2010. submitted. [41] M. Dryja, M. V. Sarkis, and O. B. Widlund. Multilevel Schwarz methods for

Improved L-BFGS diagonal preconditioners for a large-scale 4D-Var inversion system: application to CO2 flux constraints and analysis error calculation

NASA Astrophysics Data System (ADS)

Bousserez, Nicolas; Henze, Daven; Bowman, Kevin; Liu, Junjie; Jones, Dylan; Keller, Martin; Deng, Feng

2013-04-01

This work presents improved analysis error estimates for 4D-Var systems. From operational NWP models to top-down constraints on trace gas emissions, many of today's data assimilation and inversion systems in atmospheric science rely on variational approaches. This success is due to both the mathematical clarity of these formulations and the availability of computationally efficient minimization algorithms. However, unlike Kalman Filter-based algorithms, these methods do not provide an estimate of the analysis or forecast error covariance matrices, these error statistics being propagated only implicitly by the system. From both a practical (cycling assimilation) and scientific perspective, assessing uncertainties in the solution of the variational problem is critical. For large-scale linear systems, deterministic or randomization approaches can be considered based on the equivalence between the inverse Hessian of the cost function and the covariance matrix of analysis error. For perfectly quadratic systems, like incremental 4D-Var, Lanczos/Conjugate-Gradient algorithms have proven to be most efficient in generating low-rank approximations of the Hessian matrix during the minimization. For weakly non-linear systems though, the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), a quasi-Newton descent algorithm, is usually considered the best method for the minimization. Suitable for large-scale optimization, this method allows one to generate an approximation to the inverse Hessian using the latest m vector/gradient pairs generated during the minimization, m depending upon the available core memory. At each iteration, an initial low-rank approximation to the inverse Hessian has to be provided, which is called preconditioning. The ability of the preconditioner to retain useful information from previous iterations largely determines the efficiency of the algorithm. Here we assess the performance of different preconditioners to estimate the inverse Hessian of a large-scale 4D-Var system. The impact of using the diagonal preconditioners proposed by Gilbert and Le Maréchal (1989) instead of the usual Oren-Spedicato scalar will be first presented. We will also introduce new hybrid methods that combine randomization estimates of the analysis error variance with L-BFGS diagonal updates to improve the inverse Hessian approximation. Results from these new algorithms will be evaluated against standard large ensemble Monte-Carlo simulations. The methods explored here are applied to the problem of inferring global atmospheric CO2 fluxes using remote sensing observations, and are intended to be integrated with the future NASA Carbon Monitoring System.
The Development of a Finite Volume Method for Modeling Sound in Coastal Ocean Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, Wen; Yang, Zhaoqing; Copping, Andrea E.

: As the rapid growth of marine renewable energy and off-shore wind energy, there have been concerns that the noises generated from construction and operation of the devices may interfere marine animals’ communication. In this research, a underwater sound model is developed to simulate sound prorogation generated by marine-hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite volume and finite difference methods are developed to solve the 3D Helmholtz equation of sound propagation in the coastal environment. For finite volume method, the grid system consists of triangular grids in horizontal plane and sigma-layers in vertical dimension. A 3Dmore » sparse matrix solver with complex coefficients is formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method is applied to efficiently solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model is then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generated by human activities in a range-dependent setting, such as offshore wind energy platform constructions and tidal stream turbines. As a proof of concept, initial validation of the finite difference solver is presented for two coastal wedge problems. Validation of finite volume method will be reported separately.« less
A Study of Multigrid Preconditioners Using Eigensystem Analysis

NASA Technical Reports Server (NTRS)

Roberts, Thomas W.; Swanson, R. C.

2005-01-01

The convergence properties of numerical schemes for partial differential equations are studied by examining the eigensystem of the discrete operator. This method of analysis is very general, and allows the effects of boundary conditions and grid nonuniformities to be examined directly. Algorithms for the Laplace equation and a two equation model hyperbolic system are examined.
Spectral element multigrid. Part 2: Theoretical justification

NASA Technical Reports Server (NTRS)

Maday, Yvon; Munoz, Rafael

1988-01-01

A multigrid algorithm is analyzed which is used for solving iteratively the algebraic system resulting from tha approximation of a second order problem by spectral or spectral element methods. The analysis, performed here in the one dimensional case, justifies the good smoothing properties of the Jacobi preconditioner that was presented in Part 1 of this paper.
The survey of preconditioners used for accelerating the rate of convergence in the Gauss-Seidel method

NASA Astrophysics Data System (ADS)

Niki, Hiroshi; Harada, Kyouji; Morimoto, Munenori; Sakakihara, Michio

2004-03-01

Several preconditioned iterative methods reported in the literature have been used for improving the convergence rate of the Gauss-Seidel method. In this article, on the basis of nonnegative matrix, comparisons between some splittings for such preconditioned matrices are derived. Simple numerical examples are also given.
A Kronecker product splitting preconditioner for two-dimensional space-fractional diffusion equations

NASA Astrophysics Data System (ADS)

Chen, Hao; Lv, Wen; Zhang, Tongtong

2018-05-01

We study preconditioned iterative methods for the linear system arising in the numerical discretization of a two-dimensional space-fractional diffusion equation. Our approach is based on a formulation of the discrete problem that is shown to be the sum of two Kronecker products. By making use of an alternating Kronecker product splitting iteration technique we establish a class of fixed-point iteration methods. Theoretical analysis shows that the new method converges to the unique solution of the linear system. Moreover, the optimal choice of the involved iteration parameters and the corresponding asymptotic convergence rate are computed exactly when the eigenvalues of the system matrix are all real. The basic iteration is accelerated by a Krylov subspace method like GMRES. The corresponding preconditioner is in a form of a Kronecker product structure and requires at each iteration the solution of a set of discrete one-dimensional fractional diffusion equations. We use structure preserving approximations to the discrete one-dimensional fractional diffusion operators in the action of the preconditioning matrix. Numerical examples are presented to illustrate the effectiveness of this approach.
Application of linear multifrequency-grey acceleration to preconditioned Krylov iterations for thermal radiation transport

DOE PAGES

Till, Andrew T.; Warsa, James S.; Morel, Jim E.

2018-06-15

The thermal radiative transfer (TRT) equations comprise a radiation equation coupled to the material internal energy equation. Linearization of these equations produces effective, thermally-redistributed scattering through absorption-reemission. In this paper, we investigate the effectiveness and efficiency of Linear-Multi-Frequency-Grey (LMFG) acceleration that has been reformulated for use as a preconditioner to Krylov iterative solution methods. We introduce two general frameworks, the scalar flux formulation (SFF) and the absorption rate formulation (ARF), and investigate their iterative properties in the absence and presence of true scattering. SFF has a group-dependent state size but may be formulated without inner iterations in the presence ofmore » scattering, while ARF has a group-independent state size but requires inner iterations when scattering is present. We compare and evaluate the computational cost and efficiency of LMFG applied to these two formulations using a direct solver for the preconditioners. Finally, this work is novel because the use of LMFG for the radiation transport equation, in conjunction with Krylov methods, involves special considerations not required for radiation diffusion.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Paul T.; Shadid, John N.; Tsuji, Paul H.

Here, this study explores the performance and scaling of a GMRES Krylov method employed as a smoother for an algebraic multigrid (AMG) preconditioned Newton- Krylov solution approach applied to a fully-implicit variational multiscale (VMS) nite element (FE) resistive magnetohydrodynamics (MHD) formulation. In this context a Newton iteration is used for the nonlinear system and a Krylov (GMRES) method is employed for the linear subsystems. The efficiency of this approach is critically dependent on the scalability and performance of the AMG preconditioner for the linear solutions and the performance of the smoothers play a critical role. Krylov smoothers are considered inmore » an attempt to reduce the time and memory requirements of existing robust smoothers based on additive Schwarz domain decomposition (DD) with incomplete LU factorization solves on each subdomain. Three time dependent resistive MHD test cases are considered to evaluate the method. The results demonstrate that the GMRES smoother can be faster due to a decrease in the preconditioner setup time and a reduction in outer GMRESR solver iterations, and requires less memory (typically 35% less memory for global GMRES smoother) than the DD ILU smoother.« less
A parallel electrostatic Particle-in-Cell method on unstructured tetrahedral grids for large-scale bounded collisionless plasma simulations

NASA Astrophysics Data System (ADS)

Averkin, Sergey N.; Gatsonis, Nikolaos A.

2018-06-01

An unstructured electrostatic Particle-In-Cell (EUPIC) method is developed on arbitrary tetrahedral grids for simulation of plasmas bounded by arbitrary geometries. The electric potential in EUPIC is obtained on cell vertices from a finite volume Multi-Point Flux Approximation of Gauss' law using the indirect dual cell with Dirichlet, Neumann and external circuit boundary conditions. The resulting matrix equation for the nodal potential is solved with a restarted generalized minimal residual method (GMRES) and an ILU(0) preconditioner algorithm, parallelized using a combination of node coloring and level scheduling approaches. The electric field on vertices is obtained using the gradient theorem applied to the indirect dual cell. The algorithms for injection, particle loading, particle motion, and particle tracking are parallelized for unstructured tetrahedral grids. The algorithms for the potential solver, electric field evaluation, loading, scatter-gather algorithms are verified using analytic solutions for test cases subject to Laplace and Poisson equations. Grid sensitivity analysis examines the L2 and L∞ norms of the relative error in potential, field, and charge density as a function of edge-averaged and volume-averaged cell size. Analysis shows second order of convergence for the potential and first order of convergence for the electric field and charge density. Temporal sensitivity analysis is performed and the momentum and energy conservation properties of the particle integrators in EUPIC are examined. The effects of cell size and timestep on heating, slowing-down and the deflection times are quantified. The heating, slowing-down and the deflection times are found to be almost linearly dependent on number of particles per cell. EUPIC simulations of current collection by cylindrical Langmuir probes in collisionless plasmas show good comparison with previous experimentally validated numerical results. These simulations were also used in a parallelization efficiency investigation. Results show that the EUPIC has efficiency of more than 80% when the simulation is performed on a single CPU from a non-uniform memory access node and the efficiency is decreasing as the number of threads further increases. The EUPIC is applied to the simulation of the multi-species plasma flow over a geometrically complex CubeSat in Low Earth Orbit. The EUPIC potential and flowfield distribution around the CubeSat exhibit features that are consistent with previous simulations over simpler geometrical bodies.
StagBL : A Scalable, Portable, High-Performance Discretization and Solver Layer for Geodynamic Simulation

NASA Astrophysics Data System (ADS)

Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.

2017-12-01

StagBL is an open-source parallel solver and discretization library for geodynamic simulation,encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers.It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran.On top of this abstraction, tools are available to define boundary conditions and interact with particle systems.Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes.By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science.By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, Manycore-, and GPU-optimized variants of matrix-free operators and multigrid components.The common layer provides a framework upon which to introduce innovative new tools.StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool.These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction.We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.
Preconditioner Circuit Analysis

DTIC Science & Technology

2011-09-01

S) Matthew J. Nye 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Naval Postgraduate School Monterey, CA 939435–000 8. PERFORMING... ORGANIZATION REPORT NUMBER 9. SPONSORING /MONITORING AGENCY NAME(S) AND ADDRESS(ES) N/A 10. SPONSORING/MONITORING AGENCY REPORT NUMBER 11...of the simulations and the theoretical computations. D. THESIS ORGANIZATION This thesis is organized into four chapters. The theoretical
Desiccant outdoor air preconditioners maximize heat recovery ventilation potentials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meckler, M.

1995-12-31

Microorganisms are well protected indoors by the moisture surrounding them if the relative humidity is above 70%. They can cause many acute diseases, infections, and allergies. Humidity also has an effect on air cleanliness and causes the building structure and its contents to deteriorate. Therefore, controlling humidity is a very important factor to human health and comfort and the structural longevity of a building. To date, a great deal of research has been done, and is continuing, in the use of both solid and liquid desiccants. This paper introduces a desiccant-assisted system that combines dehumidification and mechanical refrigeration by meansmore » of a desiccant preconditioning module that can serve two or more conventional air-conditioning units. It will be demonstrated that the proposed system, also having indirect evaporative cooling within the preconditioning module, can reduce energy consumption and provide significant cost savings, independent humidity and temperature control, and, therefore, improved indoor air quality and enhanced occupant comfort.« less
Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

NASA Astrophysics Data System (ADS)

Zerr, Robert Joseph

2011-12-01

The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
BDDC algorithms with deluxe scaling and adaptive selection of primal constraints for Raviart-Thomas vector fields

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oh, Duk -Soon; Widlund, Olof B.; Zampini, Stefano

Here, a BDDC domain decomposition preconditioner is defined by a coarse component, expressed in terms of primal constraints, a weighted average across the interface between the subdomains, and local components given in terms of solvers of local subdomain problems. BDDC methods for vector field problems discretized with Raviart-Thomas finite elements are introduced. The methods are based on a new type of weighted average an adaptive selection of primal constraints developed to deal with coefficients with high contrast even inside individual subdomains. For problems with very many subdomains, a third level of the preconditioner is introduced. Assuming that the subdomains aremore » all built from elements of a coarse triangulation of the given domain, and that in each subdomain the material parameters are consistent, one obtains a bound for the preconditioned linear system's condition number which is independent of the values and jumps of these parameters across the subdomains' interface. Numerical experiments, using the PETSc library, are also presented which support the theory and show the algorithms' effectiveness even for problems not covered by the theory. Also included are experiments with Brezzi-Douglas-Marini finite-element approximations.« less
A model reduction approach to numerical inversion for a parabolic partial differential equation

NASA Astrophysics Data System (ADS)

Borcea, Liliana; Druskin, Vladimir; Mamonov, Alexander V.; Zaslavsky, Mikhail

2014-12-01

We propose a novel numerical inversion algorithm for the coefficients of parabolic partial differential equations, based on model reduction. The study is motivated by the application of controlled source electromagnetic exploration, where the unknown is the subsurface electrical resistivity and the data are time resolved surface measurements of the magnetic field. The algorithm presented in this paper considers inversion in one and two dimensions. The reduced model is obtained with rational interpolation in the frequency (Laplace) domain and a rational Krylov subspace projection method. It amounts to a nonlinear mapping from the function space of the unknown resistivity to the small dimensional space of the parameters of the reduced model. We use this mapping as a nonlinear preconditioner for the Gauss-Newton iterative solution of the inverse problem. The advantage of the inversion algorithm is twofold. First, the nonlinear preconditioner resolves most of the nonlinearity of the problem. Thus the iterations are less likely to get stuck in local minima and the convergence is fast. Second, the inversion is computationally efficient because it avoids repeated accurate simulations of the time-domain response. We study the stability of the inversion algorithm for various rational Krylov subspaces, and assess its performance with numerical experiments.
BDDC algorithms with deluxe scaling and adaptive selection of primal constraints for Raviart-Thomas vector fields

DOE PAGES

Oh, Duk -Soon; Widlund, Olof B.; Zampini, Stefano; ...

2017-06-21

Here, a BDDC domain decomposition preconditioner is defined by a coarse component, expressed in terms of primal constraints, a weighted average across the interface between the subdomains, and local components given in terms of solvers of local subdomain problems. BDDC methods for vector field problems discretized with Raviart-Thomas finite elements are introduced. The methods are based on a new type of weighted average an adaptive selection of primal constraints developed to deal with coefficients with high contrast even inside individual subdomains. For problems with very many subdomains, a third level of the preconditioner is introduced. Assuming that the subdomains aremore » all built from elements of a coarse triangulation of the given domain, and that in each subdomain the material parameters are consistent, one obtains a bound for the preconditioned linear system's condition number which is independent of the values and jumps of these parameters across the subdomains' interface. Numerical experiments, using the PETSc library, are also presented which support the theory and show the algorithms' effectiveness even for problems not covered by the theory. Also included are experiments with Brezzi-Douglas-Marini finite-element approximations.« less
Real-time simulation of contact and cutting of heterogeneous soft-tissues.

PubMed

Courtecuisse, Hadrien; Allard, Jérémie; Kerfriden, Pierre; Bordas, Stéphane P A; Cotin, Stéphane; Duriez, Christian

2014-02-01

This paper presents a numerical method for interactive (real-time) simulations, which considerably improves the accuracy of the response of heterogeneous soft-tissue models undergoing contact, cutting and other topological changes. We provide an integrated methodology able to deal both with the ill-conditioning issues associated with material heterogeneities, contact boundary conditions which are one of the main sources of inaccuracies, and cutting which is one of the most challenging issues in interactive simulations. Our approach is based on an implicit time integration of a non-linear finite element model. To enable real-time computations, we propose a new preconditioning technique, based on an asynchronous update at low frequency. The preconditioner is not only used to improve the computation of the deformation of the tissues, but also to simulate the contact response of homogeneous and heterogeneous bodies with the same accuracy. We also address the problem of cutting the heterogeneous structures and propose a method to update the preconditioner according to the topological modifications. Finally, we apply our approach to three challenging demonstrators: (i) a simulation of cataract surgery (ii) a simulation of laparoscopic hepatectomy (iii) a brain tumor surgery. Copyright © 2013 Elsevier B.V. All rights reserved.
Recent advances in nonlinear implicit, electrostatic particle-in-cell (PIC) algorithms

NASA Astrophysics Data System (ADS)

Chen, Guangye; Chacón, Luis; Barnes, Daniel

2012-10-01

An implicit 1D electrostatic PIC algorithmfootnotetextChen, Chac'on, Barnes, J. Comput. Phys. 230 (2011) has been developed that satisfies exact energy and charge conservation. The algorithm employs a kinetic-enslaved Jacobian-free Newton-Krylov methodfootnotetextIbid. that ensures nonlinear convergence while taking timesteps comparable to the dynamical timescale of interest. Here we present two main improvements of the algorithm. The first is the formulation of a preconditioner based on linearized fluid equations, which are closed using available particle information. The computational benefit is that solving the fluid system is much cheaper than the kinetic one. The effectiveness of the preconditioner in accelerating nonlinear iterations on challenging problems will be demonstrated. A second improvement is the generalization of Ref. 1 to curvilinear meshes,footnotetextChac'on, Chen, Barnes, J. Comput. Phys. submitted (2012) with a hybrid particle update of positions and velocities in logical and physical space respectively.footnotetextSwift, J. Comp. Phys., 126 (1996) The curvilinear algorithm remains exactly charge and energy-conserving, and can be extended to multiple dimensions. We demonstrate the accuracy and efficiency of the algorithm with a 1D ion-acoustic shock wave simulation.
A modified conjugate gradient method based on the Tikhonov system for computerized tomography (CT).

PubMed

Wang, Qi; Wang, Huaxiang

2011-04-01

During the past few decades, computerized tomography (CT) was widely used for non-destructive testing (NDT) and non-destructive examination (NDE) in the industrial area because of its characteristics of non-invasiveness and visibility. Recently, CT technology has been applied to multi-phase flow measurement. Using the principle of radiation attenuation measurements along different directions through the investigated object with a special reconstruction algorithm, cross-sectional information of the scanned object can be worked out. It is a typical inverse problem and has always been a challenge for its nonlinearity and ill-conditions. The Tikhonov regulation method is widely used for similar ill-posed problems. However, the conventional Tikhonov method does not provide reconstructions with qualities good enough, the relative errors between the reconstructed images and the real distribution should be further reduced. In this paper, a modified conjugate gradient (CG) method is applied to a Tikhonov system (MCGT method) for reconstructing CT images. The computational load is dominated by the number of independent measurements m, and a preconditioner is imported to lower the condition number of the Tikhonov system. Both simulation and experiment results indicate that the proposed method can reduce the computational time and improve the quality of image reconstruction. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.
Architecting the Finite Element Method Pipeline for the GPU.

PubMed

Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T

2014-02-01

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multi-grid (AMG) method preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of GPU. Comparison of our on-GPU assembly versus a traditional serial implementation on the CPU achieves up to an 87 × speedup. Focusing on the linear system solver alone, we achieve a speedup of up to 51 × versus use of a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.

Identification of immiscible NAPL contaminant sources in aquifers by a modified two-level saturation based imperialist competitive algorithm.

PubMed

Ghafouri, H R; Mosharaf-Dehkordi, M; Afzalan, B

2017-07-01

A simulation-optimization model is proposed for identifying the characteristics of local immiscible NAPL contaminant sources inside aquifers. This model employs the UTCHEM 9.0 software as its simulator for solving the governing equations associated with the multi-phase flow in porous media. As the optimization model, a novel two-level saturation based Imperialist Competitive Algorithm (ICA) is proposed to estimate the parameters of contaminant sources. The first level consists of three parallel independent ICAs and plays as a pre-conditioner for the second level which is a single modified ICA. The ICA in the second level is modified by dividing each country into a number of provinces (smaller parts). Similar to countries in the classical ICA, these provinces are optimized by the assimilation, competition, and revolution steps in the ICA. To increase the diversity of populations, a new approach named knock the base method is proposed. The performance and accuracy of the simulation-optimization model is assessed by solving a set of two and three-dimensional problems considering the effects of different parameters such as the grid size, rock heterogeneity and designated monitoring networks. The obtained numerical results indicate that using this simulation-optimization model provides accurate results at a less number of iterations when compared with the model employing the classical one-level ICA. Copyright © 2017 Elsevier B.V. All rights reserved.
M-step preconditioned conjugate gradient methods

NASA Technical Reports Server (NTRS)

Adams, L.

1983-01-01

Preconditioned conjugate gradient methods for solving sparse symmetric and positive finite systems of linear equations are described. Necessary and sufficient conditions are given for when these preconditioners can be used and an analysis of their effectiveness is given. Efficient computer implementations of these methods are discussed and results on the CYBER 203 and the Finite Element Machine under construction at NASA Langley Research Center are included.
A multilevel preconditioner for domain decomposition boundary systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

1991-12-11

In this note, we consider multilevel preconditioning of the reduced boundary systems which arise in non-overlapping domain decomposition methods. It will be shown that the resulting preconditioned systems have condition numbers which be bounded in the case of multilevel spaces on the whole domain and grow at most proportional to the number of levels in the case of multilevel boundary spaces without multilevel extensions into the interior.
Angular Multigrid Preconditioner for Krylov-Based Solution Techniques Applied to the Sn Equations with Highly Forward-Peaked Scattering

NASA Astrophysics Data System (ADS)

Turcksin, Bruno; Ragusa, Jean C.; Morel, Jim E.

2012-01-01

It is well known that the diffusion synthetic acceleration (DSA) methods for the Sn equations become ineffective in the Fokker-Planck forward-peaked scattering limit. In response to this deficiency, Morel and Manteuffel (1991) developed an angular multigrid method for the 1-D Sn equations. This method is very effective, costing roughly twice as much as DSA per source iteration, and yielding a maximum spectral radius of approximately 0.6 in the Fokker-Planck limit. Pautz, Adams, and Morel (PAM) (1999) later generalized the angular multigrid to 2-D, but it was found that the method was unstable with sufficiently forward-peaked mappings between the angular grids. The method was stabilized via a filtering technique based on diffusion operators, but this filtering also degraded the effectiveness of the overall scheme. The spectral radius was not bounded away from unity in the Fokker-Planck limit, although the method remained more effective than DSA. The purpose of this article is to recast the multidimensional PAM angular multigrid method without the filtering as an Sn preconditioner and use it in conjunction with the Generalized Minimal RESidual (GMRES) Krylov method. The approach ensures stability and our computational results demonstrate that it is also significantly more efficient than an analogous DSA-preconditioned Krylov method.
Fractional Step Like Schemes for Free Surface Problems with Thermal Coupling Using the Lagrangian PFEM

NASA Astrophysics Data System (ADS)

Aubry, R.; Oñate, E.; Idelsohn, S. R.

2006-09-01

The method presented in Aubry et al. (Comput Struc 83:1459-1475, 2005) for the solution of an incompressible viscous fluid flow with heat transfer using a fully Lagrangian description of motion is extended to three dimensions (3D) with particular emphasis on mass conservation. A modified fractional step (FS) based on the pressure Schur complement (Turek 1999), and related to the class of algebraic splittings Quarteroni et al. (Comput Methods Appl Mech Eng 188:505-526, 2000), is used and a new advantage of the splittings of the equations compared with the classical FS is highlighted for free surface problems. The temperature is semi-coupled with the displacement, which is the main variable in a Lagrangian description. Comparisons for various mesh Reynolds numbers are performed with the classical FS, an algebraic splitting and a monolithic solution, in order to illustrate the behaviour of the Uzawa operator and the mass conservation. As the classical fractional step is equivalent to one iteration of the Uzawa algorithm performed with a standard Laplacian as a preconditioner, it will behave well only in a Reynold mesh number domain where the preconditioner is efficient. Numerical results are provided to assess the superiority of the modified algebraic splitting to the classical FS.
A second-order accurate finite volume scheme with the discrete maximum principle for solving Richards’ equation on unstructured meshes

DOE PAGES

Svyatsky, Daniil; Lipnikov, Konstantin

2017-03-18

Richards’s equation describes steady-state or transient flow in a variably saturated medium. For a medium having multiple layers of soils that are not aligned with coordinate axes, a mesh fitted to these layers is no longer orthogonal and the classical two-point flux approximation finite volume scheme is no longer accurate. Here, we propose new second-order accurate nonlinear finite volume (NFV) schemes for the head and pressure formulations of Richards’ equation. We prove that the discrete maximum principles hold for both formulations at steady-state which mimics similar properties of the continuum solution. The second-order accuracy is achieved using high-order upwind algorithmsmore » for the relative permeability. Numerical simulations of water infiltration into a dry soil show significant advantage of the second-order NFV schemes over the first-order NFV schemes even on coarse meshes. Since explicit calculation of the Jacobian matrix becomes prohibitively expensive for high-order schemes due to build-in reconstruction and slope limiting algorithms, we study numerically the preconditioning strategy introduced recently in Lipnikov et al. (2016) that uses a stable approximation of the continuum Jacobian. Lastly, numerical simulations show that the new preconditioner reduces computational cost up to 2–3 times in comparison with the conventional preconditioners.« less
Convergence Acceleration of Runge-Kutta Schemes for Solving the Navier-Stokes Equations

NASA Technical Reports Server (NTRS)

Swanson, Roy C., Jr.; Turkel, Eli; Rossow, C.-C.

2007-01-01

The convergence of a Runge-Kutta (RK) scheme with multigrid is accelerated by preconditioning with a fully implicit operator. With the extended stability of the Runge-Kutta scheme, CFL numbers as high as 1000 can be used. The implicit preconditioner addresses the stiffness in the discrete equations associated with stretched meshes. This RK/implicit scheme is used as a smoother for multigrid. Fourier analysis is applied to determine damping properties. Numerical dissipation operators based on the Roe scheme, a matrix dissipation, and the CUSP scheme are considered in evaluating the RK/implicit scheme. In addition, the effect of the number of RK stages is examined. Both the numerical and computational efficiency of the scheme with the different dissipation operators are discussed. The RK/implicit scheme is used to solve the two-dimensional (2-D) and three-dimensional (3-D) compressible, Reynolds-averaged Navier-Stokes equations. Turbulent flows over an airfoil and wing at subsonic and transonic conditions are computed. The effects of the cell aspect ratio on convergence are investigated for Reynolds numbers between 5:7 x 10(exp 6) and 100 x 10(exp 6). It is demonstrated that the implicit preconditioner can reduce the computational time of a well-tuned standard RK scheme by a factor between four and ten.
A second-order accurate finite volume scheme with the discrete maximum principle for solving Richards’ equation on unstructured meshes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Svyatsky, Daniil; Lipnikov, Konstantin

Richards’s equation describes steady-state or transient flow in a variably saturated medium. For a medium having multiple layers of soils that are not aligned with coordinate axes, a mesh fitted to these layers is no longer orthogonal and the classical two-point flux approximation finite volume scheme is no longer accurate. Here, we propose new second-order accurate nonlinear finite volume (NFV) schemes for the head and pressure formulations of Richards’ equation. We prove that the discrete maximum principles hold for both formulations at steady-state which mimics similar properties of the continuum solution. The second-order accuracy is achieved using high-order upwind algorithmsmore » for the relative permeability. Numerical simulations of water infiltration into a dry soil show significant advantage of the second-order NFV schemes over the first-order NFV schemes even on coarse meshes. Since explicit calculation of the Jacobian matrix becomes prohibitively expensive for high-order schemes due to build-in reconstruction and slope limiting algorithms, we study numerically the preconditioning strategy introduced recently in Lipnikov et al. (2016) that uses a stable approximation of the continuum Jacobian. Lastly, numerical simulations show that the new preconditioner reduces computational cost up to 2–3 times in comparison with the conventional preconditioners.« less
Efficient Coupling of Fluid-Plasma and Monte-Carlo-Neutrals Models for Edge Plasma Transport

NASA Astrophysics Data System (ADS)

Dimits, A. M.; Cohen, B. I.; Friedman, A.; Joseph, I.; Lodestro, L. L.; Rensink, M. E.; Rognlien, T. D.; Sjogreen, B.; Stotler, D. P.; Umansky, M. V.

2017-10-01

UEDGE has been valuable for modeling transport in the tokamak edge and scrape-off layer due in part to its efficient fully implicit solution of coupled fluid neutrals and plasma models. We are developing an implicit coupling of the kinetic Monte-Carlo (MC) code DEGAS-2, as the neutrals model component, to the UEDGE plasma component, based on an extension of the Jacobian-free Newton-Krylov (JFNK) method to MC residuals. The coupling components build on the methods and coding already present in UEDGE. For the linear Krylov iterations, a procedure has been developed to ``extract'' a good preconditioner from that of UEDGE. This preconditioner may also be used to greatly accelerate the convergence rate of a relaxed fixed-point iteration, which may provide a useful ``intermediate'' algorithm. The JFNK method also requires calculation of Jacobian-vector products, for which any finite-difference procedure is inaccurate when a MC component is present. A semi-analytical procedure that retains the standard MC accuracy and fully kinetic neutrals physics is therefore being developed. Prepared for US DOE by LLNL under Contract DE-AC52-07NA27344 and LDRD project 15-ERD-059, by PPPL under Contract DE-AC02-09CH11466, and supported in part by the U.S. DOE, OFES.
A broadband fast multipole accelerated boundary element method for the three dimensional Helmholtz equation.

PubMed

Gumerov, Nail A; Duraiswami, Ramani

2009-01-01

The development of a fast multipole method (FMM) accelerated iterative solution of the boundary element method (BEM) for the Helmholtz equations in three dimensions is described. The FMM for the Helmholtz equation is significantly different for problems with low and high kD (where k is the wavenumber and D the domain size), and for large problems the method must be switched between levels of the hierarchy. The BEM requires several approximate computations (numerical quadrature, approximations of the boundary shapes using elements), and these errors must be balanced against approximations introduced by the FMM and the convergence criterion for iterative solution. These different errors must all be chosen in a way that, on the one hand, excess work is not done and, on the other, that the error achieved by the overall computation is acceptable. Details of translation operators for low and high kD, choice of representations, and BEM quadrature schemes, all consistent with these approximations, are described. A novel preconditioner using a low accuracy FMM accelerated solver as a right preconditioner is also described. Results of the developed solvers for large boundary value problems with 0.0001 less, similarkD less, similar500 are presented and shown to perform close to theoretical expectations.
Optimal and fast E/B separation with a dual messenger field

NASA Astrophysics Data System (ADS)

Kodi Ramanah, Doogesh; Lavaux, Guilhem; Wandelt, Benjamin D.

2018-05-01

We adapt our recently proposed dual messenger algorithm for spin field reconstruction and showcase its efficiency and effectiveness in Wiener filtering polarized cosmic microwave background (CMB) maps. Unlike conventional preconditioned conjugate gradient (PCG) solvers, our preconditioner-free technique can deal with high-resolution joint temperature and polarization maps with inhomogeneous noise distributions and arbitrary mask geometries with relative ease. Various convergence diagnostics illustrate the high quality of the dual messenger reconstruction. In contrast, the PCG implementation fails to converge to a reasonable solution for the specific problem considered. The implementation of the dual messenger method is straightforward and guarantees numerical stability and convergence. We show how the algorithm can be modified to generate fluctuation maps, which, combined with the Wiener filter solution, yield unbiased constrained signal realizations, consistent with observed data. This algorithm presents a pathway to exact global analyses of high-resolution and high-sensitivity CMB data for a statistically optimal separation of E and B modes. It is therefore relevant for current and next-generation CMB experiments, in the quest for the elusive primordial B-mode signal.
A communication library for the parallelization of air quality models on structured grids

NASA Astrophysics Data System (ADS)

Miehe, Philipp; Sandu, Adrian; Carmichael, Gregory R.; Tang, Youhua; Dăescu, Dacian

PAQMSG is an MPI-based, Fortran 90 communication library for the parallelization of air quality models (AQMs) on structured grids. It consists of distribution, gathering and repartitioning routines for different domain decompositions implementing a master-worker strategy. The library is architecture and application independent and includes optimization strategies for different architectures. This paper presents the library from a user perspective. Results are shown from the parallelization of STEM-III on Beowulf clusters. The PAQMSG library is available on the web. The communication routines are easy to use, and should allow for an immediate parallelization of existing AQMs. PAQMSG can also be used for constructing new models.
National Centers for Environmental Prediction

Science.gov Websites

Products Operational Forecast Graphics Experimental Forecast Graphics Verification and Diagnostics Model PARALLEL/EXPERIMENTAL MODEL FORECAST GRAPHICS OPERATIONAL VERIFICATION / DIAGNOSTICS PARALLEL VERIFICATION Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS
Limited angle tomographic breast imaging: A comparison of parallel beam and pinhole collimation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wessell, D.E.; Kadrmas, D.J.; Frey, E.C.

1996-12-31

Results from clinical trials have suggested no improvement in lesion detection with parallel hole SPECT scintimammography (SM) with Tc-99m over parallel hole planar SM. In this initial investigation, we have elucidated some of the unique requirements of SPECT SM. With these requirements in mind, we have begun to develop practical data acquisition and reconstruction strategies that can reduce image artifacts and improve image quality. In this paper we investigate limited angle orbits for both parallel hole and pinhole SPECT SM. Singular Value Decomposition (SVD) is used to analyze the artifacts associated with the limited angle orbits. Maximum likelihood expectation maximizationmore » (MLEM) reconstructions are then used to examine the effects of attenuation compensation on the quality of the reconstructed image. All simulations are performed using the 3D-MCAT breast phantom. The results of these simulation studies demonstrate that limited angle SPECT SM is feasible, that attenuation correction is needed for accurate reconstructions, and that pinhole SPECT SM may have an advantage over parallel hole SPECT SM in terms of improved image quality and reduced image artifacts.« less
Combustion Stability Innovations for Liquid Rocket

DTIC Science & Technology

2010-01-31

waves within the pipe . Acoustic time for one pass = 0.003 sec. Closed end The following figure shows the second harmonic of the quarter wave mode at...waveguides at the center of the test section. The two drivers at either end can operate at sync or at a specified phase difference. The effect of close ...preserve conservation in real time. The preconditioner operates on the inner loop driving the solution to the next time level. Sufficient number of inner
Spectral analysis and multigrid preconditioners for two-dimensional space-fractional diffusion equations

NASA Astrophysics Data System (ADS)

Moghaderi, Hamid; Dehghan, Mehdi; Donatelli, Marco; Mazza, Mariarosa

2017-12-01

Fractional diffusion equations (FDEs) are a mathematical tool used for describing some special diffusion phenomena arising in many different applications like porous media and computational finance. In this paper, we focus on a two-dimensional space-FDE problem discretized by means of a second order finite difference scheme obtained as combination of the Crank-Nicolson scheme and the so-called weighted and shifted Grünwald formula. By fully exploiting the Toeplitz-like structure of the resulting linear system, we provide a detailed spectral analysis of the coefficient matrix at each time step, both in the case of constant and variable diffusion coefficients. Such a spectral analysis has a very crucial role, since it can be used for designing fast and robust iterative solvers. In particular, we employ the obtained spectral information to define a Galerkin multigrid method based on the classical linear interpolation as grid transfer operator and damped-Jacobi as smoother, and to prove the linear convergence rate of the corresponding two-grid method. The theoretical analysis suggests that the proposed grid transfer operator is strong enough for working also with the V-cycle method and the geometric multigrid. On this basis, we introduce two computationally favourable variants of the proposed multigrid method and we use them as preconditioners for Krylov methods. Several numerical results confirm that the resulting preconditioning strategies still keep a linear convergence rate.
Fully anisotropic 3-D EM modelling on a Lebedev grid with a multigrid pre-conditioner

NASA Astrophysics Data System (ADS)

Jaysaval, Piyoosh; Shantsev, Daniil V.; de la Kethulle de Ryhove, Sébastien; Bratteland, Tarjei

2016-12-01

We present a numerical algorithm for 3-D electromagnetic (EM) simulations in conducting media with general electric anisotropy. The algorithm is based on the finite-difference discretization of frequency-domain Maxwell's equations on a Lebedev grid, in which all components of the electric field are collocated but half a spatial step staggered with respect to the magnetic field components, which also are collocated. This leads to a system of linear equations that is solved using a stabilized biconjugate gradient method with a multigrid preconditioner. We validate the accuracy of the numerical results for layered and 3-D tilted transverse isotropic (TTI) earth models representing typical scenarios used in the marine controlled-source EM method. It is then demonstrated that not taking into account the full anisotropy of the conductivity tensor can lead to misleading inversion results. For synthetic data corresponding to a 3-D model with a TTI anticlinal structure, a standard vertical transverse isotropic (VTI) inversion is not able to image a resistor, while for a 3-D model with a TTI synclinal structure it produces a false resistive anomaly. However, if the VTI forward solver used in the inversion is replaced by the proposed TTI solver with perfect knowledge of the strike and dip of the dipping structures, the resulting resistivity images become consistent with the true models.
Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Brouwer, Randall Jay

1991-01-01

The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks, standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements are presented on the parallel speedups available.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jing Yanfei, E-mail: yanfeijing@uestc.edu.c; Huang Tingzhu, E-mail: tzhuang@uestc.edu.c; Duan Yong, E-mail: duanyong@yahoo.c

This study is mainly focused on iterative solutions with simple diagonal preconditioning to two complex-valued nonsymmetric systems of linear equations arising from a computational chemistry model problem proposed by Sherry Li of NERSC. Numerical experiments show the feasibility of iterative methods to some extent when applied to the problems and reveal the competitiveness of our recently proposed Lanczos biconjugate A-orthonormalization methods to other classic and popular iterative methods. By the way, experiment results also indicate that application specific preconditioners may be mandatory and required for accelerating convergence.
Adaptive Implicit Non-Equilibrium Radiation Diffusion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Philip, Bobby; Wang, Zhen; Berrill, Mark A

2013-01-01

We describe methods for accurate and efficient long term time integra- tion of non-equilibrium radiation diffusion systems: implicit time integration for effi- cient long term time integration of stiff multiphysics systems, local control theory based step size control to minimize the required global number of time steps while control- ling accuracy, dynamic 3D adaptive mesh refinement (AMR) to minimize memory and computational costs, Jacobian Free Newton-Krylov methods on AMR grids for efficient nonlinear solution, and optimal multilevel preconditioner components that provide level independent solver convergence.

Generalized Preconditioned Locally Harmonic Residual Eigensolver (GPLHR) v0.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

VECHARYNSKI, EUGENE; YANG, CHAO

The software contains a MATLAB implementation of the Generalized Preconditioned Locally Harmonic Residual (GPLHR) method for solving standard and generalized non-Hermitian eigenproblems. The method is particularly useful for computing a subset of eigenvalues, and their eigen- or Schur vectors, closest to a given shift. The proposed method is based on block iterations and can take advantage of a preconditioner if it is available. It does not need to perform exact shift-and-invert transformation. Standard and generalized eigenproblems are handled in a unified framework.
Urban residential greenspace and mental health in youth: Different approaches to testing multiple pathways yield different conclusions.

PubMed

Dzhambov, Angel; Hartig, Terry; Markevych, Iana; Tilov, Boris; Dimitrova, Donka

2018-01-01

Urban greenspace can benefit mental health through multiple mechanisms. They may work together, but previous studies have treated them as independent. We aimed to compare single and parallel mediation models, which estimate the independent contributions of different paths, to several models that posit serial mediation components in the pathway from greenspace to mental health. We collected cross-sectional survey data from 399 participants (15-25 years of age) in the city of Plovdiv, Bulgaria. Objective "exposure" to urban residential greenspace was defined by the Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index, tree cover density within the 500-m buffer, and Euclidean distance to the nearest urban greenspace. Self-reported measures of availability, access, quality, and usage of greenspace were also used. Mental health was measured with the General Health Questionnaire. The following potential mediators were considered in single and parallel mediation models: restorative quality of the neighborhood, neighborhood social cohesion, commuting and leisure time physical activity, road traffic noise annoyance, and perceived air pollution. Four models were tested with the following serial mediation components: (1) restorative quality → social cohesion; (2) restorative quality → physical activity; (3) perceived traffic pollution → restorative quality; (4) and noise annoyance → physical activity. There was no direct association between objectively-measured greenspace and mental health. For the 500-m buffer, the tests of the single mediator models suggested that restorative quality mediated the relationship between NDVI and mental health. Tests of parallel mediation models did not find any significant indirect effects. In line with theory, tests of the serial mediation models showed that higher restorative quality was associated with more physical activity and more social cohesion, and in turn with better mental health. As for self-reported greenspace measures, single mediation through restorative quality was significant only for time in greenspace, and there was no mediation though restorative quality in the parallel mediation models; however, serial mediation through restorative quality and social cohesion/physical activity was indicated for all self-reported measures except for greenspace quality. Statistical models should adequately address the theoretically indicated interdependencies between mechanisms underlying association between greenspace and mental health. If such causal relationships hold, testing mediators alone or in parallel may lead to incorrect inferences about the relative contribution of specific paths, and thus to inappropriate intervention strategies. Copyright © 2017 Elsevier Inc. All rights reserved.
Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

PubMed

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-06-01

We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

PubMed Central

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
77 FR 47573 - Approval and Promulgation of Implementation Plans; Mississippi; 110(a)(2)(E)(ii) Infrastructure...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-09

... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. This... of Contents I. What is parallel processing? II. Background III. What elements are required under... Executive Order Reviews I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control

NASA Astrophysics Data System (ADS)

Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming

2017-09-01

Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.
Scalable implicit incompressible resistive MHD with stabilized FE and fully-coupled Newton–Krylov-AMG

DOE PAGES

Shadid, J. N.; Pawlowski, R. P.; Cyr, E. C.; ...

2016-02-10

Here, we discuss that the computational solution of the governing balance equations for mass, momentum, heat transfer and magnetic induction for resistive magnetohydrodynamics (MHD) systems can be extremely challenging. These difficulties arise from both the strong nonlinear, nonsymmetric coupling of fluid and electromagnetic phenomena, as well as the significant range of time- and length-scales that the interactions of these physical mechanisms produce. This paper explores the development of a scalable, fully-implicit stabilized unstructured finite element (FE) capability for 3D incompressible resistive MHD. The discussion considers the development of a stabilized FE formulation in context of the variational multiscale (VMS) method,more » and describes the scalable implicit time integration and direct-to-steady-state solution capability. The nonlinear solver strategy employs Newton–Krylov methods, which are preconditioned using fully-coupled algebraic multilevel preconditioners. These preconditioners are shown to enable a robust, scalable and efficient solution approach for the large-scale sparse linear systems generated by the Newton linearization. Verification results demonstrate the expected order-of-accuracy for the stabilized FE discretization. The approach is tested on a variety of prototype problems, that include MHD duct flows, an unstable hydromagnetic Kelvin–Helmholtz shear layer, and a 3D island coalescence problem used to model magnetic reconnection. Initial results that explore the scaling of the solution methods are also presented on up to 128K processors for problems with up to 1.8B unknowns on a CrayXK7.« less
SymPix: A Spherical Grid for Efficient Sampling of Rotationally Invariant Operators

NASA Astrophysics Data System (ADS)

Seljebotn, D. S.; Eriksen, H. K.

2016-02-01

We present SymPix, a special-purpose spherical grid optimized for efficiently sampling rotationally invariant linear operators. This grid is conceptually similar to the Gauss-Legendre (GL) grid, aligning sample points with iso-latitude rings located on Legendre polynomial zeros. Unlike the GL grid, however, the number of grid points per ring varies as a function of latitude, avoiding expensive oversampling near the poles and ensuring nearly equal sky area per grid point. The ratio between the number of grid points in two neighboring rings is required to be a low-order rational number (3, 2, 1, 4/3, 5/4, or 6/5) to maintain a high degree of symmetries. Our main motivation for this grid is to solve linear systems using multi-grid methods, and to construct efficient preconditioners through pixel-space sampling of the linear operator in question. As a benchmark and representative example, we compute a preconditioner for a linear system that involves the operator \\widehat{{\\boldsymbol{D}}}+{\\widehat{{\\boldsymbol{B}}}}T{{\\boldsymbol{N}}}-1\\widehat{{\\boldsymbol{B}}}, where \\widehat{{\\boldsymbol{B}}} and \\widehat{{\\boldsymbol{D}}} may be described as both local and rotationally invariant operators, and {\\boldsymbol{N}} is diagonal in the pixel domain. For a bandwidth limit of {{\\ell }}{max} = 3000, we find that our new SymPix implementation yields average speed-ups of 360 and 23 for {\\widehat{{\\boldsymbol{B}}}}T{{\\boldsymbol{N}}}-1\\widehat{{\\boldsymbol{B}}} and \\widehat{{\\boldsymbol{D}}}, respectively, compared with the previous state-of-the-art implementation.
Preconditioned steepest descent methods for some nonlinear elliptic equations involving p-Laplacian terms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feng, Wenqiang, E-mail: wfeng1@vols.utk.edu; Salgado, Abner J., E-mail: asalgad1@utk.edu; Wang, Cheng, E-mail: cwang1@umassd.edu

We describe and analyze preconditioned steepest descent (PSD) solvers for fourth and sixth-order nonlinear elliptic equations that include p-Laplacian terms on periodic domains in 2 and 3 dimensions. The highest and lowest order terms of the equations are constant-coefficient, positive linear operators, which suggests a natural preconditioning strategy. Such nonlinear elliptic equations often arise from time discretization of parabolic equations that model various biological and physical phenomena, in particular, liquid crystals, thin film epitaxial growth and phase transformations. The analyses of the schemes involve the characterization of the strictly convex energies associated with the equations. We first give a generalmore » framework for PSD in Hilbert spaces. Based on certain reasonable assumptions of the linear pre-conditioner, a geometric convergence rate is shown for the nonlinear PSD iteration. We then apply the general theory to the fourth and sixth-order problems of interest, making use of Sobolev embedding and regularity results to confirm the appropriateness of our pre-conditioners for the regularized p-Lapacian problems. Our results include a sharper theoretical convergence result for p-Laplacian systems compared to what may be found in existing works. We demonstrate rigorously how to apply the theory in the finite dimensional setting using finite difference discretization methods. Numerical simulations for some important physical application problems – including thin film epitaxy with slope selection and the square phase field crystal model – are carried out to verify the efficiency of the scheme.« less
A Tensor-Train accelerated solver for integral equations in complex geometries

NASA Astrophysics Data System (ADS)

Corona, Eduardo; Rahimian, Abtin; Zorin, Denis

2017-04-01

We present a framework using the Quantized Tensor Train (QTT) decomposition to accurately and efficiently solve volume and boundary integral equations in three dimensions. We describe how the QTT decomposition can be used as a hierarchical compression and inversion scheme for matrices arising from the discretization of integral equations. For a broad range of problems, computational and storage costs of the inversion scheme are extremely modest O (log ⁡ N) and once the inverse is computed, it can be applied in O (Nlog ⁡ N) . We analyze the QTT ranks for hierarchically low rank matrices and discuss its relationship to commonly used hierarchical compression techniques such as FMM and HSS. We prove that the QTT ranks are bounded for translation-invariant systems and argue that this behavior extends to non-translation invariant volume and boundary integrals. For volume integrals, the QTT decomposition provides an efficient direct solver requiring significantly less memory compared to other fast direct solvers. We present results demonstrating the remarkable performance of the QTT-based solver when applied to both translation and non-translation invariant volume integrals in 3D. For boundary integral equations, we demonstrate that using a QTT decomposition to construct preconditioners for a Krylov subspace method leads to an efficient and robust solver with a small memory footprint. We test the QTT preconditioners in the iterative solution of an exterior elliptic boundary value problem (Laplace) formulated as a boundary integral equation in complex, multiply connected geometries.
Preconditioned steepest descent methods for some nonlinear elliptic equations involving p-Laplacian terms

NASA Astrophysics Data System (ADS)

Feng, Wenqiang; Salgado, Abner J.; Wang, Cheng; Wise, Steven M.

2017-04-01

We describe and analyze preconditioned steepest descent (PSD) solvers for fourth and sixth-order nonlinear elliptic equations that include p-Laplacian terms on periodic domains in 2 and 3 dimensions. The highest and lowest order terms of the equations are constant-coefficient, positive linear operators, which suggests a natural preconditioning strategy. Such nonlinear elliptic equations often arise from time discretization of parabolic equations that model various biological and physical phenomena, in particular, liquid crystals, thin film epitaxial growth and phase transformations. The analyses of the schemes involve the characterization of the strictly convex energies associated with the equations. We first give a general framework for PSD in Hilbert spaces. Based on certain reasonable assumptions of the linear pre-conditioner, a geometric convergence rate is shown for the nonlinear PSD iteration. We then apply the general theory to the fourth and sixth-order problems of interest, making use of Sobolev embedding and regularity results to confirm the appropriateness of our pre-conditioners for the regularized p-Lapacian problems. Our results include a sharper theoretical convergence result for p-Laplacian systems compared to what may be found in existing works. We demonstrate rigorously how to apply the theory in the finite dimensional setting using finite difference discretization methods. Numerical simulations for some important physical application problems - including thin film epitaxy with slope selection and the square phase field crystal model - are carried out to verify the efficiency of the scheme.
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Scalable subsurface inverse modeling of huge data sets with an application to tracer concentration breakthrough data from magnetic resonance imaging

NASA Astrophysics Data System (ADS)

Lee, Jonghyun; Yoon, Hongkyu; Kitanidis, Peter K.; Werth, Charles J.; Valocchi, Albert J.

2016-07-01

Characterizing subsurface properties is crucial for reliable and cost-effective groundwater supply management and contaminant remediation. With recent advances in sensor technology, large volumes of hydrogeophysical and geochemical data can be obtained to achieve high-resolution images of subsurface properties. However, characterization with such a large amount of information requires prohibitive computational costs associated with "big data" processing and numerous large-scale numerical simulations. To tackle such difficulties, the principal component geostatistical approach (PCGA) has been proposed as a "Jacobian-free" inversion method that requires much smaller forward simulation runs for each iteration than the number of unknown parameters and measurements needed in the traditional inversion methods. PCGA can be conveniently linked to any multiphysics simulation software with independent parallel executions. In this paper, we extend PCGA to handle a large number of measurements (e.g., 106 or more) by constructing a fast preconditioner whose computational cost scales linearly with the data size. For illustration, we characterize the heterogeneous hydraulic conductivity (K) distribution in a laboratory-scale 3-D sand box using about 6 million transient tracer concentration measurements obtained using magnetic resonance imaging. Since each individual observation has little information on the K distribution, the data were compressed by the zeroth temporal moment of breakthrough curves, which is equivalent to the mean travel time under the experimental setting. Only about 2000 forward simulations in total were required to obtain the best estimate with corresponding estimation uncertainty, and the estimated K field captured key patterns of the original packing design, showing the efficiency and effectiveness of the proposed method.
Beam quality corrections for parallel-plate ion chambers in electron reference dosimetry

NASA Astrophysics Data System (ADS)

Zink, K.; Wulff, J.

2012-04-01

Current dosimetry protocols (AAPM, IAEA, IPEM, DIN) recommend parallel-plate ionization chambers for dose measurements in clinical electron beams. This study presents detailed Monte Carlo simulations of beam quality correction factors for four different types of parallel-plate chambers: NACP-02, Markus, Advanced Markus and Roos. These chambers differ in constructive details which should have notable impact on the resulting perturbation corrections, hence on the beam quality corrections. The results reveal deviations to the recommended beam quality corrections given in the IAEA TRS-398 protocol in the range of 0%-2% depending on energy and chamber type. For well-guarded chambers, these deviations could be traced back to a non-unity and energy-dependent wall perturbation correction. In the case of the guardless Markus chamber, a nearly energy-independent beam quality correction is resulting as the effects of wall and cavity perturbation compensate each other. For this chamber, the deviations to the recommended values are the largest and may exceed 2%. From calculations of type-B uncertainties including effects due to uncertainties of the underlying cross-sectional data as well as uncertainties due to the chamber material composition and chamber geometry, the overall uncertainty of calculated beam quality correction factors was estimated to be <0.7%. Due to different chamber positioning recommendations given in the national and international dosimetry protocols, an additional uncertainty in the range of 0.2%-0.6% is present. According to the IAEA TRS-398 protocol, the uncertainty in clinical electron dosimetry using parallel-plate ion chambers is 1.7%. This study may help to reduce this uncertainty significantly.
Drug innovation, price controls, and parallel trade.

PubMed

Matteucci, Giorgio; Reverberi, Pierfrancesco

2016-12-21

We study the long-run welfare effects of parallel trade (PT) in pharmaceuticals. We develop a two-country model of PT with endogenous quality, where the pharmaceutical firm negotiates the price of the drug with the government in the foreign country. We show that, even though the foreign government does not consider global R&D costs, (the threat of) PT improves the quality of the drug as long as the foreign consumers' valuation of quality is high enough. We find that the firm's short-run profit may be higher when PT is allowed. Nonetheless, this is neither necessary nor sufficient for improving drug quality in the long run. We also show that improving drug quality is a sufficient condition for PT to increase global welfare. Finally, we show that, when PT is allowed, drug quality may be higher with than without price controls.
A New Coarsening Operator for the Optimal Preconditioning of the Dual and Primal Domain Decomposition Methods: Application to Problems with Severe Coefficient Jumps

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Rixen, Daniel

1996-01-01

We present an optimal preconditioning algorithm that is equally applicable to the dual (FETI) and primal (Balancing) Schur complement domain decomposition methods, and which successfully addresses the problems of subdomain heterogeneities including the effects of large jumps of coefficients. The proposed preconditioner is derived from energy principles and embeds a new coarsening operator that propagates the error globally and accelerates convergence. The resulting iterative solver is illustrated with the solution of highly heterogeneous elasticity problems.
Amélioration des performances d'une implantation parallélovectorielle du gradient conjugué par extension du schéma de stockage matriciel

NASA Astrophysics Data System (ADS)

Magnin, H.; Coulomb, J. L.

1993-03-01

Electromagnetic field computation with the Finite Element (FE) method implies solving of large linear systems of equations. Performances and memory capacities of today computers allow to achieve three-dimensional FE discretizations of electromagnetic problems, but the number of unknowns grows high. So, to improve time to the numerical solution of the linear system(s) thus arising, the use of parallel and/or vector computers has to be envisaged. In this paper, the main constitutive steps of the Pre-conditioned Conjugate Gradient algorithm (PCG) are analysed. After a short recall of our previous work concerning their improvement by use of vector and parallel computations, we show some speedup limitations due to the sparse row-wise matrix storage scheme employed. Then, an extension of this matrix representation is proposed, leading to introduce redundant storage of non-zero coefficients. In spite of the “memory waste” thus implied, it is shown how this extension can be successfully employed to increase the speedup due to parallelism and vectorization on the whole algorithm, and in particular to derive a parallel preconditioner. La résolution par la méthode des éléments finis des équations de l'électromagnétisme conduit à résoudre de grands systèmes d'équations linéaires. Les capacités mémoire et les performances actuelles des systèmes informatiques permettent de traiter les problèmes électromagnétiques par discrétisation tridimensionnelle, mais alors le nombre d'inconnues devient très élevé. Ainsi, la résolution en un temps raisonnable des équations linéaires associées à de telles discrétisations conduit à envisager l'emploi d'ordinateurs à architecture parallèle. Dans cet article, les différentes étapes constitutives de l'algorithme du gradient conjugué préconditionné (GCP) sont analysées. Après un court rappel de nos travaux antérieurs concemant leur amélioration par utilisation de traitements parallèles et vectoriels, nous montrons les limitations du gain de temps dues au mode de stockage matriciel utilisé : la représentation creuse dite “Morse”. Nous proposons alors une extension de ce mode de stockage, conduisant à l'introduction de redondance au niveau du rangement des termes matriciels en mémoire. Malgré le “gaspillage” mémoire ainsi occasionné, il apparait que cette extension peut être mise à profit pour augmenter sensiblement les gains par parallélisation et vectorisation de l'ensemble de l'algorithme du gradient conjugué, et notamment pour la réalisation d'un pré-conditionnement parallèle.
National Centers for Environmental Prediction

Science.gov Websites

Operational Forecast Graphics Experimental Forecast Graphics Verification and Diagnostics Model Configuration /EXPERIMENTAL MODEL FORECAST GRAPHICS OPERATIONAL VERIFICATION / DIAGNOSTICS PARALLEL VERIFICATION / DIAGNOSTICS Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

NASA Technical Reports Server (NTRS)

Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

1999-01-01

The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions [Algebraic multigrid preconditioners for multiphase flow in porous media with phase transitions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bui, Quan M.; Wang, Lu; Osei-Kuffuor, Daniel

Multiphase flow is a critical process in a wide range of applications, including oil and gas recovery, carbon sequestration, and contaminant remediation. Numerical simulation of multiphase flow requires solving of a large, sparse linear system resulting from the discretization of the partial differential equations modeling the flow. In the case of multiphase multicomponent flow with miscible effect, this is a very challenging task. The problem becomes even more difficult if phase transitions are taken into account. A new approach to handle phase transitions is to formulate the system as a nonlinear complementarity problem (NCP). Unlike in the primary variable switchingmore » technique, the set of primary variables in this approach is fixed even when there is phase transition. Not only does this improve the robustness of the nonlinear solver, it opens up the possibility to use multigrid methods to solve the resulting linear system. The disadvantage of the complementarity approach, however, is that when a phase disappears, the linear system has the structure of a saddle point problem and becomes indefinite, and current algebraic multigrid (AMG) algorithms cannot be applied directly. In this study, we explore the effectiveness of a new multilevel strategy, based on the multigrid reduction technique, to deal with problems of this type. We demonstrate the effectiveness of the method through numerical results for the case of two-phase, two-component flow with phase appearance/disappearance. In conclusion, we also show that the strategy is efficient and scales optimally with problem size.« less

Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions [Algebraic multigrid preconditioners for multiphase flow in porous media with phase transitions

DOE PAGES

Bui, Quan M.; Wang, Lu; Osei-Kuffuor, Daniel

2018-02-06

Multiphase flow is a critical process in a wide range of applications, including oil and gas recovery, carbon sequestration, and contaminant remediation. Numerical simulation of multiphase flow requires solving of a large, sparse linear system resulting from the discretization of the partial differential equations modeling the flow. In the case of multiphase multicomponent flow with miscible effect, this is a very challenging task. The problem becomes even more difficult if phase transitions are taken into account. A new approach to handle phase transitions is to formulate the system as a nonlinear complementarity problem (NCP). Unlike in the primary variable switchingmore » technique, the set of primary variables in this approach is fixed even when there is phase transition. Not only does this improve the robustness of the nonlinear solver, it opens up the possibility to use multigrid methods to solve the resulting linear system. The disadvantage of the complementarity approach, however, is that when a phase disappears, the linear system has the structure of a saddle point problem and becomes indefinite, and current algebraic multigrid (AMG) algorithms cannot be applied directly. In this study, we explore the effectiveness of a new multilevel strategy, based on the multigrid reduction technique, to deal with problems of this type. We demonstrate the effectiveness of the method through numerical results for the case of two-phase, two-component flow with phase appearance/disappearance. In conclusion, we also show that the strategy is efficient and scales optimally with problem size.« less
Feasibility and its characteristics of CO2 laser micromachining-based PMMA anti-scattering grid estimated by MCNP code simulation.

PubMed

Bae, Jun Woo; Kim, Hee Reyoung

2018-01-01

Anti-scattering grid has been used to improve the image quality. However, applying a commonly used linear or parallel grid would cause image distortion, and focusing grid also requires a precise fabrication technology, which is expensive. To investigate and analyze whether using CO2 laser micromachining-based PMMA anti-scattering grid can improve the performance of the grid at a lower cost. Thus, improvement of grid performance would result in improvement of image quality. The cross-sectional shape of CO2 laser machined PMMA is similar to alphabet 'V'. The performance was characterized by contrast improvement factor (CIF) and Bucky. Four types of grid were tested, which include thin parallel, thick parallel, 'V'-type and 'inverse V'-type of grid. For a Bucky factor of 2.1, the CIF of the grid with both the "V" and inverse "V" had a value of 1.53, while the thick and thick parallel types had values of 1.43 and 1.65, respectively. The 'V' shape grid manufacture by CO2 laser micromachining showed higher CIF than parallel one, which had same shielding material channel width. It was thought that the 'V' shape grid would be replacement to the conventional parallel grid if it is hard to fabricate the high-aspect-ratio grid.
Design of k-Space Channel Combination Kernels and Integration with Parallel Imaging

PubMed Central

Beatty, Philip J.; Chang, Shaorong; Holmes, James H.; Wang, Kang; Brau, Anja C. S.; Reeder, Scott B.; Brittain, Jean H.

2014-01-01

Purpose In this work, a new method is described for producing local k-space channel combination kernels using a small amount of low-resolution multichannel calibration data. Additionally, this work describes how these channel combination kernels can be combined with local k-space unaliasing kernels produced by the calibration phase of parallel imaging methods such as GRAPPA, PARS and ARC. Methods Experiments were conducted to evaluate both the image quality and computational efficiency of the proposed method compared to a channel-by-channel parallel imaging approach with image-space sum-of-squares channel combination. Results Results indicate comparable image quality overall, with some very minor differences seen in reduced field-of-view imaging. It was demonstrated that this method enables a speed up in computation time on the order of 3–16X for 32-channel data sets. Conclusion The proposed method enables high quality channel combination to occur earlier in the reconstruction pipeline, reducing computational and memory requirements for image reconstruction. PMID:23943602
Advanced fabrication of Si nanowire FET structures by means of a parallel approach.

PubMed

Li, J; Pud, S; Mayer, D; Vitusevich, S

2014-07-11

In this paper we present fabricated Si nanowires (NWs) of different dimensions with enhanced electrical characteristics. The parallel fabrication process is based on nanoimprint lithography using high-quality molds, which facilitates the realization of 50 nm-wide NW field-effect transistors (FETs). The imprint molds were fabricated by using a wet chemical anisotropic etching process. The wet chemical etch results in well-defined vertical sidewalls with edge roughness (3σ) as small as 2 nm, which is about four times better compared with the roughness usually obtained for reactive-ion etching molds. The quality of the mold was studied using atomic force microscopy and scanning electron microscopy image data. The use of the high-quality mold leads to almost 100% yield during fabrication of Si NW FETs as well as to an exceptional quality of the surfaces of the devices produced. To characterize the Si NW FETs, we used noise spectroscopy as a powerful method for evaluating device performance and the reliability of structures with nanoscale dimensions. The Hooge parameter of fabricated FET structures exhibits an average value of 1.6 × 10(-3). This value reflects the high quality of Si NW FETs fabricated by means of a parallel approach that uses a nanoimprint mold and cost-efficient technology.
Unstable Resonator Optical Parametric Oscillator Based on Quasi-Phase-Matched RbTiOAsO(4).

PubMed

Hansson, G; Karlsson, H; Laurell, F

2001-10-20

We demonstrate improved signal and idler-beam quality of a 3-mm-aperture quasi-phase-matched RbTiOAsO(4) optical parametric oscillator through use of a confocal unstable resonator as compared with a plane-parallel resonator. Both oscillators were singly resonant, and the periodically poled RbTiOAsO(4) crystal generated a signal at 1.56 mum and an idler at 3.33 mum when pumped at 1.064 mum. We compared the beam quality produced by the 1.2-magnification confocal unstable resonator with the beam quality produced by the plane-parallel resonator by measuring the signal and the idler beam M(2) value. We also investigated the effect of pump-beam intensity distribution by comparing the result of a Gaussian and a top-hat intensity profile pump beam. We generated a signal beam of M(2) approximately 7 and an idler beam of M(2) approximately 2.5 through use of an unstable resonator and a Gaussian intensity profile pump beam. This corresponds to an increase of a factor of approximately 2 in beam quality for the signal and a factor of 3 for the idler, compared with the beam quality of the plane-parallel resonator optical parametric oscillator.
Inspection criteria ensure quality control of parallel gap soldering

NASA Technical Reports Server (NTRS)

Burka, J. A.

1968-01-01

Investigation of parallel gap soldering of electrical leads resulted in recommendation on material preparation, equipment, process control, and visual inspection criteria to ensure reliable solder joints. The recommendations will minimize problems in heat-dwell time, amount of solder, bridging conductors, and damage of circuitry.
Optimizing a realistic large-scale frequency assignment problem using a new parallel evolutionary approach

NASA Astrophysics Data System (ADS)

Chaves-González, José M.; Vega-Rodríguez, Miguel A.; Gómez-Pulido, Juan A.; Sánchez-Pérez, Juan M.

2011-08-01

This article analyses the use of a novel parallel evolutionary strategy to solve complex optimization problems. The work developed here has been focused on a relevant real-world problem from the telecommunication domain to verify the effectiveness of the approach. The problem, known as frequency assignment problem (FAP), basically consists of assigning a very small number of frequencies to a very large set of transceivers used in a cellular phone network. Real data FAP instances are very difficult to solve due to the NP-hard nature of the problem, therefore using an efficient parallel approach which makes the most of different evolutionary strategies can be considered as a good way to obtain high-quality solutions in short periods of time. Specifically, a parallel hyper-heuristic based on several meta-heuristics has been developed. After a complete experimental evaluation, results prove that the proposed approach obtains very high-quality solutions for the FAP and beats any other result published.
Electrical Resistivity Tomography using a finite element based BFGS algorithm with algebraic multigrid preconditioning

NASA Astrophysics Data System (ADS)

Codd, A. L.; Gross, L.

2018-03-01

We present a new inversion method for Electrical Resistivity Tomography which, in contrast to established approaches, minimizes the cost function prior to finite element discretization for the unknown electric conductivity and electric potential. Minimization is performed with the Broyden-Fletcher-Goldfarb-Shanno method (BFGS) in an appropriate function space. BFGS is self-preconditioning and avoids construction of the dense Hessian which is the major obstacle to solving large 3-D problems using parallel computers. In addition to the forward problem predicting the measurement from the injected current, the so-called adjoint problem also needs to be solved. For this problem a virtual current is injected through the measurement electrodes and an adjoint electric potential is obtained. The magnitude of the injected virtual current is equal to the misfit at the measurement electrodes. This new approach has the advantage that the solution process of the optimization problem remains independent to the meshes used for discretization and allows for mesh adaptation during inversion. Computation time is reduced by using superposition of pole loads for the forward and adjoint problems. A smoothed aggregation algebraic multigrid (AMG) preconditioned conjugate gradient is applied to construct the potentials for a given electric conductivity estimate and for constructing a first level BFGS preconditioner. Through the additional reuse of AMG operators and coarse grid solvers inversion time for large 3-D problems can be reduced further. We apply our new inversion method to synthetic survey data created by the resistivity profile representing the characteristics of subsurface fluid injection. We further test it on data obtained from a 2-D surface electrode survey on Heron Island, a small tropical island off the east coast of central Queensland, Australia.
Scalable subsurface inverse modeling of huge data sets with an application to tracer concentration breakthrough data from magnetic resonance imaging

DOE PAGES

Lee, Jonghyun; Yoon, Hongkyu; Kitanidis, Peter K.; ...

2016-06-09

When characterizing subsurface properties is crucial for reliable and cost-effective groundwater supply management and contaminant remediation. With recent advances in sensor technology, large volumes of hydro-geophysical and geochemical data can be obtained to achieve high-resolution images of subsurface properties. However, characterization with such a large amount of information requires prohibitive computational costs associated with “big data” processing and numerous large-scale numerical simulations. To tackle such difficulties, the Principal Component Geostatistical Approach (PCGA) has been proposed as a “Jacobian-free” inversion method that requires much smaller forward simulation runs for each iteration than the number of unknown parameters and measurements needed inmore » the traditional inversion methods. PCGA can be conveniently linked to any multi-physics simulation software with independent parallel executions. In our paper, we extend PCGA to handle a large number of measurements (e.g. 106 or more) by constructing a fast preconditioner whose computational cost scales linearly with the data size. For illustration, we characterize the heterogeneous hydraulic conductivity (K) distribution in a laboratory-scale 3-D sand box using about 6 million transient tracer concentration measurements obtained using magnetic resonance imaging. Since each individual observation has little information on the K distribution, the data was compressed by the zero-th temporal moment of breakthrough curves, which is equivalent to the mean travel time under the experimental setting. Moreover, only about 2,000 forward simulations in total were required to obtain the best estimate with corresponding estimation uncertainty, and the estimated K field captured key patterns of the original packing design, showing the efficiency and effectiveness of the proposed method. This article is protected by copyright. All rights reserved.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Jonghyun; Yoon, Hongkyu; Kitanidis, Peter K.

When characterizing subsurface properties is crucial for reliable and cost-effective groundwater supply management and contaminant remediation. With recent advances in sensor technology, large volumes of hydro-geophysical and geochemical data can be obtained to achieve high-resolution images of subsurface properties. However, characterization with such a large amount of information requires prohibitive computational costs associated with “big data” processing and numerous large-scale numerical simulations. To tackle such difficulties, the Principal Component Geostatistical Approach (PCGA) has been proposed as a “Jacobian-free” inversion method that requires much smaller forward simulation runs for each iteration than the number of unknown parameters and measurements needed inmore » the traditional inversion methods. PCGA can be conveniently linked to any multi-physics simulation software with independent parallel executions. In our paper, we extend PCGA to handle a large number of measurements (e.g. 106 or more) by constructing a fast preconditioner whose computational cost scales linearly with the data size. For illustration, we characterize the heterogeneous hydraulic conductivity (K) distribution in a laboratory-scale 3-D sand box using about 6 million transient tracer concentration measurements obtained using magnetic resonance imaging. Since each individual observation has little information on the K distribution, the data was compressed by the zero-th temporal moment of breakthrough curves, which is equivalent to the mean travel time under the experimental setting. Moreover, only about 2,000 forward simulations in total were required to obtain the best estimate with corresponding estimation uncertainty, and the estimated K field captured key patterns of the original packing design, showing the efficiency and effectiveness of the proposed method. This article is protected by copyright. All rights reserved.« less
Application of PDSLin to the magnetic reconnection problem

NASA Astrophysics Data System (ADS)

Yuan, Xuefei; Li, Xiaoye S.; Yamazaki, Ichitaro; Jardin, Stephen C.; Koniges, Alice E.; Keyes, David E.

2013-01-01

Magnetic reconnection is a fundamental process in a magnetized plasma at both low and high magnetic Lundquist numbers (the ratio of the resistive diffusion time to the Alfvén wave transit time), which occurs in a wide variety of laboratory and space plasmas, e.g. magnetic fusion experiments, the solar corona and the Earth's magnetotail. An implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large so that the Whistler waves, which cause the fast reconnection, dominate the physics (Yuan et al 2012 J. Comput. Phys. 231 5822-53). For small system sizes, a direct solver such as SuperLU can be employed to obtain an accurate solution as long as the condition number is bounded by the reciprocal of the floating-point machine precision. However, SuperLU scales effectively only to hundreds of processors or less. For larger system sizes, it has been shown that physics-based (Chacón and Knoll 2003 J. Comput. Phys. 188 573-92) or other preconditioners can be applied to provide adequate solver performance. In recent years, we have been developing a new algebraic hybrid linear solver, PDSLin (Parallel Domain decomposition Schur complement-based Linear solver) (Yamazaki and Li 2010 Proc. VECPAR pp 421-34 and Yamazaki et al 2011 Technical Report). In this work, we compare numerical results from a direct solver and the proposed hybrid solver for the magnetic reconnection problem and demonstrate that the new hybrid solver is scalable to thousands of processors while maintaining the same robustness as a direct solver.
Identification of immiscible NAPL contaminant sources in aquifers by a modified two-level saturation based imperialist competitive algorithm

NASA Astrophysics Data System (ADS)

Ghafouri, H. R.; Mosharaf-Dehkordi, M.; Afzalan, B.

2017-07-01

A simulation-optimization model is proposed for identifying the characteristics of local immiscible NAPL contaminant sources inside aquifers. This model employs the UTCHEM 9.0 software as its simulator for solving the governing equations associated with the multi-phase flow in porous media. As the optimization model, a novel two-level saturation based Imperialist Competitive Algorithm (ICA) is proposed to estimate the parameters of contaminant sources. The first level consists of three parallel independent ICAs and plays as a pre-conditioner for the second level which is a single modified ICA. The ICA in the second level is modified by dividing each country into a number of provinces (smaller parts). Similar to countries in the classical ICA, these provinces are optimized by the assimilation, competition, and revolution steps in the ICA. To increase the diversity of populations, a new approach named knock the base method is proposed. The performance and accuracy of the simulation-optimization model is assessed by solving a set of two and three-dimensional problems considering the effects of different parameters such as the grid size, rock heterogeneity and designated monitoring networks. The obtained numerical results indicate that using this simulation-optimization model provides accurate results at a less number of iterations when compared with the model employing the classical one-level ICA. A model is proposed to identify characteristics of immiscible NAPL contaminant sources. The contaminant is immiscible in water and multi-phase flow is simulated. The model is a multi-level saturation-based optimization algorithm based on ICA. Each answer string in second level is divided into a set of provinces. Each ICA is modified by incorporating a new knock the base model.
Final Report for "Implimentation and Evaluation of Multigrid Linear Solvers into Extended Magnetohydrodynamic Codes for Petascale Computing"

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srinath Vadlamani; Scott Kruger; Travis Austin

Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems.more » For the Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD which is a one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.« less
Conjugate gradient coupled with multigrid for an indefinite problem

NASA Technical Reports Server (NTRS)

Gozani, J.; Nachshon, A.; Turkel, E.

1984-01-01

An iterative algorithm for the Helmholtz equation is presented. This scheme was based on the preconditioned conjugate gradient method for the normal equations. The preconditioning is one cycle of a multigrid method for the discrete Laplacian. The smoothing algorithm is red-black Gauss-Seidel and is constructed so it is a symmetric operator. The total number of iterations needed by the algorithm is independent of h. By varying the number of grids, the number of iterations depends only weakly on k when k(3)h(2) is constant. Comparisons with a SSOR preconditioner are presented.
Rapidly converging multigrid reconstruction of cone-beam tomographic data

NASA Astrophysics Data System (ADS)

Myers, Glenn R.; Kingston, Andrew M.; Latham, Shane J.; Recur, Benoit; Li, Thomas; Turner, Michael L.; Beeching, Levi; Sheppard, Adrian P.

2016-10-01

In the context of large-angle cone-beam tomography (CBCT), we present a practical iterative reconstruction (IR) scheme designed for rapid convergence as required for large datasets. The robustness of the reconstruction is provided by the "space-filling" source trajectory along which the experimental data is collected. The speed of convergence is achieved by leveraging the highly isotropic nature of this trajectory to design an approximate deconvolution filter that serves as a pre-conditioner in a multi-grid scheme. We demonstrate this IR scheme for CBCT and compare convergence to that of more traditional techniques.
Performance issues for iterative solvers in device simulation

NASA Technical Reports Server (NTRS)

Fan, Qing; Forsyth, P. A.; Mcmacken, J. R. F.; Tang, Wei-Pai

1994-01-01

Due to memory limitations, iterative methods have become the method of choice for large scale semiconductor device simulation. However, it is well known that these methods still suffer from reliability problems. The linear systems which appear in numerical simulation of semiconductor devices are notoriously ill-conditioned. In order to produce robust algorithms for practical problems, careful attention must be given to many implementation issues. This paper concentrates on strategies for developing robust preconditioners. In addition, effective data structures and convergence check issues are also discussed. These algorithms are compared with a standard direct sparse matrix solver on a variety of problems.
76 FR 2853 - Approval and Promulgation of Air Quality Implementation Plans; Delaware; Infrastructure State...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-18

... technical analysis submitted for parallel-processing by DNREC on December 9, 2010, to address significant... technical analysis submitted by DNREC for parallel-processing on December 9, 2010, to satisfy the... consists of a technical analysis that provides detailed support for Delaware's position that it has...
National Centers for Environmental Prediction

Science.gov Websites

Reference List Table of Contents NCEP OPERATIONAL MODEL FORECAST GRAPHICS PARALLEL/EXPERIMENTAL MODEL Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS VERIFICATION (GRID VS.OBS) WEB PAGE (NCEP EXPERIMENTAL PAGE, INTERNAL USE ONLY) Interactive web page tool for
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Hui; Shi, Yanjun

A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provide including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate amore » pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provides for optimization of the output voltage spectrum without influence the magnetic balancing.« less
Studying Air Quality with Data from the Internet.

ERIC Educational Resources Information Center

Salter, Leo; Parsons, Barbara

2000-01-01

Explains how the internet can be used between institutions for parallel research opportunities. Uses air quality data to examine the relationship between traffic flow and atmospheric particulate matter (PM) values. (Author/YDS)

Use of the preconditioned conjugate gradient algorithm as a generic solver for mixed-model equations in animal breeding applications.

PubMed

Tsuruta, S; Misztal, I; Strandén, I

2001-05-01

Utility of the preconditioned conjugate gradient algorithm with a diagonal preconditioner for solving mixed-model equations in animal breeding applications was evaluated with 16 test problems. The problems included single- and multiple-trait analyses, with data on beef, dairy, and swine ranging from small examples to national data sets. Multiple-trait models considered low and high genetic correlations. Convergence was based on relative differences between left- and right-hand sides. The ordering of equations was fixed effects followed by random effects, with no special ordering within random effects. The preconditioned conjugate gradient program implemented with double precision converged for all models. However, when implemented in single precision, the preconditioned conjugate gradient algorithm did not converge for seven large models. The preconditioned conjugate gradient and successive overrelaxation algorithms were subsequently compared for 13 of the test problems. The preconditioned conjugate gradient algorithm was easy to implement with the iteration on data for general models. However, successive overrelaxation requires specific programming for each set of models. On average, the preconditioned conjugate gradient algorithm converged in three times fewer rounds of iteration than successive overrelaxation. With straightforward implementations, programs using the preconditioned conjugate gradient algorithm may be two or more times faster than those using successive overrelaxation. However, programs using the preconditioned conjugate gradient algorithm would use more memory than would comparable implementations using successive overrelaxation. Extensive optimization of either algorithm can influence rankings. The preconditioned conjugate gradient implemented with iteration on data, a diagonal preconditioner, and in double precision may be the algorithm of choice for solving mixed-model equations when sufficient memory is available and ease of implementation is essential.
Direct reconstruction of cardiac PET kinetic parametric images using a preconditioned conjugate gradient approach

PubMed Central

Rakvongthai, Yothin; Ouyang, Jinsong; Guerin, Bastien; Li, Quanzheng; Alpert, Nathaniel M.; El Fakhri, Georges

2013-01-01

Purpose: Our research goal is to develop an algorithm to reconstruct cardiac positron emission tomography (PET) kinetic parametric images directly from sinograms and compare its performance with the conventional indirect approach. Methods: Time activity curves of a NCAT phantom were computed according to a one-tissue compartmental kinetic model with realistic kinetic parameters. The sinograms at each time frame were simulated using the activity distribution for the time frame. The authors reconstructed the parametric images directly from the sinograms by optimizing a cost function, which included the Poisson log-likelihood and a spatial regularization terms, using the preconditioned conjugate gradient (PCG) algorithm with the proposed preconditioner. The proposed preconditioner is a diagonal matrix whose diagonal entries are the ratio of the parameter and the sensitivity of the radioactivity associated with parameter. The authors compared the reconstructed parametric images using the direct approach with those reconstructed using the conventional indirect approach. Results: At the same bias, the direct approach yielded significant relative reduction in standard deviation by 12%–29% and 32%–70% for 50 × 106 and 10 × 106 detected coincidences counts, respectively. Also, the PCG method effectively reached a constant value after only 10 iterations (with numerical convergence achieved after 40–50 iterations), while more than 500 iterations were needed for CG. Conclusions: The authors have developed a novel approach based on the PCG algorithm to directly reconstruct cardiac PET parametric images from sinograms, and yield better estimation of kinetic parameters than the conventional indirect approach, i.e., curve fitting of reconstructed images. The PCG method increases the convergence rate of reconstruction significantly as compared to the conventional CG method. PMID:24089922
Numerical Technology for Large-Scale Computational Electromagnetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, R; Champagne, N; White, D

The key bottleneck of implicit computational electromagnetics tools for large complex geometries is the solution of the resulting linear system of equations. The goal of this effort was to research and develop critical numerical technology that alleviates this bottleneck for large-scale computational electromagnetics (CEM). The mathematical operators and numerical formulations used in this arena of CEM yield linear equations that are complex valued, unstructured, and indefinite. Also, simultaneously applying multiple mathematical modeling formulations to different portions of a complex problem (hybrid formulations) results in a mixed structure linear system, further increasing the computational difficulty. Typically, these hybrid linear systems aremore » solved using a direct solution method, which was acceptable for Cray-class machines but does not scale adequately for ASCI-class machines. Additionally, LLNL's previously existing linear solvers were not well suited for the linear systems that are created by hybrid implicit CEM codes. Hence, a new approach was required to make effective use of ASCI-class computing platforms and to enable the next generation design capabilities. Multiple approaches were investigated, including the latest sparse-direct methods developed by our ASCI collaborators. In addition, approaches that combine domain decomposition (or matrix partitioning) with general-purpose iterative methods and special purpose pre-conditioners were investigated. Special-purpose pre-conditioners that take advantage of the structure of the matrix were adapted and developed based on intimate knowledge of the matrix properties. Finally, new operator formulations were developed that radically improve the conditioning of the resulting linear systems thus greatly reducing solution time. The goal was to enable the solution of CEM problems that are 10 to 100 times larger than our previous capability.« less
Direct reconstruction of cardiac PET kinetic parametric images using a preconditioned conjugate gradient approach.

PubMed

Rakvongthai, Yothin; Ouyang, Jinsong; Guerin, Bastien; Li, Quanzheng; Alpert, Nathaniel M; El Fakhri, Georges

2013-10-01

Our research goal is to develop an algorithm to reconstruct cardiac positron emission tomography (PET) kinetic parametric images directly from sinograms and compare its performance with the conventional indirect approach. Time activity curves of a NCAT phantom were computed according to a one-tissue compartmental kinetic model with realistic kinetic parameters. The sinograms at each time frame were simulated using the activity distribution for the time frame. The authors reconstructed the parametric images directly from the sinograms by optimizing a cost function, which included the Poisson log-likelihood and a spatial regularization terms, using the preconditioned conjugate gradient (PCG) algorithm with the proposed preconditioner. The proposed preconditioner is a diagonal matrix whose diagonal entries are the ratio of the parameter and the sensitivity of the radioactivity associated with parameter. The authors compared the reconstructed parametric images using the direct approach with those reconstructed using the conventional indirect approach. At the same bias, the direct approach yielded significant relative reduction in standard deviation by 12%-29% and 32%-70% for 50 × 10(6) and 10 × 10(6) detected coincidences counts, respectively. Also, the PCG method effectively reached a constant value after only 10 iterations (with numerical convergence achieved after 40-50 iterations), while more than 500 iterations were needed for CG. The authors have developed a novel approach based on the PCG algorithm to directly reconstruct cardiac PET parametric images from sinograms, and yield better estimation of kinetic parameters than the conventional indirect approach, i.e., curve fitting of reconstructed images. The PCG method increases the convergence rate of reconstruction significantly as compared to the conventional CG method.
Comparison of an algebraic multigrid algorithm to two iterative solvers used for modeling ground water flow and transport

USGS Publications Warehouse

Detwiler, R.L.; Mehl, S.; Rajaram, H.; Cheung, W.W.

2002-01-01

Numerical solution of large-scale ground water flow and transport problems is often constrained by the convergence behavior of the iterative solvers used to solve the resulting systems of equations. We demonstrate the ability of an algebraic multigrid algorithm (AMG) to efficiently solve the large, sparse systems of equations that result from computational models of ground water flow and transport in large and complex domains. Unlike geometric multigrid methods, this algorithm is applicable to problems in complex flow geometries, such as those encountered in pore-scale modeling of two-phase flow and transport. We integrated AMG into MODFLOW 2000 to compare two- and three-dimensional flow simulations using AMG to simulations using PCG2, a preconditioned conjugate gradient solver that uses the modified incomplete Cholesky preconditioner and is included with MODFLOW 2000. CPU times required for convergence with AMG were up to 140 times faster than those for PCG2. The cost of this increased speed was up to a nine-fold increase in required random access memory (RAM) for the three-dimensional problems and up to a four-fold increase in required RAM for the two-dimensional problems. We also compared two-dimensional numerical simulations of steady-state transport using AMG and the generalized minimum residual method with an incomplete LU-decomposition preconditioner. For these transport simulations, AMG yielded increased speeds of up to 17 times with only a 20% increase in required RAM. The ability of AMG to solve flow and transport problems in large, complex flow systems and its ready availability make it an ideal solver for use in both field-scale and pore-scale modeling.
Vortex phase diagram of the layered superconductor Cu0.03TaS2 for H \\parallel c

NASA Astrophysics Data System (ADS)

Zhu, X. D.; Lu, J. C.; Sun, Y. P.; Pi, L.; Qu, Z.; Ling, L. S.; Yang, Z. R.; Zhang, Y. H.

2010-12-01

The magnetization and anisotropic electrical transport properties have been measured in high quality Cu0.03TaS2 single crystals. A pronounced peak effect has been observed, indicating that high quality and homogeneity are vital to the peak effect. A kink has been observed in the magnetic field, H, dependence of the in-plane resistivity ρab for H\\parallel c , which corresponds to a transition from activated to diffusive behavior of the vortex liquid phase. In the diffusive regime of the vortex liquid phase, the in-plane resistivity ρab is proportional to H0.3, which does not follow the Bardeen-Stephen law for free flux flow. Finally, a simplified vortex phase diagram of Cu0.03TaS2 for H \\parallel c is given.
Renal magnetic resonance angiography at 3.0 Tesla using a 32-element phased-array coil system and parallel imaging in 2 directions.

PubMed

Fenchel, Michael; Nael, Kambiz; Deshpande, Vibhas S; Finn, J Paul; Kramer, Ulrich; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard

2006-09-01

The aim of the present study was to assess the feasibility of renal magnetic resonance angiography at 3.0 T using a phased-array coil system with 32-coil elements. Specifically, high parallel imaging factors were used for an increased spatial resolution and anatomic coverage of the whole abdomen. Signal-to-noise values and the g-factor distribution of the 32 element coil were examined in phantom studies for the magnetic resonance angiography (MRA) sequence. Eleven volunteers (6 men, median age of 30.0 years) were examined on a 3.0-T MR scanner (Magnetom Trio, Siemens Medical Solutions, Malvern, PA) using a 32-element phased-array coil (prototype from In vivo Corp.). Contrast-enhanced 3D-MRA (TR 2.95 milliseconds, TE 1.12 milliseconds, flip angle 25-30 degrees , bandwidth 650 Hz/pixel) was acquired with integrated generalized autocalibrating partially parallel acquisition (GRAPPA), in both phase- and slice-encoding direction. Images were assessed by 2 independent observers with regard to image quality, noise and presence of artifacts. Signal-to-noise levels of 22.2 +/- 22.0 and 57.9 +/- 49.0 were measured with (GRAPPAx6) and without parallel-imaging, respectively. The mean g-factor of the 32-element coil for GRAPPA with an acceleration of 3 and 2 in the phase-encoding and slice-encoding direction, respectively, was 1.61. High image quality was found in 9 of 11 volunteers (2.6 +/- 0.8) with good overall interobserver agreement (k = 0.87). Relatively low image quality with higher noise levels were encountered in 2 volunteers. MRA at 3.0 T using a 32-element phased-array coil is feasible in healthy volunteers. High diagnostic image quality and extended anatomic coverage could be achieved with application of high parallel imaging factors.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA

NASA Astrophysics Data System (ADS)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field programmable gate array (FPGA) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGA presents both new constraints and new opportunities for the implementations of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application based tests, this study evaluates all of the four RNGs used in previous FPGA based MC studies and newly proposed FPGA implementations for two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations: a parallel version of additive lagged Fibonacci generator (Parallel ALFG) is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
Efficient conjugate gradient algorithms for computation of the manipulator forward dynamics

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1989-01-01

The applicability of conjugate gradient algorithms for computation of the manipulator forward dynamics is investigated. The redundancies in the previously proposed conjugate gradient algorithm are analyzed. A new version is developed which, by avoiding these redundancies, achieves a significantly greater efficiency. A preconditioned conjugate gradient algorithm is also presented. A diagonal matrix whose elements are the diagonal elements of the inertia matrix is proposed as the preconditioner. In order to increase the computational efficiency, an algorithm is developed which exploits the synergism between the computation of the diagonal elements of the inertia matrix and that required by the conjugate gradient algorithm.
Stochastic Galerkin methods for the steady-state Navier–Stokes equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sousedík, Bedřich, E-mail: sousedik@umbc.edu; Elman, Howard C., E-mail: elman@cs.umd.edu

2016-07-01

We study the steady-state Navier–Stokes equations in the context of stochastic finite element discretizations. Specifically, we assume that the viscosity is a random field given in the form of a generalized polynomial chaos expansion. For the resulting stochastic problem, we formulate the model and linearization schemes using Picard and Newton iterations in the framework of the stochastic Galerkin method, and we explore properties of the resulting stochastic solutions. We also propose a preconditioner for solving the linear systems of equations arising at each step of the stochastic (Galerkin) nonlinear iteration and demonstrate its effectiveness for solving a set of benchmarkmore » problems.« less
Efficient solution of the simplified P N equations

DOE PAGES

Hamilton, Steven P.; Evans, Thomas M.

2014-12-23

We show new solver strategies for the multigroup SPN equations for nuclear reactor analysis. By forming the complete matrix over space, moments, and energy a robust set of solution strategies may be applied. Moreover, power iteration, shifted power iteration, Rayleigh quotient iteration, Arnoldi's method, and a generalized Davidson method, each using algebraic and physics-based multigrid preconditioners, have been compared on C5G7 MOX test problem as well as an operational PWR model. These results show that the most ecient approach is the generalized Davidson method, that is 30-40 times faster than traditional power iteration and 6-10 times faster than Arnoldi's method.
Stochastic Galerkin methods for the steady-state Navier–Stokes equations

DOE PAGES

Sousedík, Bedřich; Elman, Howard C.

2016-04-12

We study the steady-state Navier–Stokes equations in the context of stochastic finite element discretizations. Specifically, we assume that the viscosity is a random field given in the form of a generalized polynomial chaos expansion. For the resulting stochastic problem, we formulate the model and linearization schemes using Picard and Newton iterations in the framework of the stochastic Galerkin method, and we explore properties of the resulting stochastic solutions. We also propose a preconditioner for solving the linear systems of equations arising at each step of the stochastic (Galerkin) nonlinear iteration and demonstrate its effectiveness for solving a set of benchmarkmore » problems.« less
Domain Decomposition By the Advancing-Partition Method for Parallel Unstructured Grid Generation

NASA Technical Reports Server (NTRS)

Pirzadeh, Shahyar Z.; Zagaris, George

2009-01-01

A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Vortex phase diagram of the layered superconductor Cu0.03TaS2 for H is parallel to c.

PubMed

Zhu, X D; Lu, J C; Sun, Y P; Pi, L; Qu, Z; Ling, L S; Yang, Z R; Zhang, Y H

2010-12-22

The magnetization and anisotropic electrical transport properties have been measured in high quality Cu(0.03)TaS(2) single crystals. A pronounced peak effect has been observed, indicating that high quality and homogeneity are vital to the peak effect. A kink has been observed in the magnetic field, H, dependence of the in-plane resistivity ρ(ab) for H is parallel to c, which corresponds to a transition from activated to diffusive behavior of the vortex liquid phase. In the diffusive regime of the vortex liquid phase, the in-plane resistivity ρ(ab) is proportional to H(0.3), which does not follow the Bardeen-Stephen law for free flux flow. Finally, a simplified vortex phase diagram of Cu(0.03)TaS(2) for H is parallel to c is given.
Power transformation for enhancing responsiveness of quality of life questionnaire.

PubMed

Zhou, YanYan Ange

2015-01-01

We investigate the effect of power transformation of raw scores on the responsiveness of quality of life survey. The procedure maximizes the paired t-test value on the power transformed data to obtain an optimal power range. The parallel between the Box-Cox transformation is also investigated for the quality of life data.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms

NASA Astrophysics Data System (ADS)

Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

2012-03-01

Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

ERIC Educational Resources Information Center

Papenberg, Martin; Musch, Jochen

2017-01-01

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Computer-aided programming for message-passing system; Problems and a solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, M.Y.; Gajski, D.D.

1989-12-01

As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Quality of Education, Comparability, and Assessment Choice in Developing Countries

ERIC Educational Resources Information Center

Wagner, Daniel A.

2010-01-01

Over the past decade, international development agencies have begun to emphasize the improvement of the quality (rather than simply quantity) of education in developing countries. This new focus has been paralleled by a significant increase in the use of educational assessments as a way to measure gains and losses in quality. As this interest in…

Parallel Distributed Processing and Lexical-Semantic Effects in Visual Word Recognition: Are a Few Stages Necessary?

ERIC Educational Resources Information Center

Borowsky, Ron; Besner, Derek

2006-01-01

D. C. Plaut and J. R. Booth presented a parallel distributed processing model that purports to simulate human lexical decision performance. This model (and D. C. Plaut, 1995) offers a single mechanism account of the pattern of factor effects on reaction time (RT) between semantic priming, word frequency, and stimulus quality without requiring a…
Software Issues at the User Interface

DTIC Science & Technology

1991-05-01

successful integration of parallel computers into mainstream scientific computing. Clearly a compiler is the most important software tool available to a...Computer Science University of Colorado Boulder, CO 80309 ABSTRACT We review software issues that are critical to the successful integration of parallel...The development of an optimizing compiler of this quality, addressing communicaton instructions as well as computational instructions is a major
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis

NASA Technical Reports Server (NTRS)

Lim, Chieng-Fai

1991-01-01

The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

NASA Astrophysics Data System (ADS)

Chen, Xi; Zhou, Liqing

2015-12-01

With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging

NASA Astrophysics Data System (ADS)

Larkman, David J.; Nunes, Rita G.

2007-04-01

Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
The application of parallel wells to support the use of groundwater for sustainable irrigation

NASA Astrophysics Data System (ADS)

Suhardi

2018-05-01

The use of groundwater as a source of irrigation is one alternative in meeting water needs of plants. Using groundwater for irrigation requires a high cost because of the discharge that can be taken is limited. In addition, the use of large groundwater can cause environmental damage and social conflict. To minimize costs, maintain quality of the environment and to prevent social conflicts, it is necessary to innovate in the groundwater taking system. The study was conducted with an innovation of using parallel wells. Performance is measured by comparing parallel wells with a single well. The results showed that the use of parallel wells to meet the water needs of rice plants and increase the pump discharge up to 100%. In addition, parallel wells can reduce the influence radius of taking of groundwater compared to single well so as to prevent social conflict. Thus, the use of parallel wells can support the achievement of the use of groundwater for sustainable irrigation.
Multiple solution of linear algebraic systems by an iterative method with recomputed preconditioner in the analysis of microstrip structures

NASA Astrophysics Data System (ADS)

Ahunov, Roman R.; Kuksenko, Sergey P.; Gazizov, Talgat R.

2016-06-01

A multiple solution of linear algebraic systems with dense matrix by iterative methods is considered. To accelerate the process, the recomputing of the preconditioning matrix is used. A priory condition of the recomputing based on change of the arithmetic mean of the current solution time during the multiple solution is proposed. To confirm the effectiveness of the proposed approach, the numerical experiments using iterative methods BiCGStab and CGS for four different sets of matrices on two examples of microstrip structures are carried out. For solution of 100 linear systems the acceleration up to 1.6 times, compared to the approach without recomputing, is obtained.
Off-diagonal Jacobian support for Nodal BCs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peterson, John W.; Andrs, David; Gaston, Derek R.

In this brief note, we describe the implementation of o-diagonal Jacobian computations for nodal boundary conditions in the Multiphysics Object Oriented Simulation Environment (MOOSE) [1] framework. There are presently a number of applications [2{5] based on the MOOSE framework that solve complicated physical systems of partial dierential equations whose boundary conditions are often highly nonlinear. Accurately computing the on- and o-diagonal Jacobian and preconditioner entries associated to these constraints is crucial for enabling ecient numerical solvers in these applications. Two key ingredients are required for properly specifying the Jacobian contributions of nonlinear nodal boundary conditions in MOOSE and nite elementmore » codes in general: 1. The ability to zero out entire Jacobian matrix rows after \
GPU-completeness: theory and implications

NASA Astrophysics Data System (ADS)

Lin, I.-Jong

2011-01-01

This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

NASA Astrophysics Data System (ADS)

Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

2017-12-01

Indonesia is an area prone to earthquake that may cause casualties and damage to buildings. The fatalities or the injured are not largely caused by the earthquake, but by building collapse. The collapse of the building is resulted from the building behaviour against the earthquake, and it depends on many factors, such as architectural design, geometry configuration of structural elements in horizontal and vertical plans, earthquake zone, geographical location (distance to earthquake center), soil type, material quality, and construction quality. One of the geometry configurations that may lead to the collapse of the building is irregular configuration of non-parallel system. In accordance with FEMA-451B, irregular configuration in non-parallel system is defined to have existed if the vertical lateral force-retaining elements are neither parallel nor symmetric with main orthogonal axes of the earthquake-retaining axis system. Such configuration may lead to torque, diagonal translation and local damage to buildings. It does not mean that non-parallel irregular configuration should not be formed on architectural design; however the designer must know the consequence of earthquake behaviour against buildings with irregular configuration of non-parallel system. The present research has the objective to identify earthquake behaviour in architectural geometry with irregular configuration of non-parallel system. The present research was quantitative with simulation experimental method. It consisted of 5 models, where architectural data and model structure data were inputted and analyzed using the software SAP2000 in order to find out its performance, and ETAB2015 to determine the eccentricity occurred. The output of the software analysis was tabulated, graphed, compared and analyzed with relevant theories. For areas of strong earthquake zones, avoid designing buildings which wholly form irregular configuration of non-parallel system. If it is inevitable to design a building with building parts containing irregular configuration of non-parallel system, make it more rigid by forming a triangle module, and use the formula.A good collaboration is needed between architects and structural experts in creating earthquake architecture.
Simultaneous fluoroscopic and nuclear imaging: impact of collimator choice on nuclear image quality.

PubMed

van der Velden, Sandra; Beijst, Casper; Viergever, Max A; de Jong, Hugo W A M

2017-01-01

X-ray-guided oncological interventions could benefit from the availability of simultaneously acquired nuclear images during the procedure. To this end, a real-time, hybrid fluoroscopic and nuclear imaging device, consisting of an X-ray c-arm combined with gamma imaging capability, is currently being developed (Beijst C, Elschot M, Viergever MA, de Jong HW. Radiol. 2015;278:232-238). The setup comprises four gamma cameras placed adjacent to the X-ray tube. The four camera views are used to reconstruct an intermediate three-dimensional image, which is subsequently converted to a virtual nuclear projection image that overlaps with the X-ray image. The purpose of the present simulation study is to evaluate the impact of gamma camera collimator choice (parallel hole versus pinhole) on the quality of the virtual nuclear image. Simulation studies were performed with a digital image quality phantom including realistic noise and resolution effects, with a dynamic frame acquisition time of 1 s and a total activity of 150 MBq. Projections were simulated for 3, 5, and 7 mm pinholes and for three parallel hole collimators (low-energy all-purpose (LEAP), low-energy high-resolution (LEHR) and low-energy ultra-high-resolution (LEUHR)). Intermediate reconstruction was performed with maximum likelihood expectation-maximization (MLEM) with point spread function (PSF) modeling. In the virtual projection derived therefrom, contrast, noise level, and detectability were determined and compared with the ideal projection, that is, as if a gamma camera were located at the position of the X-ray detector. Furthermore, image deformations and spatial resolution were quantified. Additionally, simultaneous fluoroscopic and nuclear images of a sphere phantom were acquired with a physical prototype system and compared with the simulations. For small hot spots, contrast is comparable for all simulated collimators. Noise levels are, however, 3 to 8 times higher in pinhole geometries than in parallel hole geometries. This results in higher contrast-to-noise ratios for parallel hole geometries. Smaller spheres can thus be detected with parallel hole collimators than with pinhole collimators (17 mm vs 28 mm). Pinhole geometries show larger image deformations than parallel hole geometries. Spatial resolution varied between 1.25 cm for the 3 mm pinhole and 4 cm for the LEAP collimator. The simulation method was successfully validated by the experiments with the physical prototype. A real-time hybrid fluoroscopic and nuclear imaging device is currently being developed. Image quality of nuclear images obtained with different collimators was compared in terms of contrast, noise, and detectability. Parallel hole collimators showed lower noise and better detectability than pinhole collimators. © 2016 American Association of Physicists in Medicine.
Research of the effectiveness of parallel multithreaded realizations of interpolation methods for scaling raster images

NASA Astrophysics Data System (ADS)

Vnukov, A. A.; Shershnev, M. B.

2018-01-01

The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
Efficient computation of hashes

NASA Astrophysics Data System (ADS)

Lopes, Raul H. C.; Franqueira, Virginia N. L.; Hobson, Peter R.

2014-06-01

The sequential computation of hashes at the core of many distributed storage systems and found, for example, in grid services can hinder efficiency in service quality and even pose security challenges that can only be addressed by the use of parallel hash tree modes. The main contributions of this paper are, first, the identification of several efficiency and security challenges posed by the use of sequential hash computation based on the Merkle-Damgard engine. In addition, alternatives for the parallel computation of hash trees are discussed, and a prototype for a new parallel implementation of the Keccak function, the SHA-3 winner, is introduced.
Parallel and Distributed Systems for Probabilistic Reasoning

DTIC Science & Technology

2012-12-01

work at CMU I had the opportunity to work with Andreas Krause on Gaussian process models for signal quality estimation in wireless sensor networks ...we reviewed the natural parallelization of the belief propagation algorithm using the synchronous schedule and demonstrated both theoretically and...problem is that the power-law sparsity structure, commonly found in graphs derived from natural phenomena (e.g., social networks and the web
Robot-assisted ultrasound imaging: overview and development of a parallel telerobotic system.

PubMed

Monfaredi, Reza; Wilson, Emmanuel; Azizi Koutenaei, Bamshad; Labrecque, Brendan; Leroy, Kristen; Goldie, James; Louis, Eric; Swerdlow, Daniel; Cleary, Kevin

2015-02-01

Ultrasound imaging is frequently used in medicine. The quality of ultrasound images is often dependent on the skill of the sonographer. Several researchers have proposed robotic systems to aid in ultrasound image acquisition. In this paper we first provide a short overview of robot-assisted ultrasound imaging (US). We categorize robot-assisted US imaging systems into three approaches: autonomous US imaging, teleoperated US imaging, and human-robot cooperation. For each approach several systems are introduced and briefly discussed. We then describe a compact six degree of freedom parallel mechanism telerobotic system for ultrasound imaging developed by our research team. The long-term goal of this work is to enable remote ultrasound scanning through teleoperation. This parallel mechanism allows for both translation and rotation of an ultrasound probe mounted on the top plate along with force control. Our experimental results confirmed good mechanical system performance with a positioning error of < 1 mm. Phantom experiments by a radiologist showed promising results with good image quality.
Multiple plant hormones and cell wall metabolism regulate apple fruit maturation patterns and texture attributes

USDA-ARS?s Scientific Manuscript database

Molecular events regulating apple fruit ripening and sensory quality are largely unknown. Such knowledge is essential for genomic-assisted apple breeding and postharvest quality management. In this study, a parallel transcriptome profile analysis, scanning electron microscopic (SEM) examination and...
Quality control/quality assurance testing for joint density and segregation of asphalt mixtures : tech transfer summary.

DOT National Transportation Integrated Search

2013-04-01

A longitudinal joint is the interface between two adjacent and parallel hot-mix asphalt (HMA) mats. Inadequate joint construction can lead to a location where water can penetrate the pavement layers and reduce the structural support of the underlying...
[Sampling in quality control of medicinal materials-A case of Epimedium].

PubMed

Wang, Chuanyi; Cao, Jinyi; Liang, Yun; Huang, Wenhua; Guo, Baolin

2009-04-01

To investigate the effect of the different individual number of sampling on the assay results of the medicinal materials. Epimedium pubescens and E. brevicornu were used as samples. The 6 sampling levels were formulated as 1 individual, 5, 10, 20, 30, 50 individuals mix, each level with 3 parallels and 1 individual level5 parallels. The contents of epimedin C and icariin, and the peak areas of epimedin A, epimedin B, rhamnosyl icarisid II and icarisid II in all samples were analyzed by HPLC. The variation degree varied with species and chemical constituents, but the RSD and the deviation from the true value decreased with the increase of individual number on the same chemical constituent. The sampling number should be more than 10 individuals in quality control of Epimedium, and 50 or more individuals would be better for representing the quality of medicinal materials.
Xyce parallel electronic simulator design.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting

2010-09-01

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly beenmore » funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.« less
Coaching in the AP Classroom

ERIC Educational Resources Information Center

Fornaciari, Jim

2013-01-01

Many parallels exist between quality coaches and quality classroom teachers--especially AP teachers, who often feel the pressure to produce positive test results. Having developed a series of techniques and strategies for building a team-oriented winning culture on the field, Jim Fornaciari writes about how he adapted those methods to work in the…

Quality Content in Distance Education

ERIC Educational Resources Information Center

Yildiz, Ezgi Pelin; Isman, Aytekin

2016-01-01

In parallel with technological advances in today's world of education activities can be conducted without the constraints of time and space. One of the most important of these activities is distance education. The success of the distance education is possible with content quality. The proliferation of e-learning environment has brought a need for…
15 CFR 922.132 - Prohibited or otherwise regulated activities.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 35.92222 N latitude parallel (coastal reference point: Beach access stairway at south Sand Dollar... other matter that subsequently enters the Sanctuary and injures a Sanctuary resource or quality, except... and qualities. The prohibitions in paragraphs (a)(2) through (12) of this section do not apply to...
Composite solvers for linear saddle point problems arising from the incompressible Stokes equations with highly heterogeneous viscosity structure

NASA Astrophysics Data System (ADS)

Sanan, P.; Schnepp, S. M.; May, D.; Schenk, O.

2014-12-01

Geophysical applications require efficient forward models for non-linear Stokes flow on high resolution spatio-temporal domains. The bottleneck in applying the forward model is solving the linearized, discretized Stokes problem which takes the form of a large, indefinite (saddle point) linear system. Due to the heterogeniety of the effective viscosity in the elliptic operator, devising effective preconditioners for saddle point problems has proven challenging and highly problem-dependent. Nevertheless, at least three approaches show promise for preconditioning these difficult systems in an algorithmically scalable way using multigrid and/or domain decomposition techniques. The first is to work with a hierarchy of coarser or smaller saddle point problems. The second is to use the Schur complement method to decouple and sequentially solve for the pressure and velocity. The third is to use the Schur decomposition to devise preconditioners for the full operator. These involve sub-solves resembling inexact versions of the sequential solve. The choice of approach and sub-methods depends crucially on the motivating physics, the discretization, and available computational resources. Here we examine the performance trade-offs for preconditioning strategies applied to idealized models of mantle convection and lithospheric dynamics, characterized by large viscosity gradients. Due to the arbitrary topological structure of the viscosity field in geodynamical simulations, we utilize low order, inf-sup stable mixed finite element spatial discretizations which are suitable when sharp viscosity variations occur in element interiors. Particular attention is paid to possibilities within the decoupled and approximate Schur complement factorization-based monolithic approaches to leverage recently-developed flexible, communication-avoiding, and communication-hiding Krylov subspace methods in combination with `heavy' smoothers, which require solutions of large per-node sub-problems, well-suited to solution on hybrid computational clusters. To manage the combinatorial explosion of solver options (which include hybridizations of all the approaches mentioned above), we leverage the modularity of the PETSc library.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Weston, Brian T.

This dissertation focuses on the development of a fully-implicit, high-order compressible ow solver with phase change. The work is motivated by laser-induced phase change applications, particularly by the need to develop large-scale multi-physics simulations of the selective laser melting (SLM) process in metal additive manufacturing (3D printing). Simulations of the SLM process require precise tracking of multi-material solid-liquid-gas interfaces, due to laser-induced melting/ solidi cation and evaporation/condensation of metal powder in an ambient gas. These rapid density variations and phase change processes tightly couple the governing equations, requiring a fully compressible framework to robustly capture the rapid density variations ofmore » the ambient gas and the melting/evaporation of the metal powder. For non-isothermal phase change, the velocity is gradually suppressed through the mushy region by a variable viscosity and Darcy source term model. The governing equations are discretized up to 4th-order accuracy with our reconstructed Discontinuous Galerkin spatial discretization scheme and up to 5th-order accuracy with L-stable fully implicit time discretization schemes (BDF2 and ESDIRK3-5). The resulting set of non-linear equations is solved using a robust Newton-Krylov method, with the Jacobian-free version of the GMRES solver for linear iterations. Due to the sti nes associated with the acoustic waves and thermal and viscous/material strength e ects, preconditioning the GMRES solver is essential. A robust and scalable approximate block factorization preconditioner was developed, which utilizes the velocity-pressure (vP) and velocity-temperature (vT) Schur complement systems. This multigrid block reduction preconditioning technique converges for high CFL/Fourier numbers and exhibits excellent parallel and algorithmic scalability on classic benchmark problems in uid dynamics (lid-driven cavity ow and natural convection heat transfer) as well as for laser-induced phase change problems in 2D and 3D.« less
Numerical modeling of fluid-structure interaction in arteries with anisotropic polyconvex hyperelastic and anisotropic viscoelastic material models at finite strains.

PubMed

Balzani, Daniel; Deparis, Simone; Fausten, Simon; Forti, Davide; Heinlein, Alexander; Klawonn, Axel; Quarteroni, Alfio; Rheinbach, Oliver; Schröder, Joerg

2016-10-01

The accurate prediction of transmural stresses in arterial walls requires on the one hand robust and efficient numerical schemes for the solution of boundary value problems including fluid-structure interactions and on the other hand the use of a material model for the vessel wall that is able to capture the relevant features of the material behavior. One of the main contributions of this paper is the application of a highly nonlinear, polyconvex anisotropic structural model for the solid in the context of fluid-structure interaction, together with a suitable discretization. Additionally, the influence of viscoelasticity is investigated. The fluid-structure interaction problem is solved using a monolithic approach; that is, the nonlinear system is solved (after time and space discretizations) as a whole without splitting among its components. The linearized block systems are solved iteratively using parallel domain decomposition preconditioners. A simple - but nonsymmetric - curved geometry is proposed that is demonstrated to be suitable as a benchmark testbed for fluid-structure interaction simulations in biomechanics where nonlinear structural models are used. Based on the curved benchmark geometry, the influence of different material models, spatial discretizations, and meshes of varying refinement is investigated. It turns out that often-used standard displacement elements with linear shape functions are not sufficient to provide good approximations of the arterial wall stresses, whereas for standard displacement elements or F-bar formulations with quadratic shape functions, suitable results are obtained. For the time discretization, a second-order backward differentiation formula scheme is used. It is shown that the curved geometry enables the analysis of non-rotationally symmetric distributions of the mechanical fields. For instance, the maximal shear stresses in the fluid-structure interface are found to be higher in the inner curve that corresponds to clinical observations indicating a high plaque nucleation probability at such locations. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
A parallel finite element procedure for contact-impact problems using edge-based smooth triangular element and GPU

NASA Astrophysics Data System (ADS)

Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang

2018-04-01

The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Mesh quality oriented 3D geometric vascular modeling based on parallel transport frame.

PubMed

Guo, Jixiang; Li, Shun; Chui, Yim Pan; Qin, Jing; Heng, Pheng Ann

2013-08-01

While a number of methods have been proposed to reconstruct geometrically and topologically accurate 3D vascular models from medical images, little attention has been paid to constantly maintain high mesh quality of these models during the reconstruction procedure, which is essential for many subsequent applications such as simulation-based surgical training and planning. We propose a set of methods to bridge this gap based on parallel transport frame. An improved bifurcation modeling method and two novel trifurcation modeling methods are developed based on 3D Bézier curve segments in order to ensure the continuous surface transition at furcations. In addition, a frame blending scheme is implemented to solve the twisting problem caused by frame mismatch of two successive furcations. A curvature based adaptive sampling scheme combined with a mesh quality guided frame tilting algorithm is developed to construct an evenly distributed, non-concave and self-intersection free surface mesh for vessels with distinct radius and high curvature. Extensive experiments demonstrate that our methodology can generate vascular models with better mesh quality than previous methods in terms of surface mesh quality criteria. Copyright © 2013 Elsevier Ltd. All rights reserved.
Parallel MR imaging: a user's guide.

PubMed

Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin

2005-01-01

Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

NASA Astrophysics Data System (ADS)

Macioł, Piotr; Michalik, Kazimierz

2016-10-01

Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle against its wide application is high computational demands. Among others, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelization of multiscale models based on Agile Multiscale Methodology framework is discussed. A sequential, FEM based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues, being investigated in this work are: (i) the speed-up of multiscale models with special focus on fine-scale computations and (ii) on decreasing the quality of computations enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of `delay error', rising from the parallel execution of fine scale sub-models, controlled by the sequential macroscopic sub-model is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and a MPI library are also discussed.
A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Sargent, Jeff Scott

1988-01-01

A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel new approaches to controlling error in parallel cell-placement algorithms; Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance is summarized of an implementation on the Intel iPSC/2 Hypercube. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-08-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
The Debate on Learning Assessments in Developing Countries

ERIC Educational Resources Information Center

Wagner, Daniel A.; Lockheed, Marlaine; Mullis, Ina; Martin, Michael O.; Kanjee, Anil; Gove, Amber; Dowd, Amy Jo

2012-01-01

Over the past decade, international and national education agencies have begun to emphasize the improvement of the quality (rather than quantity) of education in developing countries. This trend has been paralleled by a significant increase in the use of educational assessments as a way to measure gains and losses in quality of learning. As…
A Quantitative Quality Control Model for Parallel and Distributed Crowdsourcing Tasks

ERIC Educational Resources Information Center

Zhu, Shaojian

2014-01-01

Crowdsourcing is an emerging research area that has experienced rapid growth in the past few years. Although crowdsourcing has demonstrated its potential in numerous domains, several key challenges continue to hinder its application. One of the major challenges is quality control. How can crowdsourcing requesters effectively control the quality…
When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences.

PubMed

Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

2012-01-01

Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.
When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

PubMed Central

Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

2012-01-01

Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708
Implementation of a Fully-Balanced Periodic Tridiagonal Solver on a Parallel Distributed Memory Architecture

DTIC Science & Technology

1994-05-01

PARALLEL DISTRIBUTED MEMORY ARCHITECTURE LTJh T. M. Eidson 0 - 8 l 9 5 " G. Erlebacher _ _ _. _ DTIe QUALITY INSPECTED a Contract NAS I - 19480 May 1994...DISTRIBUTED MEMORY ARCHITECTURE T.M. Eidson * High Technology Corporation Hampton, VA 23665 G. Erlebachert Institute for Computer Applications in Science and...developed and evaluated. Simple model calculations as well as timing results are pres.nted to evaluate the various strategies. The particular
A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging

PubMed Central

Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.

2012-01-01

Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS, and averaging the results to get a final CS k-space reconstruction. We used both a standard CS, and an edge and joint-sparsity guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition, and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge and joint-sparsity guided CS using two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065
HMC algorithm with multiple time scale integration and mass preconditioning

NASA Astrophysics Data System (ADS)

Urbach, C.; Jansen, K.; Shindler, A.; Wenger, U.

2006-01-01

We present a variant of the HMC algorithm with mass preconditioning (Hasenbusch acceleration) and multiple time scale integration. We have tested this variant for standard Wilson fermions at β=5.6 and at pion masses ranging from 380 to 680 MeV. We show that in this situation its performance is comparable to the recently proposed HMC variant with domain decomposition as preconditioner. We give an update of the "Berlin Wall" figure, comparing the performance of our variant of the HMC algorithm to other published performance data. Advantages of the HMC algorithm with mass preconditioning and multiple time scale integration are that it is straightforward to implement and can be used in combination with a wide variety of lattice Dirac operators.
Preconditioned conjugate residual methods for the solution of spectral equations

NASA Technical Reports Server (NTRS)

Wong, Y. S.; Zang, T. A.; Hussaini, M. Y.

1986-01-01

Conjugate residual methods for the solution of spectral equations are described. An inexact finite-difference operator is introduced as a preconditioner in the iterative procedures. Application of these techniques is limited to problems for which the symmetric part of the coefficient matrix is positive definite. Although the spectral equation is a very ill-conditioned and full matrix problem, the computational effort of the present iterative methods for solving such a system is comparable to that for the sparse matrix equations obtained from the application of either finite-difference or finite-element methods to the same problems. Numerical experiments are shown for a self-adjoint elliptic partial differential equation with Dirichlet boundary conditions, and comparison with other solution procedures for spectral equations is presented.

On linearization and preconditioning for radiation diffusion coupled to material thermal conduction equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feng, Tao, E-mail: fengtao2@mail.ustc.edu.cn; Graduate School of China Academy Engineering Physics, Beijing 100083; An, Hengbin, E-mail: an_hengbin@iapcm.ac.cn

2013-03-01

Jacobian-free Newton–Krylov (JFNK) method is an effective algorithm for solving large scale nonlinear equations. One of the most important advantages of JFNK method is that there is no necessity to form and store the Jacobian matrix of the nonlinear system when JFNK method is employed. However, an approximation of the Jacobian is needed for the purpose of preconditioning. In this paper, JFNK method is employed to solve a class of non-equilibrium radiation diffusion coupled to material thermal conduction equations, and two preconditioners are designed by linearizing the equations in two methods. Numerical results show that the two preconditioning methods canmore » improve the convergence behavior and efficiency of JFNK method.« less
Numerical Solution of the Gyrokinetic Poisson Equation in TEMPEST

NASA Astrophysics Data System (ADS)

Dorr, Milo; Cohen, Bruce; Cohen, Ronald; Dimits, Andris; Hittinger, Jeffrey; Kerbel, Gary; Nevins, William; Rognlien, Thomas; Umansky, Maxim; Xiong, Andrew; Xu, Xueqiao

2006-10-01

The gyrokinetic Poisson (GKP) model in the TEMPEST continuum gyrokinetic edge plasma code yields the electrostatic potential due to the charge density of electrons and an arbitrary number of ion species including the effects of gyroaveraging in the limit kρ1. The TEMPEST equations are integrated as a differential algebraic system involving a nonlinear system solve via Newton-Krylov iteration. The GKP preconditioner block is inverted using a multigrid preconditioned conjugate gradient (CG) algorithm. Electrons are treated as kinetic or adiabatic. The Boltzmann relation in the adiabatic option employs flux surface averaging to maintain neutrality within field lines and is solved self-consistently with the GKP equation. A decomposition procedure circumvents the near singularity of the GKP Jacobian block that otherwise degrades CG convergence.
HEVC real-time decoding

NASA Astrophysics Data System (ADS)

Bross, Benjamin; Alvarez-Mesa, Mauricio; George, Valeri; Chi, Chi Ching; Mayer, Tobias; Juurlink, Ben; Schierl, Thomas

2013-09-01

The new High Efficiency Video Coding Standard (HEVC) was finalized in January 2013. Compared to its predecessor H.264 / MPEG4-AVC, this new international standard is able to reduce the bitrate by 50% for the same subjective video quality. This paper investigates decoder optimizations that are needed to achieve HEVC real-time software decoding on a mobile processor. It is shown that HEVC real-time decoding up to high definition video is feasible using instruction extensions of the processor while decoding 4K ultra high definition video in real-time requires additional parallel processing. For parallel processing, a picture-level parallel approach has been chosen because it is generic and does not require bitstreams with special indication.
A template-based approach for parallel hexahedral two-refinement

DOE PAGES

Owen, Steven J.; Shih, Ryan M.; Ernst, Corey D.

2016-10-17

Here, we provide a template-based approach for generating locally refined all-hex meshes. We focus specifically on refinement of initially structured grids utilizing a 2-refinement approach where uniformly refined hexes are subdivided into eight child elements. The refinement algorithm consists of identifying marked nodes that are used as the basis for a set of four simple refinement templates. The target application for 2-refinement is a parallel grid-based all-hex meshing tool for high performance computing in a distributed environment. The result is a parallel consistent locally refined mesh requiring minimal communication and where minimum mesh quality is greater than scaled Jacobian 0.3more » prior to smoothing.« less
A template-based approach for parallel hexahedral two-refinement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owen, Steven J.; Shih, Ryan M.; Ernst, Corey D.

Here, we provide a template-based approach for generating locally refined all-hex meshes. We focus specifically on refinement of initially structured grids utilizing a 2-refinement approach where uniformly refined hexes are subdivided into eight child elements. The refinement algorithm consists of identifying marked nodes that are used as the basis for a set of four simple refinement templates. The target application for 2-refinement is a parallel grid-based all-hex meshing tool for high performance computing in a distributed environment. The result is a parallel consistent locally refined mesh requiring minimal communication and where minimum mesh quality is greater than scaled Jacobian 0.3more » prior to smoothing.« less
An Anthropologist's Reflections on Defining Quality in Education Research

ERIC Educational Resources Information Center

Tobin, Joseph

2007-01-01

In the USA there is a contemporary discourse of crisis about the state of education and a parallel discourse that lays a large portion of the blame onto the poor quality of educational research. The solution offered is "scientific research." This article presents critiques of the core assumptions of the scientific research as secure argument.…
S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less
Design Method of Digital Optimal Control Scheme and Multiple Paralleled Bridge Type Current Amplifier for Generating Gradient Magnetic Fields in MRI Systems

NASA Astrophysics Data System (ADS)

Watanabe, Shuji; Takano, Hiroshi; Fukuda, Hiroya; Hiraki, Eiji; Nakaoka, Mutsuo

This paper deals with a digital control scheme of multiple paralleled high frequency switching current amplifier with four-quadrant chopper for generating gradient magnetic fields in MRI (Magnetic Resonance Imaging) systems. In order to track high precise current pattern in Gradient Coils (GC), the proposal current amplifier cancels the switching current ripples in GC with each other and designed optimum switching gate pulse patterns without influences of the large filter current ripple amplitude. The optimal control implementation and the linear control theory in GC current amplifiers have affinity to each other with excellent characteristics. The digital control system can be realized easily through the digital control implementation, DSPs or microprocessors. Multiple-parallel operational microprocessors realize two or higher paralleled GC current pattern tracking amplifier with optimal control design and excellent results are given for improving the image quality of MRI systems.
Parallel discrete-event simulation of FCFS stochastic queueing networks

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
P-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Biswas, R.; Simon, H.D.

1997-05-01

Partitioning unstructured graphs is central to the parallel solution of problems in computational science and engineering. The authors have introduced earlier the sequential version of an inertial spectral partitioner called HARP which maintains the quality of recursive spectral bisection (RSB) while forming the partitions an order of magnitude faster than RSB. The serial HARP is known to be the fastest spectral partitioner to date, three to four times faster than similar partitioners on a variety of meshes. This paper presents a parallel version of HARP, called P-HARP. Two types of parallelism have been exploited: loop level parallelism and recursive parallelism.more » P-HARP has been implemented in MPI on the SGI/Cray T3E and the IBM SP2. Experimental results demonstrate that P-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.25 seconds on a 64-processor T3E. Experimental results further show that P-HARP can give nearly a 20-fold speedup on 64 processors. These results indicate that graph partitioning is no longer a major bottleneck that hinders the advancement of computational science and engineering for dynamically-changing real-world applications.« less
Quality Regulation in Expansion of Educational Systems: A Case of Privately Sponsored Students' Programme in Kenya's Public Universities

ERIC Educational Resources Information Center

Yego, Helen J. C.

2016-01-01

This paper examines the expansion and management of quality of parallel programmes in Kenya's public universities. The study is based on Privately Sponsored Students Programmes (PSSP) at Moi University and its satellite campuses in Kenya. The study was descriptive in nature and adopted an ex-post facto research design. The study sample consisted…
Sex, Arts and Verbal Abilities: Three Further Indicators of How American Life Is Not Improving

ERIC Educational Resources Information Center

Robinson, John P.

2010-01-01

Despite clear evidence that Americans' economic standard of living has improved over the last half-century in terms of income, ownership of technology and housing among other indicators, there is scant evidence from non-economic quality-of-life (QOL) indicators of improved life quality to parallel these economic gains. The present article adds to…
Fast Whole-Engine Stirling Analysis

NASA Technical Reports Server (NTRS)

Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

2006-01-01

This presentation discusses the simulation approach to whole-engine for physical consistency, REV regenerator modeling, grid layering for smoothness, and quality, conjugate heat transfer method adjustment, high-speed low cost parallel cluster, and debugging.
Implementation of Hybrid V-Cycle Multilevel Methods for Mixed Finite Element Systems with Penalty

NASA Technical Reports Server (NTRS)

Lai, Chen-Yao G.

1996-01-01

The goal of this paper is the implementation of hybrid V-cycle hierarchical multilevel methods for the indefinite discrete systems which arise when a mixed finite element approximation is used to solve elliptic boundary value problems. By introducing a penalty parameter, the perturbed indefinite system can be reduced to a symmetric positive definite system containing the small penalty parameter for the velocity unknown alone. We stabilize the hierarchical spatial decomposition approach proposed by Cai, Goldstein, and Pasciak for the reduced system. We demonstrate that the relative condition number of the preconditioner is bounded uniformly with respect to the penalty parameter, the number of levels and possible jumps of the coefficients as long as they occur only across the edges of the coarsest elements.
An incomplete assembly with thresholding algorithm for systems of reaction-diffusion equations in three space dimensions IAT for reaction-diffusion systems

NASA Astrophysics Data System (ADS)

Moore, Peter K.

2003-07-01

Solving systems of reaction-diffusion equations in three space dimensions can be prohibitively expensive both in terms of storage and CPU time. Herein, I present a new incomplete assembly procedure that is designed to reduce storage requirements. Incomplete assembly is analogous to incomplete factorization in that only a fixed number of nonzero entries are stored per row and a drop tolerance is used to discard small values. The algorithm is incorporated in a finite element method-of-lines code and tested on a set of reaction-diffusion systems. The effect of incomplete assembly on CPU time and storage and on the performance of the temporal integrator DASPK, algebraic solver GMRES and preconditioner ILUT is studied.
Shape reanalysis and sensitivities utilizing preconditioned iterative boundary solvers

NASA Technical Reports Server (NTRS)

Guru Prasad, K.; Kane, J. H.

1992-01-01

The computational advantages associated with the utilization of preconditined iterative equation solvers are quantified for the reanalysis of perturbed shapes using continuum structural boundary element analysis (BEA). Both single- and multi-zone three-dimensional problems are examined. Significant reductions in computer time are obtained by making use of previously computed solution vectors and preconditioners in subsequent analyses. The effectiveness of this technique is demonstrated for the computation of shape response sensitivities required in shape optimization. Computer times and accuracies achieved using the preconditioned iterative solvers are compared with those obtained via direct solvers and implicit differentiation of the boundary integral equations. It is concluded that this approach employing preconditioned iterative equation solvers in reanalysis and sensitivity analysis can be competitive with if not superior to those involving direct solvers.
Probabilistic Structures Analysis Methods (PSAM) for select space propulsion system components

NASA Technical Reports Server (NTRS)

1991-01-01

The basic formulation for probabilistic finite element analysis is described and demonstrated on a few sample problems. This formulation is based on iterative perturbation that uses the factorized stiffness on the unperturbed system as the iteration preconditioner for obtaining the solution to the perturbed problem. This approach eliminates the need to compute, store and manipulate explicit partial derivatives of the element matrices and force vector, which not only reduces memory usage considerably, but also greatly simplifies the coding and validation tasks. All aspects for the proposed formulation were combined in a demonstration problem using a simplified model of a curved turbine blade discretized with 48 shell elements, and having random pressure and temperature fields with partial correlation, random uniform thickness, and random stiffness at the root.
A domain decomposition approach to implementing fault slip in finite-element models of quasi-static and dynamic crustal deformation

USGS Publications Warehouse

Aagaard, Brad T.; Knepley, M.G.; Williams, C.A.

2013-01-01

We employ a domain decomposition approach with Lagrange multipliers to implement fault slip in a finite-element code, PyLith, for use in both quasi-static and dynamic crustal deformation applications. This integrated approach to solving both quasi-static and dynamic simulations leverages common finite-element data structures and implementations of various boundary conditions, discretization schemes, and bulk and fault rheologies. We have developed a custom preconditioner for the Lagrange multiplier portion of the system of equations that provides excellent scalability with problem size compared to conventional additive Schwarz methods. We demonstrate application of this approach using benchmarks for both quasi-static viscoelastic deformation and dynamic spontaneous rupture propagation that verify the numerical implementation in PyLith.
Choosing order of operations to accelerate strip structure analysis in parameter range

NASA Astrophysics Data System (ADS)

Kuksenko, S. P.; Akhunov, R. R.; Gazizov, T. R.

2018-05-01

The paper considers the issue of using iteration methods in solving the sequence of linear algebraic systems obtained in quasistatic analysis of strip structures with the method of moments. Using the analysis of 4 strip structures, the authors have proved that additional acceleration (up to 2.21 times) of the iterative process can be obtained during the process of solving linear systems repeatedly by means of choosing a proper order of operations and a preconditioner. The obtained results can be used to accelerate the process of computer-aided design of various strip structures. The choice of the order of operations to accelerate the process is quite simple, universal and could be used not only for strip structure analysis but also for a wide range of computational problems.
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics

NASA Astrophysics Data System (ADS)

Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.

2003-02-01

Multi-conjugate adaptive optics (MCAO) systems with 104-105 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32m with more than 104 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.

Design and development of polyphenylene oxide foam as a reusable internal insulation for LH2 tanks

NASA Technical Reports Server (NTRS)

1975-01-01

Material specification and fabrication process procedures for foam production are presented. The properties of mechanical strength, modulus of elasticity, density and thermal conductivity were measured and related to foam quality. Properties unique to the foam such as a gas layer insulation, density gradient parallel to the fiber direction, and gas flow conductance in both directions were correlated with foam quality. Inspection and quality control tests procedures are outlined and photographs of test equipment and test specimens are shown.
Performance-related test for asphalt emulsions.

DOT National Transportation Integrated Search

2004-10-01

Yield stress was investigated as a potential quality control parameter for asphalt emulsions. Viscometric data were determined using the concentric cylinder, parallel plate, and cone and plate geometries with rotational rheometers. We also investigat...
Jan Potocki et le "Gothic Novel"

ERIC Educational Resources Information Center

Finne, Jacques

1970-01-01

Establishes a parallel between the supernatural and fantastic qualities of Le Comte Jan Potocki's literary works, and the English gothic novels by comparing the elements of terror, mysterious atmosphere, and the supernatural beings involved. (DS)
Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

PubMed

Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

2017-07-12

Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Local search to improve coordinate-based task mapping

DOE PAGES

Balzuweit, Evan; Bunde, David P.; Leung, Vitus J.; ...

2015-10-31

We present a local search strategy to improve the coordinate-based mapping of a parallel job’s tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job’s communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK Systems. Utilizing the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing the runtimemore » variability. Furthermore, we further show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.« less
Use of parallel computing in mass processing of laser data

NASA Astrophysics Data System (ADS)

Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

2015-12-01

The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
Externally Calibrated Parallel Imaging for 3D Multispectral Imaging Near Metallic Implants Using Broadband Ultrashort Echo Time Imaging

PubMed Central

Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.

2017-01-01

Purpose To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

PubMed Central

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers.

PubMed

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
Anatomically accurate high resolution modeling of human whole heart electromechanics: A strongly scalable algebraic multigrid solver method for nonlinear deformation

NASA Astrophysics Data System (ADS)

Augustin, Christoph M.; Neic, Aurel; Liebmann, Manfred; Prassl, Anton J.; Niederer, Steven A.; Haase, Gundolf; Plank, Gernot

2016-01-01

Electromechanical (EM) models of the heart have been used successfully to study fundamental mechanisms underlying a heart beat in health and disease. However, in all modeling studies reported so far numerous simplifications were made in terms of representing biophysical details of cellular function and its heterogeneity, gross anatomy and tissue microstructure, as well as the bidirectional coupling between electrophysiology (EP) and tissue distension. One limiting factor is the employed spatial discretization methods which are not sufficiently flexible to accommodate complex geometries or resolve heterogeneities, but, even more importantly, the limited efficiency of the prevailing solver techniques which is not sufficiently scalable to deal with the incurring increase in degrees of freedom (DOF) when modeling cardiac electromechanics at high spatio-temporal resolution. This study reports on the development of a novel methodology for solving the nonlinear equation of finite elasticity using human whole organ models of cardiac electromechanics, discretized at a high para-cellular resolution. Three patient-specific, anatomically accurate, whole heart EM models were reconstructed from magnetic resonance (MR) scans at resolutions of 220 μm, 440 μm and 880 μm, yielding meshes of approximately 184.6, 24.4 and 3.7 million tetrahedral elements and 95.9, 13.2 and 2.1 million displacement DOF, respectively. The same mesh was used for discretizing the governing equations of both electrophysiology (EP) and nonlinear elasticity. A novel algebraic multigrid (AMG) preconditioner for an iterative Krylov solver was developed to deal with the resulting computational load. The AMG preconditioner was designed under the primary objective of achieving favorable strong scaling characteristics for both setup and solution runtimes, as this is key for exploiting current high performance computing hardware. Benchmark results using the 220 μm, 440 μm and 880 μm meshes demonstrate efficient scaling up to 1024, 4096 and 8192 compute cores which allowed the simulation of a single heart beat in 44.3, 87.8 and 235.3 minutes, respectively. The efficiency of the method allows fast simulation cycles without compromising anatomical or biophysical detail.
Anatomically accurate high resolution modeling of human whole heart electromechanics: A strongly scalable algebraic multigrid solver method for nonlinear deformation

PubMed Central

Augustin, Christoph M.; Neic, Aurel; Liebmann, Manfred; Prassl, Anton J.; Niederer, Steven A.; Haase, Gundolf; Plank, Gernot

2016-01-01

Electromechanical (EM) models of the heart have been used successfully to study fundamental mechanisms underlying a heart beat in health and disease. However, in all modeling studies reported so far numerous simplifications were made in terms of representing biophysical details of cellular function and its heterogeneity, gross anatomy and tissue microstructure, as well as the bidirectional coupling between electrophysiology (EP) and tissue distension. One limiting factor is the employed spatial discretization methods which are not sufficiently flexible to accommodate complex geometries or resolve heterogeneities, but, even more importantly, the limited efficiency of the prevailing solver techniques which are not sufficiently scalable to deal with the incurring increase in degrees of freedom (DOF) when modeling cardiac electromechanics at high spatio-temporal resolution. This study reports on the development of a novel methodology for solving the nonlinear equation of finite elasticity using human whole organ models of cardiac electromechanics, discretized at a high para-cellular resolution. Three patient-specific, anatomically accurate, whole heart EM models were reconstructed from magnetic resonance (MR) scans at resolutions of 220 μm, 440 μm and 880 μm, yielding meshes of approximately 184.6, 24.4 and 3.7 million tetrahedral elements and 95.9, 13.2 and 2.1 million displacement DOF, respectively. The same mesh was used for discretizing the governing equations of both electrophysiology (EP) and nonlinear elasticity. A novel algebraic multigrid (AMG) preconditioner for an iterative Krylov solver was developed to deal with the resulting computational load. The AMG preconditioner was designed under the primary objective of achieving favorable strong scaling characteristics for both setup and solution runtimes, as this is key for exploiting current high performance computing hardware. Benchmark results using the 220 μm, 440 μm and 880 μm meshes demonstrate efficient scaling up to 1024, 4096 and 8192 compute cores which allowed the simulation of a single heart beat in 44.3, 87.8 and 235.3 minutes, respectively. The efficiency of the method allows fast simulation cycles without compromising anatomical or biophysical detail. PMID:26819483
Parallelized Three-Dimensional Resistivity Inversion Using Finite Elements And Adjoint State Methods

NASA Astrophysics Data System (ADS)

Schaa, Ralf; Gross, Lutz; Du Plessis, Jaco

2015-04-01

The resistivity method is one of the oldest geophysical exploration methods, which employs one pair of electrodes to inject current into the ground and one or more pairs of electrodes to measure the electrical potential difference. The potential difference is a non-linear function of the subsurface resistivity distribution described by an elliptic partial differential equation (PDE) of the Poisson type. Inversion of measured potentials solves for the subsurface resistivity represented by PDE coefficients. With increasing advances in multichannel resistivity acquisition systems (systems with more than 60 channels and full waveform recording are now emerging), inversion software require efficient storage and solver algorithms. We developed the finite element solver Escript, which provides a user-friendly programming environment in Python to solve large-scale PDE-based problems (see https://launchpad.net/escript-finley). Using finite elements, highly irregular shaped geology and topography can readily be taken into account. For the 3D resistivity problem, we have implemented the secondary potential approach, where the PDE is decomposed into a primary potential caused by the source current and the secondary potential caused by changes in subsurface resistivity. The primary potential is calculated analytically, and the boundary value problem for the secondary potential is solved using nodal finite elements. This approach removes the singularity caused by the source currents and provides more accurate 3D resistivity models. To solve the inversion problem we apply a 'first optimize then discretize' approach using the quasi-Newton scheme in form of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method (see Gross & Kemp 2013). The evaluation of the cost function requires the solution of the secondary potential PDE for each source current and the solution of the corresponding adjoint-state PDE for the cost function gradients with respect to the subsurface resistivity. The Hessian of the regularization term is used as preconditioner which requires an additional PDE solution in each iteration step. As it turns out, the relevant PDEs are naturally formulated in the finite element framework. Using the domain decomposition method provided in Escript, the inversion scheme has been parallelized for distributed memory computers with multi-core shared memory nodes. We show numerical examples from simple layered models to complex 3D models and compare with the results from other methods. The inversion scheme is furthermore tested on a field data example to characterise localised freshwater discharge in a coastal environment.. References: L. Gross and C. Kemp (2013) Large Scale Joint Inversion of Geophysical Data using the Finite Element Method in escript. ASEG Extended Abstracts 2013, http://dx.doi.org/10.1071/ASEG2013ab306
Increasing the reach of forensic genetics with massively parallel sequencing.

PubMed

Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

2017-09-01

The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.
Influence of crystal quality on the excitation and propagation of surface and bulk acoustic waves in polycrystalline AlN films.

PubMed

Clement, Marta; Olivares, Jimena; Capilla, Jose; Sangrador, Jesús; Iborra, Enrique

2012-01-01

We investigate the excitation and propagation of acoustic waves in polycrystalline aluminum nitride films along the directions parallel and normal to the c-axis. Longitudinal and transverse propagations are assessed through the frequency response of surface acoustic wave and bulk acoustic wave devices fabricated on films of different crystal qualities. The crystalline properties significantly affect the electromechanical coupling factors and acoustic properties of the piezoelectric layers. The presence of misoriented grains produces an overall decrease of the piezoelectric activity, degrading more severely the excitation and propagation of waves traveling transversally to the c-axis. It is suggested that the presence of such crystalline defects in c-axis-oriented films reduces the mechanical coherence between grains and hinders the transverse deformation of the film when the electric field is applied parallel to the surface. © 2012 IEEE
Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

PubMed

Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

2013-08-01

Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
Domain Decomposition By the Advancing-Partition Method

NASA Technical Reports Server (NTRS)

Pirzadeh, Shahyar Z.

2008-01-01

A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Parallel design patterns for a low-power, software-defined compressed video encoder

NASA Astrophysics Data System (ADS)

Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

2011-06-01

Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
CrossTalk: The Journal of Defense Software Engineering. Volume 19, Number 6

DTIC Science & Technology

2006-06-01

improvement methods. The total volume of projects studied now exceeds 12,000. Software Productivity Research, LLC Phone: (877) 570-5459 (973) 273-5829...While performing quality con- sulting, Olson has helped organizations measurably improve quality and productivity , save millions of dollars in costs of...This article draws parallels between the outrageous events on the Jerry Springer Show and problems faced by process improvement programs. by Paul
Parallels between Objective Indicators and Subjective Perceptions of Quality of Life: A Study of Metropolitan and County Areas in Taiwan

ERIC Educational Resources Information Center

Liao, Pei-shan

2009-01-01

This study explores the consistency between objective indicators and subjective perceptions of quality of life in a ranking of survey data for cities and counties in Taiwan. Data used for analysis included the Statistical Yearbook of Hsiens and Municipalities and the Survey on Living Conditions of Citizens in Taiwan, both given for the year 2000.…
Importance of methodology on (99m)technetium dimercapto-succinic acid scintigraphic image quality: imaging pilot study for RIVUR (Randomized Intervention for Children With Vesicoureteral Reflux) multicenter investigation.

PubMed

Ziessman, Harvey A; Majd, Massoud

2009-07-01

We reviewed our experience with (99m)technetium dimercapto-succinic acid scintigraphy obtained during an imaging pilot study for a multicenter investigation (Randomized Intervention for Children With Vesicoureteral Reflux) of the effectiveness of daily antimicrobial prophylaxis for preventing recurrent urinary tract infection and renal scarring. We analyzed imaging methodology and its relation to diagnostic image quality. (99m)Technetium dimercapto-succinic acid imaging guidelines were provided to participating sites. High-resolution planar imaging with parallel hole or pinhole collimation was required. Two core reviewers evaluated all submitted images. Analysis included appropriate views, presence or lack of patient motion, adequate magnification, sufficient counts and diagnostic image quality. Inter-reader agreement was evaluated. We evaluated 70, (99m)technetium dimercapto-succinic acid studies from 14 institutions. Variability was noted in methodology and image quality. Correlation (r value) between dose administered and patient age was 0.780. For parallel hole collimator imaging good correlation was noted between activity administered and counts (r = 0.800). For pinhole imaging the correlation was poor (r = 0.110). A total of 10 studies (17%) were rejected for quality issues of motion, kidney overlap, inadequate magnification, inadequate counts and poor quality images. The submitting institution was informed and provided with recommendations for improving quality, and resubmission of another study was required. Only 4 studies (6%) were judged differently by the 2 reviewers, and the differences were minor. Methodology and image quality for (99m)technetium dimercapto-succinic acid scintigraphy varied more than expected between institutions. The most common reason for poor image quality was inadequate count acquisition with insufficient attention to the tradeoff between administered dose, length of image acquisition, start time of imaging and resulting image quality. Inter-observer core reader agreement was high. The pilot study ensured good diagnostic quality standardized images for the Randomized Intervention for Children With Vesicoureteral Reflux investigation.

Nested Conjugate Gradient Algorithm with Nested Preconditioning for Non-linear Image Restoration.

PubMed

Skariah, Deepak G; Arigovindan, Muthuvel

2017-06-19

We develop a novel optimization algorithm, which we call Nested Non-Linear Conjugate Gradient algorithm (NNCG), for image restoration based on quadratic data fitting and smooth non-quadratic regularization. The algorithm is constructed as a nesting of two conjugate gradient (CG) iterations. The outer iteration is constructed as a preconditioned non-linear CG algorithm; the preconditioning is performed by the inner CG iteration that is linear. The inner CG iteration, which performs preconditioning for outer CG iteration, itself is accelerated by an another FFT based non-iterative preconditioner. We prove that the method converges to a stationary point for both convex and non-convex regularization functionals. We demonstrate experimentally that proposed method outperforms the well-known majorization-minimization method used for convex regularization, and a non-convex inertial-proximal method for non-convex regularization functional.
Application of Conjugate Gradient methods to tidal simulation

USGS Publications Warehouse

Barragy, E.; Carey, G.F.; Walters, R.A.

1993-01-01

A harmonic decomposition technique is applied to the shallow water equations to yield a complex, nonsymmetric, nonlinear, Helmholtz type problem for the sea surface and an accompanying complex, nonlinear diagonal problem for the velocities. The equation for the sea surface is linearized using successive approximation and then discretized with linear, triangular finite elements. The study focuses on applying iterative methods to solve the resulting complex linear systems. The comparative evaluation includes both standard iterative methods for the real subsystems and complex versions of the well known Bi-Conjugate Gradient and Bi-Conjugate Gradient Squared methods. Several Incomplete LU type preconditioners are discussed, and the effects of node ordering, rejection strategy, domain geometry and Coriolis parameter (affecting asymmetry) are investigated. Implementation details for the complex case are discussed. Performance studies are presented and comparisons made with a frontal solver. ?? 1993.
Multi-dimensional, fully implicit, exactly conserving electromagnetic particle-in-cell simulations in curvilinear geometry

NASA Astrophysics Data System (ADS)

Chen, Guangye; Chacon, Luis

2015-11-01

We discuss a new, conservative, fully implicit 2D3V Vlasov-Darwin particle-in-cell algorithm in curvilinear geometry for non-radiative, electromagnetic kinetic plasma simulations. Unlike standard explicit PIC schemes, fully implicit PIC algorithms are unconditionally stable and allow exact discrete energy and charge conservation. Here, we extend these algorithms to curvilinear geometry. The algorithm retains its exact conservation properties in curvilinear grids. The nonlinear iteration is effectively accelerated with a fluid preconditioner for weakly to modestly magnetized plasmas, which allows efficient use of large timesteps, O (√{mi/me}c/veT) larger than the explicit CFL. In this presentation, we will introduce the main algorithmic components of the approach, and demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 1D (slow shock) and 2D (island coalescense).
Improved Convergence and Robustness of USM3D Solutions on Mixed Element Grids (Invited)

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2015-01-01

Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Scheme (HANIS), has been developed and implemented. It provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier Stokes (RANS) equations and a nonlinear control of the solution update. Two variants of the new methodology are assessed on four benchmark cases, namely, a zero-pressure gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the baseline solver technology.
Algorithm 937: MINRES-QLP for Symmetric and Hermitian Linear Equations and Least-Squares Problems.

PubMed

Choi, Sou-Cheng T; Saunders, Michael A

2014-02-01

We describe algorithm MINRES-QLP and its FORTRAN 90 implementation for solving symmetric or Hermitian linear systems or least-squares problems. If the system is singular, MINRES-QLP computes the unique minimum-length solution (also known as the pseudoinverse solution), which generally eludes MINRES. In all cases, it overcomes a potential instability in the original MINRES algorithm. A positive-definite pre-conditioner may be supplied. Our FORTRAN 90 implementation illustrates a design pattern that allows users to make problem data known to the solver but hidden and secure from other program units. In particular, we circumvent the need for reverse communication. Example test programs input and solve real or complex problems specified in Matrix Market format. While we focus here on a FORTRAN 90 implementation, we also provide and maintain MATLAB versions of MINRES and MINRES-QLP.
Hybrid preconditioning for iterative diagonalization of ill-conditioned generalized eigenvalue problems in electronic structure calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cai, Yunfeng, E-mail: yfcai@math.pku.edu.cn; Department of Computer Science, University of California, Davis 95616; Bai, Zhaojun, E-mail: bai@cs.ucdavis.edu

2013-12-15

The iterative diagonalization of a sequence of large ill-conditioned generalized eigenvalue problems is a computational bottleneck in quantum mechanical methods employing a nonorthogonal basis for ab initio electronic structure calculations. We propose a hybrid preconditioning scheme to effectively combine global and locally accelerated preconditioners for rapid iterative diagonalization of such eigenvalue problems. In partition-of-unity finite-element (PUFE) pseudopotential density-functional calculations, employing a nonorthogonal basis, we show that the hybrid preconditioned block steepest descent method is a cost-effective eigensolver, outperforming current state-of-the-art global preconditioning schemes, and comparably efficient for the ill-conditioned generalized eigenvalue problems produced by PUFE as the locally optimal blockmore » preconditioned conjugate-gradient method for the well-conditioned standard eigenvalue problems produced by planewave methods.« less
A numerical method to solve the 1D and the 2D reaction diffusion equation based on Bessel functions and Jacobian free Newton-Krylov subspace methods

NASA Astrophysics Data System (ADS)

Parand, K.; Nikarya, M.

2017-11-01

In this paper a novel method will be introduced to solve a nonlinear partial differential equation (PDE). In the proposed method, we use the spectral collocation method based on Bessel functions of the first kind and the Jacobian free Newton-generalized minimum residual (JFNGMRes) method with adaptive preconditioner. In this work a nonlinear PDE has been converted to a nonlinear system of algebraic equations using the collocation method based on Bessel functions without any linearization, discretization or getting the help of any other methods. Finally, by using JFNGMRes, the solution of the nonlinear algebraic system is achieved. To illustrate the reliability and efficiency of the proposed method, we solve some examples of the famous Fisher equation. We compare our results with other methods.
Externally calibrated parallel imaging for 3D multispectral imaging near metallic implants using broadband ultrashort echo time imaging.

PubMed

Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B

2017-06-01

To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
The benefits of privatization.

PubMed

Dirnfeld, V

1996-08-15

The promise of a universal, comprehensive, publicly funded system of medical care that was the foundation of the Medical Care Act passed in 1966 is no longer possible. Massive government debt, increasing health care costs, a growing and aging population and advances in technology have challenged the system, which can no longer meet the expectations of the public or of the health care professions. A parallel, private system, funded by a not-for-profit, regulated system of insurance coverage affordable for all wage-earners, would relieve the overstressed public system without decreasing the quality of care in that system. Critics of a parallel, private system, who base their arguments on the politics of fear and envy, charge that such a private system would "Americanize" Canadian health care and that the wealthy would be able to buy better, faster care than the rest of the population. But this has not happened in the parallel public and private health care systems in other Western countries or in the public and private education system in Canada. Wealthy Canadians can already buy medical care in the United States, where they spend $1 billion each year, an amount that represents a loss to Canada of 10,000 health care jobs. Parallel-system schemes in other countries have proven that people are driven to a private system by dissatisfaction with the quality of service, which is already suffering in Canada. Denial of choice is unacceptable to many people, particularly since the terms and conditions under which Canadians originally decided to forgo choice in medical care no longer apply.
Why caution is recommended with post-hoc individual patient matching for estimation of treatment effect in parallel-group randomized controlled trials: the case of acute stroke trials.

PubMed

Jafari, Nahid; Hearne, John; Churilov, Leonid

2013-11-10

A post-hoc individual patient matching procedure was recently proposed within the context of parallel group randomized clinical trials (RCTs) as a method for estimating treatment effect. In this paper, we consider a post-hoc individual patient matching problem within a parallel group RCT as a multi-objective decision-making problem focussing on the trade-off between the quality of individual matches and the overall percentage of matching. Using acute stroke trials as a context, we utilize exact optimization and simulation techniques to investigate a complex relationship between the overall percentage of individual post-hoc matching, the size of the respective RCT, and the quality of matching on variables highly prognostic for a good functional outcome after stroke, as well as the dispersion in these variables. It is empirically confirmed that a high percentage of individual post-hoc matching can only be achieved when the differences in prognostic baseline variables between individually matched subjects within the same pair are sufficiently large and that the unmatched subjects are qualitatively different to the matched ones. It is concluded that the post-hoc individual matching as a technique for treatment effect estimation in parallel-group RCTs should be exercised with caution because of its propensity to introduce significant bias and reduce validity. If used with appropriate caution and thorough evaluation, this approach can complement other viable alternative approaches for estimating the treatment effect. Copyright © 2013 John Wiley & Sons, Ltd.
Aberration compensation of an ultrasound imaging instrument with a reduced number of channels.

PubMed

Jiang, Wei; Astheimer, Jeffrey P; Waag, Robert C

2012-10-01

Focusing and imaging qualities of an ultrasound imaging system that uses aberration correction were experimentally investigated as functions of the number of parallel channels. Front-end electronics that consolidate signals from multiple physical elements can be used to lower hardware and computational costs by reducing the number of parallel channels. However, the signals from sparse arrays of synthetic elements yield poorer aberration estimates. In this study, aberration estimates derived from synthetic arrays of varying element sizes are evaluated by comparing compensated receive focuses, compensated transmit focuses, and compensated b-scan images of a point target and a cyst phantom. An array of 80 x 80 physical elements with a pitch of 0.6 x 0.6 mm was used for all of the experiments and the aberration was produced by a phantom selected to mimic propagation through abdominal wall. The results show that aberration correction derived from synthetic arrays with pitches that have a diagonal length smaller than 70% of the correlation length of the aberration yield focuses and images of approximately the same quality. This connection between correlation length of the aberration and synthetic element size provides a guideline for determining the number of parallel channels that are required when designing imaging systems that employ aberration correction.
Towards a five-minute comprehensive cardiac MR examination using highly accelerated parallel imaging with a 32-element coil array: feasibility and initial comparative evaluation.

PubMed

Xu, Jian; Kim, Daniel; Otazo, Ricardo; Srichai, Monvadi B; Lim, Ruth P; Axel, Leon; Mcgorty, Kelly Anne; Niendorf, Thoralf; Sodickson, Daniel K

2013-07-01

To evaluate the feasibility and perform initial comparative evaluations of a 5-minute comprehensive whole-heart magnetic resonance imaging (MRI) protocol with four image acquisition types: perfusion (PERF), function (CINE), coronary artery imaging (CAI), and late gadolinium enhancement (LGE). This study protocol was Health Insurance Portability and Accountability Act (HIPAA)-compliant and Institutional Review Board-approved. A 5-minute comprehensive whole-heart MRI examination protocol (Accelerated) using 6-8-fold-accelerated volumetric parallel imaging was incorporated into and compared with a standard 2D clinical routine protocol (Standard). Following informed consent, 20 patients were imaged with both protocols. Datasets were reviewed for image quality using a 5-point Likert scale (0 = non-diagnostic, 4 = excellent) in blinded fashion by two readers. Good image quality with full whole-heart coverage was achieved using the accelerated protocol, particularly for CAI, although significant degradations in quality, as compared with traditional lengthy examinations, were observed for the other image types. Mean total scan time was significantly lower for the Accelerated as compared to Standard protocols (28.99 ± 4.59 min vs. 1.82 ± 0.05 min, P < 0.05). Overall image quality for the Standard vs. Accelerated protocol was 3.67 ± 0.29 vs. 1.5 ± 0.51 (P < 0.005) for PERF, 3.48 ± 0.64 vs. 2.6 ± 0.68 (P < 0.005) for CINE, 2.35 ± 1.01 vs. 2.48 ± 0.68 (P = 0.75) for CAI, and 3.67 ± 0.42 vs. 2.67 ± 0.84 (P < 0.005) for LGE. Diagnostic image quality for Standard vs. Accelerated protocols was 20/20 (100%) vs. 10/20 (50%) for PERF, 20/20 (100%) vs. 18/20 (90%) for CINE, 18/20 (90%) vs. 18/20 (90%) for CAI, and 20/20 (100%) vs. 18/20 (90%) for LGE. This study demonstrates the technical feasibility and promising image quality of 5-minute comprehensive whole-heart cardiac examinations, with simplified scan prescription and high spatial and temporal resolution enabled by highly parallel imaging technology. The study also highlights technical hurdles that remain to be addressed. Although image quality remained diagnostic for most scan types, the reduced image quality of PERF, CINE, and LGE scans in the Accelerated protocol remain a concern. Copyright © 2012 Wiley Periodicals, Inc.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Flexure mechanism-based parallelism measurements for chip-on-glass bonding

NASA Astrophysics Data System (ADS)

Jung, Seung Won; Yun, Won Soo; Jin, Songwan; Kim, Bo Sun; Jeong, Young Hun

2011-08-01

Recently, liquid crystal displays (LCDs) have played vital roles in a variety of electronic devices such as televisions, cellular phones, and desktop/laptop monitors because of their enhanced volume, performance, and functionality. However, there is still a need for thinner LCD panels due to the trend of miniaturization in electronic applications. Thus, chip-on-glass (COG) bonding has become one of the most important aspects in the LCD panel manufacturing process. In this study, a novel sensor was developed to measure the parallelism between the tooltip planes of the bonding head and the backup of the COG main bonder, which has previously been estimated by prescale pressure films in industry. The sensor developed in this study is based on a flexure mechanism, and it can measure the total pressing force and the inclination angles in two directions that satisfy the quantitative definition of parallelism. To improve the measurement accuracy, the sensor was calibrated based on the estimation of the total pressing force and the inclination angles using the least-squares method. To verify the accuracy of the sensor, the estimation results for parallelism were compared with those from prescale pressure film measurements. In addition, the influence of parallelism on the bonding quality was experimentally demonstrated. The sensor was successfully applied to the measurement of parallelism in the COG-bonding process with an accuracy of more than three times that of the conventional method using prescale pressure films.
Highly accelerated cardiac cine parallel MRI using low-rank matrix completion and partial separability model

NASA Astrophysics Data System (ADS)

Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie

2016-05-01

This paper presents a new approach to highly accelerated dynamic parallel MRI using low rank matrix completion, partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center kspace navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29ms, when the conventional PS method fails.
Update on Development of Mesh Generation Algorithms in MeshKit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jain, Rajeev; Vanderzee, Evan; Mahadevan, Vijay

2015-09-30

MeshKit uses a graph-based design for coding all its meshing algorithms, which includes the Reactor Geometry (and mesh) Generation (RGG) algorithms. This report highlights the developmental updates of all the algorithms, results and future work. Parallel versions of algorithms, documentation and performance results are reported. RGG GUI design was updated to incorporate new features requested by the users; boundary layer generation and parallel RGG support were added to the GUI. Key contributions to the release, upgrade and maintenance of other SIGMA1 libraries (CGM and MOAB) were made. Several fundamental meshing algorithms for creating a robust parallel meshing pipeline in MeshKitmore » are under development. Results and current status of automated, open-source and high quality nuclear reactor assembly mesh generation algorithms such as trimesher, quadmesher, interval matching and multi-sweeper are reported.« less
A rapid parallelization of cone-beam projection and back-projection operator based on texture fetching interpolation

NASA Astrophysics Data System (ADS)

Xie, Lizhe; Hu, Yining; Chen, Yang; Shi, Luyao

2015-03-01

Projection and back-projection are the most computational consuming parts in Computed Tomography (CT) reconstruction. Parallelization strategies using GPU computing techniques have been introduced. We in this paper present a new parallelization scheme for both projection and back-projection. The proposed method is based on CUDA technology carried out by NVIDIA Corporation. Instead of build complex model, we aimed on optimizing the existing algorithm and make it suitable for CUDA implementation so as to gain fast computation speed. Besides making use of texture fetching operation which helps gain faster interpolation speed, we fixed sampling numbers in the computation of projection, to ensure the synchronization of blocks and threads, thus prevents the latency caused by inconsistent computation complexity. Experiment results have proven the computational efficiency and imaging quality of the proposed method.
Imaging resolution and properties analysis of super resolution microscopy with parallel detection under different noise, detector and image restoration conditions

NASA Astrophysics Data System (ADS)

Yu, Zhongzhi; Liu, Shaocong; Sun, Shiyi; Kuang, Cuifang; Liu, Xu

2018-06-01

Parallel detection, which can use the additional information of a pinhole plane image taken at every excitation scan position, could be an efficient method to enhance the resolution of a confocal laser scanning microscope. In this paper, we discuss images obtained under different conditions and using different image restoration methods with parallel detection to quantitatively compare the imaging quality. The conditions include different noise levels and different detector array settings. The image restoration methods include linear deconvolution and pixel reassignment with Richard-Lucy deconvolution and with maximum-likelihood estimation deconvolution. The results show that the linear deconvolution share properties such as high-efficiency and the best performance under all different conditions, and is therefore expected to be of use for future biomedical routine research.
Targeted Alteration of Dietary Omega-3 and Omega-6 Fatty Acids for the Treatment of Post-Traumatic Headaches

DTIC Science & Technology

2016-10-01

reduced psychological distress and improved quality-of- life in a chronic headache population. We propose to carry out a 2-arm, parallel group...emphasize the role of inflammation, cytokine modulation, microglial activation, and abnormalities in neurotransmitter activity in mediating PTH. These...anti- and pro-nociceptive lipid mediators and their precursor fatty acids, reduced psychological distress and improved quality-of-life in a chronic

Rapid evaluation and quality control of next generation sequencing data with FaQCs.

PubMed

Lo, Chien-Chi; Chain, Patrick S G

2014-11-19

Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
Comparison of the IAEA TRS-398 and AAPM TG-51 absorbed dose to water protocols in the dosimetry of high-energy photon and electron beams

NASA Astrophysics Data System (ADS)

Saiful Huq, M.; Andreo, Pedro; Song, Haijun

2001-11-01

The International Atomic Energy Agency (IAEA TRS-398) and the American Association of Physicists in Medicine (AAPM TG-51) have published new protocols for the calibration of radiotherapy beams. These protocols are based on the use of an ionization chamber calibrated in terms of absorbed dose to water in a standards laboratory's reference quality beam. This paper compares the recommendations of the two protocols in two ways: (i) by analysing in detail the differences in the basic data included in the two protocols for photon and electron beam dosimetry and (ii) by performing measurements in clinical photon and electron beams and determining the absorbed dose to water following the recommendations of the two protocols. Measurements were made with two Farmer-type ionization chambers and three plane-parallel ionization chamber types in 6, 18 and 25 MV photon beams and 6, 8, 10, 12, 15 and 18 MeV electron beams. The Farmer-type chambers used were NE 2571 and PTW 30001, and the plane-parallel chambers were a Scanditronix-Wellhöfer NACP and Roos, and a PTW Markus chamber. For photon beams, the measured ratios TG-51/TRS-398 of absorbed dose to water Dw ranged between 0.997 and 1.001, with a mean value of 0.999. The ratios for the beam quality correction factors kQ were found to agree to within about +/-0.2% despite significant differences in the method of beam quality specification for photon beams and in the basic data entering into kQ. For electron beams, dose measurements were made using direct ND,w calibrations of cylindrical and plane-parallel chambers in a 60Co gamma-ray beam, as well as cross-calibrations of plane-parallel chambers in a high-energy electron beam. For the direct ND,w calibrations the ratios TG-51/TRS-398 of absorbed dose to water Dw were found to lie between 0.994 and 1.018 depending upon the chamber and electron beam energy used, with mean values of 0.996, 1.006, and 1.017, respectively, for the cylindrical, well-guarded and not well-guarded plane-parallel chambers. The Dw ratios measured for the cross-calibration procedures varied between 0.993 and 0.997. The largest discrepancies for electron beams between the two protocols arise from the use of different data for the perturbation correction factors pwall and pdis of cylindrical and plane-parallel chambers, all in 60Co. A detailed analysis of the reasons for the discrepancies is made which includes comparing the formalisms, correction factors and the quantities in the two protocols.
Grade pending: lessons for hospital quality reporting from the New York City restaurant sanitation inspection program.

PubMed

Ryan, Andrew M; Detsky, Allan S

2015-02-01

Public quality reporting programs have been widely implemented in hospitals in an effort to improve quality and safety. One such program is Hospital Compare, Medicare's national quality reporting program for US hospitals. The New York City sanitary grade inspection program is a parallel effort for restaurants. The aims of Hospital Compare and the New York City sanitary inspection program are fundamentally similar: to address a common market failure resulting from consumers' lack of information on quality and safety. However, by displaying easily understandable information at the point of service, the New York City sanitary inspection program is better designed to encourage informed consumer decision making. We argue that this program holds important lessons for public quality reporting of US hospitals. © 2014 Society of Hospital Medicine.
Causes of drug shortages in the legal pharmaceutical framework.

PubMed

De Weerdt, Elfi; Simoens, Steven; Hombroeckx, Luc; Casteels, Minne; Huys, Isabelle

2015-03-01

Different causes of drug shortages can be linked to the pharmaceutical legal framework, such as: parallel trade, quality requirements, economic decisions to suspend or cease production, etc. However until now no in-depth study of the different regulations affecting drug shortages is available. The aim of this paper is to provide an analysis of relevant legal and regulatory measures in the European pharmaceutical framework which influence drug shortages. Different European and national legislations governing human medicinal products were analyzed (e.g. Directive 2001/83/EC and Directive 2011/62/EU), supplemented with literature studies. For patented drugs, external price referencing may encompass the largest impact on drug shortages. For generic medicines, internal or external reference pricing, tendering as well as price capping may affect drug shortages. Manufacturing/quality requirements also contribute to drug shortages, since non-compliance leads to recalls. The influence of parallel trade on drug shortages is still rather disputable. Price and quality regulations are both important causes of drug shortages or drug unavailability. It can be concluded that there is room for improvement in the pharmaceutical legal framework within the lines drawn by the EU to mitigate drug shortages. Copyright © 2015 Elsevier Inc. All rights reserved.
Scalable Static and Dynamic Community Detection Using Grappolo

DOE Office of Scientific and Technical Information (OSTI.GOV)

Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman

Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Pro- gram. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. How- ever, due to their irregular and inherently sequential nature, many of the current algorithms for community detectionmore » are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.« less
Parallel seed-based approach to multiple protein structure similarities detection

DOE PAGES

Chapuis, Guillaume; Le Boudic-Jamin, Mathilde; Andonov, Rumen; ...

2015-01-01

Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makesmore » our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as to detect structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level parallelism making use of available CPU cores for computation and a fine-grain level parallelism exploiting bit-level concurrency as well as vector instructions.« less
Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments

NASA Technical Reports Server (NTRS)

Bisws, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)

2002-01-01

The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes, that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For experimental performance study, we have considered both a realistic mesh problem from NASA as well as synthetic workloads. Simulation results demonstrate that MiniMax generates high quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.
Performance of a parallel thermal-hydraulics code TEMPEST

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fann, G.I.; Trent, D.S.

The authors describe the parallelization of the Tempest thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scalingmore » performance of 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) are presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on an 128 processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.« less
Parallel transmission RF pulse design for eddy current correction at ultra high field.

PubMed

Zheng, Hai; Zhao, Tiejun; Qian, Yongxian; Ibrahim, Tamer; Boada, Fernando

2012-08-01

Multidimensional spatially selective RF pulses have been used in MRI applications such as B₁ and B₀ inhomogeneities mitigation. However, the long pulse duration has limited their practical applications. Recently, theoretical and experimental studies have shown that parallel transmission can effectively shorten pulse duration without sacrificing the quality of the excitation pattern. Nonetheless, parallel transmission with accelerated pulses can be severely impeded by hardware and/or system imperfections. One of such imperfections is the effect of the eddy current field. In this paper, we first show the effects of the eddy current field on the excitation pattern and then report an RF pulse the design method to correct eddy current fields caused by the RF coil and the gradient system. Experimental results on a 7 T human eight-channel parallel transmit system show substantial improvements on excitation patterns with the use of eddy current correction. Moreover, the proposed model-based correction method not only demonstrates comparable excitation patterns as the trajectory measurement method, but also significantly improves time efficiency. Copyright © 2012. Published by Elsevier Inc.
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
Why is intelligence correlated with semen quality?

PubMed Central

Miller, Geoffrey; Arden, Rosalind; Gottfredson, Linda S

2009-01-01

We recently found positive correlations between human general intelligence and three key indices of semen quality, and hypothesized that these correlations arise through a phenotype-wide ‘general fitness factor’ reflecting overall mutation load. In this addendum we consider some of the biochemical pathways that may act as targets for pleiotropic mutations that disrupt both neuron function and sperm function in parallel. We focus especially on the inter-related roles of polyunsaturated fatty acids, exocytosis and receptor signaling. PMID:19907694
Common sense about taste: from mammals to insects.

PubMed

Yarmolinsky, David A; Zuker, Charles S; Ryba, Nicholas J P

2009-10-16

The sense of taste is a specialized chemosensory system dedicated to the evaluation of food and drink. Despite the fact that vertebrates and insects have independently evolved distinct anatomic and molecular pathways for taste sensation, there are clear parallels in the organization and coding logic between the two systems. There is now persuasive evidence that tastant quality is mediated by labeled lines, whereby distinct and strictly segregated populations of taste receptor cells encode each of the taste qualities.
Improving quality of arterial spin labeling MR imaging at 3 Tesla with a 32-channel coil and parallel imaging.

PubMed

Ferré, Jean-Christophe; Petr, Jan; Bannier, Elise; Barillot, Christian; Gauvrit, Jean-Yves

2012-05-01

To compare 12-channel and 32-channel phased-array coils and to determine the optimal parallel imaging (PI) technique and factor for brain perfusion imaging using Pulsed Arterial Spin labeling (PASL) at 3 Tesla (T). Twenty-seven healthy volunteers underwent 10 different PASL perfusion PICORE Q2TIPS scans at 3T using 12-channel and 32-channel coils without PI and with GRAPPA or mSENSE using factor 2. PI with factor 3 and 4 were used only with the 32-channel coil. Visual quality was assessed using four parameters. Quantitative analyses were performed using temporal noise, contrast-to-noise and signal-to-noise ratios (CNR, SNR). Compared with 12-channel acquisition, the scores for 32-channel acquisition were significantly higher for overall visual quality, lower for noise and higher for SNR and CNR. With the 32-channel coil, artifact compromise achieved the best score with PI factor 2. Noise increased, SNR and CNR decreased with PI factor. However mSENSE 2 scores were not always significantly different from acquisition without PI. For PASL at 3T, the 32-channel coil at 3T provided better quality than the 12-channel coil. With the 32-channel coil, mSENSE 2 seemed to offer the best compromise for decreasing artifacts without significantly reducing SNR, CNR. Copyright © 2012 Wiley Periodicals, Inc.
MRI of the wrist at 7 tesla using an eight-channel array coil combined with parallel imaging: preliminary results.

PubMed

Chang, Gregory; Friedrich, Klaus M; Wang, Ligong; Vieira, Renata L R; Schweitzer, Mark E; Recht, Michael P; Wiggins, Graham C; Regatte, Ravinder R

2010-03-01

To determine the feasibility of performing MRI of the wrist at 7 Tesla (T) with parallel imaging and to evaluate how acceleration factors (AF) affect signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and image quality. This study had institutional review board approval. A four-transmit eight-receive channel array coil was constructed in-house. Nine healthy subjects were scanned on a 7T whole-body MR scanner. Coronal and axial images of cartilage and trabecular bone micro-architecture (3D-Fast Low Angle Shot (FLASH) with and without fat suppression, repetition time/echo time = 20 ms/4.5 ms, flip angle = 10 degrees , 0.169-0.195 x 0.169-0.195 mm, 0.5-1 mm slice thickness) were obtained with AF 1, 2, 3, 4. T1-weighted fast spin-echo (FSE), proton density-weighted FSE, and multiple-echo data image combination (MEDIC) sequences were also performed. SNR and CNR were measured. Three musculoskeletal radiologists rated image quality. Linear correlation analysis and paired t-tests were performed. At higher AF, SNR and CNR decreased linearly for cartilage, muscle, and trabecular bone (r < -0.98). At AF 4, reductions in SNR/CNR were:52%/60% (cartilage), 72%/63% (muscle), 45%/50% (trabecular bone). Radiologists scored images with AF 1 and 2 as near-excellent, AF 3 as good-to-excellent (P = 0.075), and AF 4 as average-to-good (P = 0.11). It is feasible to perform high resolution 7T MRI of the wrist with parallel imaging. SNR and CNR decrease with higher AF, but image quality remains above-average.
An iterative reduced field-of-view reconstruction for periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI.

PubMed

Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J

2015-10-01

To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p < 0.001). In vivo study validated the increased signal-to-noise ratio, which is over four times higher than with density compensation. Image sharpness index was improved using the regularized reconstruction implemented. The rFOV strategy permits near real-time iterative reconstruction to improve the image quality of PROPELLER images. Substantial improvements in image quality metrics were validated in the experiments. The concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.
Quantitative high-efficiency cadmium-zinc-telluride SPECT with dedicated parallel-hole collimation system in obese patients: results of a multi-center study.

PubMed

Nakazato, Ryo; Slomka, Piotr J; Fish, Mathews; Schwartz, Ronald G; Hayes, Sean W; Thomson, Louise E J; Friedman, John D; Lemley, Mark; Mackin, Maria L; Peterson, Benjamin; Schwartz, Arielle M; Doran, Jesse A; Germano, Guido; Berman, Daniel S

2015-04-01

Obesity is a common source of artifact on conventional SPECT myocardial perfusion imaging (MPI). We evaluated image quality and diagnostic performance of high-efficiency (HE) cadmium-zinc-telluride parallel-hole SPECT MPI for coronary artery disease (CAD) in obese patients. 118 consecutive obese patients at three centers (BMI 43.6 ± 8.9 kg·m(-2), range 35-79.7 kg·m(-2)) had upright/supine HE-SPECT and invasive coronary angiography > 6 months (n = 67) or low likelihood of CAD (n = 51). Stress quantitative total perfusion deficit (TPD) for upright (U-TPD), supine (S-TPD), and combined acquisitions (C-TPD) was assessed. Image quality (IQ; 5 = excellent; < 3 nondiagnostic) was compared among BMI 35-39.9 (n = 58), 40-44.9 (n = 24) and ≥45 (n = 36) groups. ROC curve area for CAD detection (≥50% stenosis) for U-TPD, S-TPD, and C-TPD were 0.80, 0.80, and 0.87, respectively. Sensitivity/specificity was 82%/57% for U-TPD, 74%/71% for S-TPD, and 80%/82% for C-TPD. C-TPD had highest specificity (P = .02). C-TPD normalcy rate was higher than U-TPD (88% vs 75%, P = .02). Mean IQ was similar among BMI 35-39.9, 40-44.9 and ≥45 groups [4.6 vs 4.4 vs 4.5, respectively (P = .6)]. No patient had a nondiagnostic stress scan. In obese patients, HE-SPECT MPI with dedicated parallel-hole collimation demonstrated high image quality, normalcy rate, and diagnostic accuracy for CAD by quantitative analysis of combined upright/supine acquisitions.
Quantitative High-Efficiency Cadmium-Zinc-Telluride SPECT with Dedicated Parallel-Hole Collimation System in Obese Patients: Results of a Multi-Center Study

PubMed Central

Nakazato, Ryo; Slomka, Piotr J.; Fish, Mathews; Schwartz, Ronald G.; Hayes, Sean W.; Thomson, Louise E.J.; Friedman, John D.; Lemley, Mark; Mackin, Maria L.; Peterson, Benjamin; Schwartz, Arielle M.; Doran, Jesse A.; Germano, Guido; Berman, Daniel S.

2014-01-01

Background Obesity is a common source of artifact on conventional SPECT myocardial perfusion imaging (MPI). We evaluated image quality and diagnostic performance of high-efficiency (HE) cadmium-zinc-telluride (CZT) parallel-hole SPECT-MPI for coronary artery disease (CAD) in obese patients. Methods and Results 118 consecutive obese patients at 3 centers (BMI 43.6±8.9 kg/m2, range 35–79.7 kg/m2) had upright/supine HE-SPECT and ICA >6 months (n=67) or low-likelihood of CAD (n=51). Stress quantitative total perfusion deficit (TPD) for upright (U-TPD), supine (S-TPD) and combined acquisitions (C-TPD) was assessed. Image quality (IQ; 5=excellent; <3 nondiagnostic) was compared among BMI 35–39.9 (n=58), 40–44.9 (n=24) and ≥45 (n=36) groups. ROC-curve area for CAD detection (≥50% stenosis) for U-TPD, S-TPD, and C-TPD were 0.80, 0.80, and 0.87, respectively. Sensitivity/specificity was 82%/57% for U-TPD, 74%/71% for S-TPD, and 80%/82% for C-TPD. C-TPD had highest specificity (P=.02). C-TPD normalcy rate was higher than U-TPD (88% vs. 75%, P=.02). Mean IQ was similar among BMI 35–39.9, 40–44.9 and ≥45 groups [4.6 vs. 4.4 vs. 4.5, respectively (P=.6)]. No patient had a non-diagnostic stress scan. Conclusions In obese patients, HE-SPECT MPI with dedicated parallel-hole collimation demonstrated high image quality, normalcy rate, and diagnostic accuracy for CAD by quantitative analysis of combined upright/supine acquisitions. PMID:25388380
Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.

PubMed

Rössler, Wolfgang; Brill, Martin F

2013-11-01

Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction remained unclear. Studies in the honeybee focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond slower. In analogy to "what-" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what-" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.
Accelerating large-scale protein structure alignments with graphics processing units

PubMed Central

2012-01-01

Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Multi-GPU Acceleration of Branchless Distance Driven Projection and Backprojection for Clinical Helical CT.

PubMed

Mitra, Ayan; Politte, David G; Whiting, Bruce R; Williamson, Jeffrey F; O'Sullivan, Joseph A

2017-01-01

Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR based technique is called branchless distance driven (DD) projection and backprojection. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.

Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gupta, Manish

1992-01-01

Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
PRECONDITIONED CONJUGATE-GRADIENT 2 (PCG2), a computer program for solving ground-water flow equations

USGS Publications Warehouse

Hill, Mary C.

1990-01-01

This report documents PCG2 : a numerical code to be used with the U.S. Geological Survey modular three-dimensional, finite-difference, ground-water flow model . PCG2 uses the preconditioned conjugate-gradient method to solve the equations produced by the model for hydraulic head. Linear or nonlinear flow conditions may be simulated. PCG2 includes two reconditioning options : modified incomplete Cholesky preconditioning, which is efficient on scalar computers; and polynomial preconditioning, which requires less computer storage and, with modifications that depend on the computer used, is most efficient on vector computers . Convergence of the solver is determined using both head-change and residual criteria. Nonlinear problems are solved using Picard iterations. This documentation provides a description of the preconditioned conjugate gradient method and the two preconditioners, detailed instructions for linking PCG2 to the modular model, sample data inputs, a brief description of PCG2, and a FORTRAN listing.
Preconditioned Mixed Spectral Element Methods for Elasticity and Stokes Problems

NASA Technical Reports Server (NTRS)

Pavarino, Luca F.

1996-01-01

Preconditioned iterative methods for the indefinite systems obtained by discretizing the linear elasticity and Stokes problems with mixed spectral elements in three dimensions are introduced and analyzed. The resulting stiffness matrices have the structure of saddle point problems with a penalty term, which is associated with the Poisson ratio for elasticity problems or with stabilization techniques for Stokes problems. The main results of this paper show that the convergence rate of the resulting algorithms is independent of the penalty parameter, the number of spectral elements Nu and mildly dependent on the spectral degree eta via the inf-sup constant. The preconditioners proposed for the whole indefinite system are block-diagonal and block-triangular. Numerical experiments presented in the final section show that these algorithms are a practical and efficient strategy for the iterative solution of the indefinite problems arising from mixed spectral element discretizations of elliptic systems.
Characterizing the inverses of block tridiagonal, block Toeplitz matrices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boffi, Nicholas M.; Hill, Judith C.; Reuter, Matthew G.

2014-12-04

We consider the inversion of block tridiagonal, block Toeplitz matrices and comment on the behaviour of these inverses as one moves away from the diagonal. Using matrix M bius transformations, we first present an O(1) representation (with respect to the number of block rows and block columns) for the inverse matrix and subsequently use this representation to characterize the inverse matrix. There are four symmetry-distinct cases where the blocks of the inverse matrix (i) decay to zero on both sides of the diagonal, (ii) oscillate on both sides, (iii) decay on one side and oscillate on the other and (iv)more » decay on one side and grow on the other. This characterization exposes the necessary conditions for the inverse matrix to be numerically banded and may also aid in the design of preconditioners and fast algorithms. Finally, we present numerical examples of these matrix types.« less
Design of a Variational Multiscale Method for Turbulent Compressible Flows

NASA Technical Reports Server (NTRS)

Diosady, Laslo Tibor; Murman, Scott M.

2013-01-01

A spectral-element framework is presented for the simulation of subsonic compressible high-Reynolds-number flows. The focus of the work is maximizing the efficiency of the computational schemes to enable unsteady simulations with a large number of spatial and temporal degrees of freedom. A collocation scheme is combined with optimized computational kernels to provide a residual evaluation with computational cost independent of order of accuracy up to 16th order. The optimized residual routines are used to develop a low-memory implicit scheme based on a matrix-free Newton-Krylov method. A preconditioner based on the finite-difference diagonalized ADI scheme is developed which maintains the low memory of the matrix-free implicit solver, while providing improved convergence properties. Emphasis on low memory usage throughout the solver development is leveraged to implement a coupled space-time DG solver which may offer further efficiency gains through adaptivity in both space and time.
A multi-dimensional nonlinearly implicit, electromagnetic Vlasov-Darwin particle-in-cell (PIC) algorithm

NASA Astrophysics Data System (ADS)

Chen, Guangye; Chacón, Luis; CoCoMans Team

2014-10-01

For decades, the Vlasov-Darwin model has been recognized to be attractive for PIC simulations (to avoid radiative noise issues) in non-radiative electromagnetic regimes. However, the Darwin model results in elliptic field equations that renders explicit time integration unconditionally unstable. Improving on linearly implicit schemes, fully implicit PIC algorithms for both electrostatic and electromagnetic regimes, with exact discrete energy and charge conservation properties, have been recently developed in 1D. This study builds on these recent algorithms to develop an implicit, orbit-averaged, time-space-centered finite difference scheme for the particle-field equations in multiple dimensions. The algorithm conserves energy, charge, and canonical-momentum exactly, even with grid packing. A simple fluid preconditioner allows efficient use of large timesteps, O (√{mi/me}c/veT) larger than the explicit CFL. We demonstrate the accuracy and efficiency properties of the of the algorithm with various numerical experiments in 2D3V.
A fast, preconditioned conjugate gradient Toeplitz solver

NASA Technical Reports Server (NTRS)

Pan, Victor; Schrieber, Robert

1989-01-01

A simple factorization is given of an arbitrary hermitian, positive definite matrix in which the factors are well-conditioned, hermitian, and positive definite. In fact, given knowledge of the extreme eigenvalues of the original matrix A, an optimal improvement can be achieved, making the condition numbers of each of the two factors equal to the square root of the condition number of A. This technique is to applied to the solution of hermitian, positive definite Toeplitz systems. Large linear systems with hermitian, positive definite Toeplitz matrices arise in some signal processing applications. A stable fast algorithm is given for solving these systems that is based on the preconditioned conjugate gradient method. The algorithm exploits Toeplitz structure to reduce the cost of an iteration to O(n log n) by applying the fast Fourier Transform to compute matrix-vector products. Matrix factorization is used as a preconditioner.
Unsymmetric ordering using a constrained Markowitz scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, Patrick R.; Xiaoye S.; Pralet, Stephane

2005-01-18

We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of themore » LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.« less
Monolithic multigrid methods for two-dimensional resistive magnetohydrodynamics

DOE PAGES

Adler, James H.; Benson, Thomas R.; Cyr, Eric C.; ...

2016-01-06

Magnetohydrodynamic (MHD) representations are used to model a wide range of plasma physics applications and are characterized by a nonlinear system of partial differential equations that strongly couples a charged fluid with the evolution of electromagnetic fields. The resulting linear systems that arise from discretization and linearization of the nonlinear problem are generally difficult to solve. In this paper, we investigate multigrid preconditioners for this system. We consider two well-known multigrid relaxation methods for incompressible fluid dynamics: Braess--Sarazin relaxation and Vanka relaxation. We first extend these to the context of steady-state one-fluid viscoresistive MHD. Then we compare the two relaxationmore » procedures within a multigrid-preconditioned GMRES method employed within Newton's method. To isolate the effects of the different relaxation methods, we use structured grids, inf-sup stable finite elements, and geometric interpolation. Furthermore, we present convergence and timing results for a two-dimensional, steady-state test problem.« less
Some Remarks on GMRES for Transport Theory

NASA Technical Reports Server (NTRS)

Patton, Bruce W.; Holloway, James Paul

2003-01-01

We review some work on the application of GMRES to the solution of the discrete ordinates transport equation in one-dimension. We note that GMRES can be applied directly to the angular flux vector, or it can be applied to only a vector of flux moments as needed to compute the scattering operator of the transport equation. In the former case we illustrate both the delights and defects of ILU right-preconditioners for problems with anisotropic scatter and for problems with upscatter. When working with flux moments we note that GMRES can be used as an accelerator for any existing transport code whose solver is based on a stationary fixed-point iteration, including transport sweeps and DSA transport sweeps. We also provide some numerical illustrations of this idea. We finally show how space can be traded for speed by taking multiple transport sweeps per GMRES iteration. Key Words: transport equation, GMRES, Krylov subspace
High-performance equation solvers and their impact on finite element analysis

NASA Technical Reports Server (NTRS)

Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.

1990-01-01

The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
High-performance equation solvers and their impact on finite element analysis

NASA Technical Reports Server (NTRS)

Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.

1992-01-01

The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number od operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Analysis of the Hessian for Aerodynamic Optimization: Inviscid Flow

NASA Technical Reports Server (NTRS)

Arian, Eyal; Ta'asan, Shlomo

1996-01-01

In this paper we analyze inviscid aerodynamic shape optimization problems governed by the full potential and the Euler equations in two and three dimensions. The analysis indicates that minimization of pressure dependent cost functions results in Hessians whose eigenvalue distributions are identical for the full potential and the Euler equations. However the optimization problems in two and three dimensions are inherently different. While the two dimensional optimization problems are well-posed the three dimensional ones are ill-posed. Oscillations in the shape up to the smallest scale allowed by the design space can develop in the direction perpendicular to the flow, implying that a regularization is required. A natural choice of such a regularization is derived. The analysis also gives an estimate of the Hessian's condition number which implies that the problems at hand are ill-conditioned. Infinite dimensional approximations for the Hessians are constructed and preconditioners for gradient based methods are derived from these approximate Hessians.
An interior-point method-based solver for simulation of aircraft parts riveting

NASA Astrophysics Data System (ADS)

Stefanova, Maria; Yakunin, Sergey; Petukhova, Margarita; Lupuleac, Sergey; Kokkolaras, Michael

2018-05-01

The particularities of the aircraft parts riveting process simulation necessitate the solution of a large amount of contact problems. A primal-dual interior-point method-based solver is proposed for solving such problems efficiently. The proposed method features a worst case polynomial complexity bound ? on the number of iterations, where n is the dimension of the problem and ε is a threshold related to desired accuracy. In practice, the convergence is often faster than this worst case bound, which makes the method applicable to large-scale problems. The computational challenge is solving the system of linear equations because the associated matrix is ill conditioned. To that end, the authors introduce a preconditioner and a strategy for determining effective initial guesses based on the physics of the problem. Numerical results are compared with ones obtained using the Goldfarb-Idnani algorithm. The results demonstrate the efficiency of the proposed method.
Algorithm 937: MINRES-QLP for Symmetric and Hermitian Linear Equations and Least-Squares Problems

PubMed Central

Choi, Sou-Cheng T.; Saunders, Michael A.

2014-01-01

We describe algorithm MINRES-QLP and its FORTRAN 90 implementation for solving symmetric or Hermitian linear systems or least-squares problems. If the system is singular, MINRES-QLP computes the unique minimum-length solution (also known as the pseudoinverse solution), which generally eludes MINRES. In all cases, it overcomes a potential instability in the original MINRES algorithm. A positive-definite pre-conditioner may be supplied. Our FORTRAN 90 implementation illustrates a design pattern that allows users to make problem data known to the solver but hidden and secure from other program units. In particular, we circumvent the need for reverse communication. Example test programs input and solve real or complex problems specified in Matrix Market format. While we focus here on a FORTRAN 90 implementation, we also provide and maintain MATLAB versions of MINRES and MINRES-QLP. PMID:25328255
Fluid preconditioning for Newton–Krylov-based, fully implicit, electrostatic particle-in-cell simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, G., E-mail: gchen@lanl.gov; Chacón, L.; Leibs, C.A.

2014-02-01

A recent proof-of-principle study proposes an energy- and charge-conserving, nonlinearly implicit electrostatic particle-in-cell (PIC) algorithm in one dimension [9]. The algorithm in the reference employs an unpreconditioned Jacobian-free Newton–Krylov method, which ensures nonlinear convergence at every timestep (resolving the dynamical timescale of interest). Kinetic enslavement, which is one key component of the algorithm, not only enables fully implicit PIC as a practical approach, but also allows preconditioning the kinetic solver with a fluid approximation. This study proposes such a preconditioner, in which the linearized moment equations are closed with moments computed from particles. Effective acceleration of the linear GMRES solvemore » is demonstrated, on both uniform and non-uniform meshes. The algorithm performance is largely insensitive to the electron–ion mass ratio. Numerical experiments are performed on a 1D multi-scale ion acoustic wave test problem.« less
Higher Order Bases in a 2D Hybrid BEM/FEM Formulation

NASA Technical Reports Server (NTRS)

Fink, Patrick W.; Wilton, Donald R.

2002-01-01

The advantages of using higher order, interpolatory basis functions are examined in the analysis of transverse electric (TE) plane wave scattering by homogeneous, dielectric cylinders. A boundary-element/finite-element (BEM/FEM) hybrid formulation is employed in which the interior dielectric region is modeled with the vector Helmholtz equation, and a radiation boundary condition is supplied by an Electric Field Integral Equation (EFIE). An efficient method of handling the singular self-term arising in the EFIE is presented. The iterative solution of the partially dense system of equations is obtained using the Quasi-Minimal Residual (QMR) algorithm with an Incomplete LU Threshold (ILUT) preconditioner. Numerical results are shown for the case of an incident wave impinging upon a square dielectric cylinder. The convergence of the solution is shown versus the number of unknowns as a function of the completeness order of the basis functions.
Efficient numerical calculation of MHD equilibria with magnetic islands, with particular application to saturated neoclassical tearing modes

NASA Astrophysics Data System (ADS)

Raburn, Daniel Louis

We have developed a preconditioned, globalized Jacobian-free Newton-Krylov (JFNK) solver for calculating equilibria with magnetic islands. The solver has been developed in conjunction with the Princeton Iterative Equilibrium Solver (PIES) and includes two notable enhancements over a traditional JFNK scheme: (1) globalization of the algorithm by a sophisticated backtracking scheme, which optimizes between the Newton and steepest-descent directions; and, (2) adaptive preconditioning, wherein information regarding the system Jacobian is reused between Newton iterations to form a preconditioner for our GMRES-like linear solver. We have developed a formulation for calculating saturated neoclassical tearing modes (NTMs) which accounts for the incomplete loss of a bootstrap current due to gradients of multiple physical quantities. We have applied the coupled PIES-JFNK solver to calculate saturated island widths on several shots from the Tokamak Fusion Test Reactor (TFTR) and have found reasonable agreement with experimental measurement.
Performance of multi-hop parallel free-space optical communication over gamma-gamma fading channel with pointing errors.

PubMed

Gao, Zhengguang; Liu, Hongzhan; Ma, Xiaoping; Lu, Wei

2016-11-10

Multi-hop parallel relaying is considered in a free-space optical (FSO) communication system deploying binary phase-shift keying (BPSK) modulation under the combined effects of a gamma-gamma (GG) distribution and misalignment fading. Based on the best path selection criterion, the cumulative distribution function (CDF) of this cooperative random variable is derived. Then the performance of this optical mesh network is analyzed in detail. A Monte Carlo simulation is also conducted to demonstrate the effectiveness of the results for the average bit error rate (ABER) and outage probability. The numerical result proves that it needs a smaller average transmitted optical power to achieve the same ABER and outage probability when using the multi-hop parallel network in FSO links. Furthermore, the system use of more number of hops and cooperative paths can improve the quality of the communication.
Monte Carlo calculations of electron beam quality conversion factors for several ion chamber types.

PubMed

Muir, B R; Rogers, D W O

2014-11-01

To provide a comprehensive investigation of electron beam reference dosimetry using Monte Carlo simulations of the response of 10 plane-parallel and 18 cylindrical ion chamber types. Specific emphasis is placed on the determination of the optimal shift of the chambers' effective point of measurement (EPOM) and beam quality conversion factors. The EGSnrc system is used for calculations of the absorbed dose to gas in ion chamber models and the absorbed dose to water as a function of depth in a water phantom on which cobalt-60 and several electron beam source models are incident. The optimal EPOM shifts of the ion chambers are determined by comparing calculations of R50 converted from I50 (calculated using ion chamber simulations in phantom) to R50 calculated using simulations of the absorbed dose to water vs depth in water. Beam quality conversion factors are determined as the calculated ratio of the absorbed dose to water to the absorbed dose to air in the ion chamber at the reference depth in a cobalt-60 beam to that in electron beams. For most plane-parallel chambers, the optimal EPOM shift is inside of the active cavity but different from the shift determined with water-equivalent scaling of the front window of the chamber. These optimal shifts for plane-parallel chambers also reduce the scatter of beam quality conversion factors, kQ, as a function of R50. The optimal shift of cylindrical chambers is found to be less than the 0.5 rcav recommended by current dosimetry protocols. In most cases, the values of the optimal shift are close to 0.3 rcav. Values of kecal are calculated and compared to those from the TG-51 protocol and differences are explained using accurate individual correction factors for a subset of ion chambers investigated. High-precision fits to beam quality conversion factors normalized to unity in a beam with R50 = 7.5 cm (kQ (')) are provided. These factors avoid the use of gradient correction factors as used in the TG-51 protocol although a chamber dependent optimal shift in the EPOM is required when using plane-parallel chambers while no shift is needed with cylindrical chambers. The sensitivity of these results to parameters used to model the ion chambers is discussed and the uncertainty related to the practical use of these results is evaluated. These results will prove useful as electron beam reference dosimetry protocols are being updated. The analysis of this work indicates that cylindrical ion chambers may be appropriate for use in low-energy electron beams but measurements are required to characterize their use in these beams.

Finite fringe hologram

NASA Technical Reports Server (NTRS)

Heflinger, L. O.

1970-01-01

In holographic interferometry a small movement of apparatus between exposures causes the background of the reconstructed scene to be covered with interference fringes approximately parallel to each other. The three-dimensional quality of the holographic image is allowable since a mathematical model will give the location of the fringes.
The impact of traffic-flow patterns on air quality in urban street canyons.

PubMed

Thaker, Prashant; Gokhale, Sharad

2016-01-01

We investigated the effect of different urban traffic-flow patterns on pollutant dispersion in different winds in a real asymmetric street canyon. Free-flow traffic causes more turbulence in the canyon facilitating more dispersion and a reduction in pedestrian level concentration. The comparison of with and without a vehicle-induced-turbulence revealed that when winds were perpendicular, the free-flow traffic reduced the concentration by 73% on the windward side with a minor increase of 17% on the leeward side, whereas for parallel winds, it reduced the concentration by 51% and 29%. The congested-flow traffic increased the concentrations on the leeward side by 47% when winds were perpendicular posing a higher risk to health, whereas reduced it by 17-42% for parallel winds. The urban air quality and public health can, therefore, be improved by improving the traffic-flow patterns in street canyons as vehicle-induced turbulence has been shown to contribute significantly to dispersion. Copyright © 2015 Elsevier Ltd. All rights reserved.
Applications of ERTS-A Data Collection System (DCS) in the Arizona Regional Ecological Test Site (ARETS)

NASA Technical Reports Server (NTRS)

Schumann, H. H. (Principal Investigator)

1972-01-01

The author has identified the following significant results. Preliminary analysis of DCS data from the USGS Verde River stream flow measuring site indicates the DCS system is furnishing high quality data more frequently than had been expected. During the 43-day period between Nov. 3, and Dec. 15, 1972, 552 DCS transmissions were received during 193 data passes. The amount of data received far exceeded the single high quality transmission per 12-hour period expected from the DCS system. The digital-parallel ERTS-1 data has furnished sufficient to accurately compute mean daily gage heights. These in turn, are used to compute average daily streamflow rates during periods of stable or slowly changing flow conditions. The digital-parallel data has also furnished useful information during peak flow periods. However, the serial-digital DCS capability, currently under development for transmitting streamflow data, should provide data of greater utility for determining times of flood peaks.
Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

NASA Technical Reports Server (NTRS)

Rios, Joseph Lucio; Ross, Kevin

2009-01-01

Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
High-speed parallel implementation of a modified PBR algorithm on DSP-based EH topology

NASA Astrophysics Data System (ADS)

Rajan, K.; Patnaik, L. M.; Ramakrishna, J.

1997-08-01

Algebraic Reconstruction Technique (ART) is an age-old method used for solving the problem of three-dimensional (3-D) reconstruction from projections in electron microscopy and radiology. In medical applications, direct 3-D reconstruction is at the forefront of investigation. The simultaneous iterative reconstruction technique (SIRT) is an ART-type algorithm with the potential of generating in a few iterations tomographic images of a quality comparable to that of convolution backprojection (CBP) methods. Pixel-based reconstruction (PBR) is similar to SIRT reconstruction, and it has been shown that PBR algorithms give better quality pictures compared to those produced by SIRT algorithms. In this work, we propose a few modifications to the PBR algorithms. The modified algorithms are shown to give better quality pictures compared to PBR algorithms. The PBR algorithm and the modified PBR algorithms are highly compute intensive, Not many attempts have been made to reconstruct objects in the true 3-D sense because of the high computational overhead. In this study, we have developed parallel two-dimensional (2-D) and 3-D reconstruction algorithms based on modified PBR. We attempt to solve the two problems encountered by the PBR and modified PBR algorithms, i.e., the long computational time and the large memory requirements, by parallelizing the algorithm on a multiprocessor system. We investigate the possible task and data partitioning schemes by exploiting the potential parallelism in the PBR algorithm subject to minimizing the memory requirement. We have implemented an extended hypercube (EH) architecture for the high-speed execution of the 3-D reconstruction algorithm using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs) and dual-port random access memories (DPR) as channels between the PEs. We discuss and compare the performances of the PBR algorithm on an IBM 6000 RISC workstation, on a Silicon Graphics Indigo 2 workstation, and on an EH system. The results show that an EH(3,1) using DSP chips as PEs executes the modified PBR algorithm about 100 times faster than an LBM 6000 RISC workstation. We have executed the algorithms on a 4-node IBM SP2 parallel computer. The results show that execution time of the algorithm on an EH(3,1) is better than that of a 4-node IBM SP2 system. The speed-up of an EH(3,1) system with eight PEs and one network controller is approximately 7.85.
Rapid evaluation and quality control of next generation sequencing data with FaQCs

DOE PAGES

Lo, Chien -Chi; Chain, Patrick S. G.

2014-12-01

Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
Common Sense about Taste: From Mammals to Insects

PubMed Central

Yarmolinsky, David A.; Zuker, Charles S.; Ryba, Nicholas J.P.

2013-01-01

The sense of taste is a specialized chemosensory system dedicated to the evaluation of food and drink. Despite the fact that vertebrates and insects have independently evolved distinct anatomic and molecular pathways for taste sensation, there are clear parallels in the organization and coding logic between the two systems. There is now persuasive evidence that tastant quality is mediated by labeled lines, whereby distinct and strictly segregated populations of taste receptor cells encode each of the taste qualities. PMID:19837029
Methods and systems for fabricating high quality superconducting tapes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Majkic, Goran; Selvamanickam, Venkat

An MOCVD system fabricates high quality superconductor tapes with variable thicknesses. The MOCVD system can include a gas flow chamber between two parallel channels in a housing. A substrate tape is heated and then passed through the MOCVD housing such that the gas flow is perpendicular to the tape's surface. Precursors are injected into the gas flow for deposition on the substrate tape. In this way, superconductor tapes can be fabricated with variable thicknesses, uniform precursor deposition, and high critical current densities.
Computational strategy for the solution of large strain nonlinear problems using the Wilkins explicit finite-difference approach

NASA Technical Reports Server (NTRS)

Hofmann, R.

1980-01-01

The STEALTH code system, which solves large strain, nonlinear continuum mechanics problems, was rigorously structured in both overall design and programming standards. The design is based on the theoretical elements of analysis while the programming standards attempt to establish a parallelism between physical theory, programming structure, and documentation. These features have made it easy to maintain, modify, and transport the codes. It has also guaranteed users a high level of quality control and quality assurance.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lo, Chien -Chi; Chain, Patrick S. G.

Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
Transforming Schools through Total Quality Education.

ERIC Educational Resources Information Center

Schmoker, Mike; Wilson, Richard B.

1993-01-01

Deming's work emphasizes advantages of teamwork, investment in ongoing training for all employees to increase their value to the company, and insistence that research and employee-gathered data guide and inform every decision and improvement effort. The parallel between psychologist Mihaly Csikszenmihalyi's work and Deming's shows that Total…
Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

NASA Astrophysics Data System (ADS)

Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

2016-12-01

New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine grained parallelism. The optimization is computed with open source software pySOT, a Surrogate Global Optimization Toolbox that allows user to pick the type of surrogate (or ensembles), the search procedure on surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce cost for decontamination of groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speed up is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses an asynchronous parallel global optimization for groundwater quality model calibration. The time for a single objective function evaluation varies unpredictably, so efficiency is improved with asynchronous parallel calculations to improve load balancing. The third application (done at NCSS) incorporates new global surrogate multi-objective parallel search algorithms into pySOT and applies it to a large watershed calibration problem.
No hospital left behind? Education policy lessons for value-based payment in healthcare.

PubMed

Maurer, Kristin A; Ryan, Andrew M

2016-01-01

Value-based payment systems have been widely implemented in healthcare in an effort to improve the quality of care. However, these programs have not broadly improved quality, and some evidence suggests that they may increase inequities in care. No Child Left Behind is a parallel effort in education to address uneven achievement and inequalities. Yet, by penalizing the lowest performers, No Child Left Behind's approach to accountability has led to a number of unintended consequences. This article draws lessons from education policy, arguing that financial incentives should be designed to support the lowest performers to improve quality. © 2015 Society of Hospital Medicine.
40 CFR 63.10010 - What are my monitoring, installation, operation, and maintenance requirements?

Code of Federal Regulations, 2013 CFR

2013-07-01

... that emissions are controlled with a common control device or series of control devices, are discharged... parallel control devices or multiple series of control devices are discharged to the atmosphere through... quality control activities (including, as applicable, calibration checks and required zero and span...
40 CFR 63.10010 - What are my monitoring, installation, operation, and maintenance requirements?

Code of Federal Regulations, 2014 CFR

2014-07-01

... that emissions are controlled with a common control device or series of control devices, are discharged... parallel control devices or multiple series of control devices are discharged to the atmosphere through... quality control activities (including, as applicable, calibration checks and required zero and span...
Parallel Computing and Model Evaluation for Environmental Systems: An Overview of the Supermuse and Frames Software Technologies

EPA Science Inventory

ERD’s Supercomputer for Model Uncertainty and Sensitivity Evaluation (SuperMUSE) is a key to enhancing quality assurance in environmental models and applications. Uncertainty analysis and sensitivity analysis remain critical, though often overlooked steps in the development and e...
Nonthermal plasma system for extending shelf life of raw broiler breast fillets

USDA-ARS?s Scientific Manuscript database

A nonthermal dielectric barrier discharge (DBD) plasma system was developed and enhanced to treat broiler breast fillets (BBF) in order to improve the microbial quality of the meat. The system consisted of a high-voltage source and two parallel, round-aluminum electrodes separated by three semi-rig...
Promoting Quality and Variety through the Public Financing of Privately Operated Schools in Qatar

ERIC Educational Resources Information Center

Constant, Louay; Goldman, Charles A.; Zellman, Gail L.; Augustine, Catherine H.; Galama, Titus; Gonzalez, Gabriella; Guarino, C. A.; Karam, Rita; Ryan, Gery W.; Salem, Hanine

2010-01-01

In 2002, Qatar began establishing publicly funded, privately operated "independent schools" in parallel with the existing, centralized Ministry of Education system. The reform that drove the establishment of the independent schools included accountability provisions such as (a) measuring school and student performance and (b)…
Improved parallel image reconstruction using feature refinement.

PubMed

Cheng, Jing; Jia, Sen; Ying, Leslie; Liu, Yuanyuan; Wang, Shanshan; Zhu, Yanjie; Li, Ye; Zou, Chao; Liu, Xin; Liang, Dong

2018-07-01

The aim of this study was to develop a novel feature refinement MR reconstruction method from highly undersampled multichannel acquisitions for improving the image quality and preserve more detail information. The feature refinement technique, which uses a feature descriptor to pick up useful features from residual image discarded by sparsity constrains, is applied to preserve the details of the image in compressed sensing and parallel imaging in MRI (CS-pMRI). The texture descriptor and structure descriptor recognizing different types of features are required for forming the feature descriptor. Feasibility of the feature refinement was validated using three different multicoil reconstruction methods on in vivo data. Experimental results show that reconstruction methods with feature refinement improve the quality of reconstructed image and restore the image details more accurately than the original methods, which is also verified by the lower values of the root mean square error and high frequency error norm. A simple and effective way to preserve more useful detailed information in CS-pMRI is proposed. This technique can effectively improve the reconstruction quality and has superior performance in terms of detail preservation compared with the original version without feature refinement. Magn Reson Med 80:211-223, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Investigation of vinegar production using a novel shaken repeated batch culture system.

PubMed

Schlepütz, Tino; Büchs, Jochen

2013-01-01

Nowadays, bioprocesses are developed or optimized on small scale. Also, vinegar industry is motivated to reinvestigate the established repeated batch fermentation process. As yet, there is no small-scale culture system for optimizing fermentation conditions for repeated batch bioprocesses. Thus, the aim of this study is to propose a new shaken culture system for parallel repeated batch vinegar fermentation. A new operation mode - the flushing repeated batch - was developed. Parallel repeated batch vinegar production could be established in shaken overflow vessels in a completely automated operation with only one pump per vessel. This flushing repeated batch was first theoretically investigated and then empirically tested. The ethanol concentration was online monitored during repeated batch fermentation by semiconductor gas sensors. It was shown that the switch from one ethanol substrate quality to different ethanol substrate qualities resulted in prolonged lag phases and durations of the first batches. In the subsequent batches the length of the fermentations decreased considerably. This decrease in the respective lag phases indicates an adaptation of the acetic acid bacteria mixed culture to the specific ethanol substrate quality. Consequently, flushing repeated batch fermentations on small scale are valuable for screening fermentation conditions and, thereby, improving industrial-scale bioprocesses such as vinegar production in terms of process robustness, stability, and productivity. Copyright © 2013 American Institute of Chemical Engineers.

Streamlined approach to high-quality purification and identification of compound series using high-resolution MS and NMR.

PubMed

Mühlebach, Anneke; Adam, Joachim; Schön, Uwe

2011-11-01

Automated medicinal chemistry (parallel chemistry) has become an integral part of the drug-discovery process in almost every large pharmaceutical company. Parallel array synthesis of individual organic compounds has been used extensively to generate diverse structural libraries to support different phases of the drug-discovery process, such as hit-to-lead, lead finding, or lead optimization. In order to guarantee effective project support, efficiency in the production of compound libraries has been maximized. As a consequence, also throughput in chromatographic purification and analysis has been adapted. As a recent trend, more laboratories are preparing smaller, yet more focused libraries with even increasing demands towards quality, i.e. optimal purity and unambiguous confirmation of identity. This paper presents an automated approach how to combine effective purification and structural conformation of a lead optimization library created by microwave-assisted organic synthesis. The results of complementary analytical techniques such as UHPLC-HRMS and NMR are not only regarded but even merged for fast and easy decision making, providing optimal quality of compound stock. In comparison with the previous procedures, throughput times are at least four times faster, while compound consumption could be decreased more than threefold. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Practical aspects of prestack depth migration with finite differences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ober, C.C.; Oldfield, R.A.; Womble, D.E.

1997-07-01

Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Design and Implementation of a Parallel Multivariate Ensemble Kalman Filter for the Poseidon Ocean General Circulation Model

NASA Technical Reports Server (NTRS)

Keppenne, Christian L.; Rienecker, Michele M.; Koblinsky, Chester (Technical Monitor)

2001-01-01

A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been implemented for the Poseidon ocean circulation model and tested with a Pacific Basin model configuration. There are about two million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element by element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF configuration are discussed. It is shown that the regionalization of the background covariances; has a negligible impact on the quality of the analyses. The parallel algorithm is very efficient for large numbers of observations but does not scale well beyond 100 PEs at the current model resolution. On a platform with distributed memory, memory rather than speed is the limiting factor.
Breaking Barriers to Low-Cost Modular Inverter Production & Use

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bogdan Borowy; Leo Casey; Jerry Foshage

2005-05-31

The goal of this cost share contract is to advance key technologies to reduce size, weight and cost while enhancing performance and reliability of Modular Inverter Product for Distributed Energy Resources (DER). Efforts address technology development to meet technical needs of DER market protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for Photovoltaic, Fuel Cell, Energy Storage applications. Efforts focused four technical areas, Capacitors, Cooling, Voltage Sensing and Control of Parallel Inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several typesmore » and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid cooled AIPM module to an air-cooled unit using augmented fin, impingement flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several, application dependent configurations and enabling voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load types (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling both individual devices dramatically improves reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.« less
New machining method of high precision infrared window part

NASA Astrophysics Data System (ADS)

Yang, Haicheng; Su, Ying; Xu, Zengqi; Guo, Rui; Li, Wenting; Zhang, Feng; Liu, Xuanmin

2016-10-01

Most of the spherical shell of the photoelectric multifunctional instrument was designed as multi optical channel mode to adapt to the different band of the sensor, there were mainly TV, laser and infrared channels. Without affecting the optical diameter, wind resistance and pneumatic performance of the optical system, the overall layout of the spherical shell was optimized to save space and reduce weight. Most of the shape of the optical windows were special-shaped, each optical window directly participated in the high resolution imaging of the corresponding sensor system, and the optical axis parallelism of each sensor needed to meet the accuracy requirement of 0.05mrad.Therefore precision machining of optical window parts quality will directly affect the photoelectric system's pointing accuracy and interchangeability. Processing and testing of the TV and laser window had been very mature, while because of the special nature of the material, transparent and high refractive rate, infrared window parts had the problems of imaging quality and the control of the minimum focal length and second level parallel in the processing. Based on years of practical experience, this paper was focused on how to control the shape and parallel difference precision of infrared window parts in the processing. Single pass rate was increased from 40% to more than 95%, the processing efficiency was significantly enhanced, an effective solution to the bottleneck problem in the actual processing, which effectively solve the bottlenecks in research and production.
HARP: A Dynamic Inertial Spectral Partitioner

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Sohn, Andrew; Biswas, Rupak

1997-01-01

Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such recursive spectral bisection (RSB), have proven effecfive in generating high-quality partitions of realistically-sized meshes. The major problem which hindered their wide-spread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigerivectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2 which is several times faster than other spectral partitioners while maintaining the solution quality of the proven RSB method. A parallel WI version of HARP has also been implemented on IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.
Low heat transfer, high strength window materials

DOEpatents

Berlad, Abraham L.; Salzano, Francis J.; Batey, John E.

1978-01-01

A multi-pane window with improved insulating qualities; comprising a plurality of transparent or translucent panes held in an essentially parallel, spaced-apart relationship by a frame. Between at least one pair of panes is a convection defeating means comprising an array of parallel slats or cells so designed as to prevent convection currents from developing in the space between the two panes. The convection defeating structures may have reflective surfaces so as to improve the collection and transmittance of the incident radiant energy. These same means may be used to control (increase or decrease) the transmittance of solar energy as well as to decouple the radiative transfer between the interior surfaces of the transparent panes.
An information theory of image gathering

NASA Technical Reports Server (NTRS)

Fales, Carl L.; Huck, Friedrich O.

1991-01-01

Shannon's mathematical theory of communication is extended to image gathering. Expressions are obtained for the total information that is received with a single image-gathering channel and with parallel channels. It is concluded that the aliased signal components carry information even though these components interfere with the within-passband components in conventional image gathering and restoration, thereby degrading the fidelity and visual quality of the restored image. An examination of the expression for minimum mean-square-error, or Wiener-matrix, restoration from parallel image-gathering channels reveals a method for unscrambling the within-passband and aliased signal components to restore spatial frequencies beyond the sampling passband out to the spatial frequency response cutoff of the optical aperture.
Accelerating separable footprint (SF) forward and back projection on GPU

NASA Astrophysics Data System (ADS)

Xie, Xiaobin; McGaffin, Madison G.; Long, Yong; Fessler, Jeffrey A.; Wen, Minhua; Lin, James

2017-03-01

Statistical image reconstruction (SIR) methods for X-ray CT can improve image quality and reduce radiation dosages over conventional reconstruction methods, such as filtered back projection (FBP). However, SIR methods require much longer computation time. The separable footprint (SF) forward and back projection technique simplifies the calculation of intersecting volumes of image voxels and finite-size beams in a way that is both accurate and efficient for parallel implementation. We propose a new method to accelerate the SF forward and back projection on GPU with NVIDIA's CUDA environment. For the forward projection, we parallelize over all detector cells. For the back projection, we parallelize over all 3D image voxels. The simulation results show that the proposed method is faster than the acceleration method of the SF projectors proposed by Wu and Fessler.13 We further accelerate the proposed method using multiple GPUs. The results show that the computation time is reduced approximately proportional to the number of GPUs.
Why is intelligence correlated with semen quality?: Biochemical pathways common to sperm and neuron function and their vulnerability to pleiotropic mutations.

PubMed

Pierce, Arand; Miller, Geoffrey; Arden, Rosalind; Gottfredson, Linda S

2009-09-01

We recently found positive correlations between human general intelligence and three key indices of semen quality, and hypothesized that these correlations arise through a phenotype-wide 'general fitness factor' reflecting overall mutation load. In this addendum we consider some of the biochemical pathways that may act as targets for pleiotropic mutations that disrupt both neuron function and sperm function in parallel. We focus especially on the inter-related roles of polyunsaturated fatty acids, exocytosis and receptor signaling.
An "artificial retina" processor for track reconstruction at the full LHC crossing rate

NASA Astrophysics Data System (ADS)

Abba, A.; Bedeschi, F.; Caponio, F.; Cenci, R.; Citterio, M.; Cusimano, A.; Fu, J.; Geraci, A.; Grizzuti, M.; Lusardi, N.; Marino, P.; Morello, M. J.; Neri, N.; Ninci, D.; Petruzzo, M.; Piucci, A.; Punzi, G.; Ristori, L.; Spinella, F.; Stracka, S.; Tonelli, D.; Walsh, J.

2016-07-01

We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired to quick detection of edges in mammals visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.
An "artificial retina" processor for track reconstruction at the full LHC crossing rate

DOE PAGES

Abba, A.; F. Bedeschi; Caponio, F.; ...

2015-10-23

Here, we present the latest results of an R&D; study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired to quick detection of edges in mammals visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.
76 FR 53638 - Approval and Promulgation of Air Quality Implementation Plans; Delaware; Infrastructure State...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-29

...), DNREC supplemented its September 16, 2009 submittal with a technical analysis submitted to EPA for... supplemental technical analysis, for which it has requested parallel-processing, through the public notice and... submitted the technical analysis to EPA as a formal supplement to its September 16, 2009 submittal. The...
Evaluation of Total Dissolved Solids and Specific Conductance Water Quality Targets with Paired Single-Species and Mesocosm Community Exposures

EPA Science Inventory

Isolated single-species exposures were conducted in parallel with 42 d mesocosm dosing studies that measured in-situ and whole community responses to different recipes of excess total dissolved solids (TDS). The studies were conducted with cultured species and native taxa from mo...
Properties of medium-density fiberboard related to hardwood specific gravity

Treesearch

George E. Woodson

1976-01-01

Boards of acceptable quality were made from barky material, pressure-refined from 14 species of southern hardwoods. Static bending and tensile properties (parallel to surface) of specimens were negatively correlated to stem specific gravity (wood plus bark), chip bulk density, and fiber bulk density. Bending and tensile properties increased with increasing...
The Comparability of Three Wechsler Adult Intelligence Scales in a College Sample.

ERIC Educational Resources Information Center

Quereshi, M. Y.; Ostrowski, Michael J.

1985-01-01

Administered three Wechsler adult intelligence scales to 72 undergraduates and tested the quality of means, variances, and covariances, utilizing subtest scale scores and IQs. Results indicated that the three scales were not parallel. Generally, the subtest scaled scores exhibited less similarity across the three scales than the IQ estimates.…
La estilistica bergsoniana y la teoria del acento, de Vossler (The Bergsonian Style and Vossler's Theory on Accent)

ERIC Educational Resources Information Center

Criado de Val, Manuel

1975-01-01

Discusses the esthetic ideas of Henri Bergson with particular emphasis on his thoughts about "intuition" in art and the importance of the prosodic features of literary style. The prosodic quality parallels or reproduces the rhythm of the thoughts being expressed. (Text is in Spanish.) (TL)
Randomization and Data-Analysis Items in Quality Standards for Single-Case Experimental Studies

ERIC Educational Resources Information Center

Heyvaert, Mieke; Wendt, Oliver; Van den Noortgate, Wim; Onghena, Patrick

2015-01-01

Reporting standards and critical appraisal tools serve as beacons for researchers, reviewers, and research consumers. Parallel to existing guidelines for researchers to report and evaluate group-comparison studies, single-case experimental (SCE) researchers are in need of guidelines for reporting and evaluating SCE studies. A systematic search was…
78 FR 11808 - Approval and Promulgation of Implementation Plans; Tennessee: Approve Knox County Supplemental...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-02-20

... INFORMATION: On December 18, 2012, EPA proposed to approve, through parallel processing, a draft revision to... County to account for changes in the emissions model and vehicle miles traveled projection model. EPA is... submit comments. FOR FURTHER INFORMATION CONTACT: Kelly Sheckler, Air Quality Modeling and Transportation...
78 FR 6064 - Approval and Promulgation of Air Quality Implementation Plans; Ohio and Indiana; Cincinnati...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-01-29

...) 2010a emissions model. The Ohio and Indiana portions of the Cincinnati- Hamilton area include the Ohio... SIP revision request to EPA for parallel processing with a letter dated October 12, 2012, and followed... special arrangements should be made for deliveries of boxed information. The Regional Office official...

Some links on this page may take you to non-federal websites. Their policies may differ from this site.