Amesos2 and Belos: Direct and Iterative Solvers for Large Sparse Linear Systems
Bavier, Eric; Hoemmen, Mark; Rajamanickam, Sivasankaran; ...
2012-01-01
Solvers for large sparse linear systems come in two categories: direct and iterative. Amesos2, a package in the Trilinos software project, provides direct methods, and Belos, another Trilinos package, provides iterative methods. Amesos2 offers a common interface to many different sparse matrix factorization codes, and can handle any implementation of sparse matrices and vectors, via an easy-to-extend C++ traits interface. It can also factor matrices whose entries have arbitrary “Scalar” type, enabling extended-precision and mixed-precision algorithms. Belos includes many different iterative methods for solving large sparse linear systems and least-squares problems. Unlike competing iterative solver libraries, Belos completely decouples themore » algorithms from the implementations of the underlying linear algebra objects. This lets Belos exploit the latest hardware without changes to the code. Belos favors algorithms that solve higher-level problems, such as multiple simultaneous linear systems and sequences of related linear systems, faster than standard algorithms. The package also supports extended-precision and mixed-precision algorithms. Together, Amesos2 and Belos form a complete suite of sparse linear solvers.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, Ajit; Khalil, Mohammad; Pettit, Chris
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
On improving linear solver performance: a block variant of GMRES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, A H; Dennis, J M; Jessup, E R
2004-05-10
The increasing gap between processor performance and memory access time warrants the re-examination of data movement in iterative linear solver algorithms. For this reason, we explore and establish the feasibility of modifying a standard iterative linear solver algorithm in a manner that reduces the movement of data through memory. In particular, we present an alternative to the restarted GMRES algorithm for solving a single right-hand side linear system Ax = b based on solving the block linear system AX = B. Algorithm performance, i.e. time to solution, is improved by using the matrix A in operations on groups of vectors.more » Experimental results demonstrate the importance of implementation choices on data movement as well as the effectiveness of the new method on a variety of problems from different application areas.« less
NASA Astrophysics Data System (ADS)
Cao, Jian; Chen, Jing-Bo; Dai, Meng-Xue
2018-01-01
An efficient finite-difference frequency-domain modeling of seismic wave propagation relies on the discrete schemes and appropriate solving methods. The average-derivative optimal scheme for the scalar wave modeling is advantageous in terms of the storage saving for the system of linear equations and the flexibility for arbitrary directional sampling intervals. However, using a LU-decomposition-based direct solver to solve its resulting system of linear equations is very costly for both memory and computational requirements. To address this issue, we consider establishing a multigrid-preconditioned BI-CGSTAB iterative solver fit for the average-derivative optimal scheme. The choice of preconditioning matrix and its corresponding multigrid components is made with the help of Fourier spectral analysis and local mode analysis, respectively, which is important for the convergence. Furthermore, we find that for the computation with unequal directional sampling interval, the anisotropic smoothing in the multigrid precondition may affect the convergence rate of this iterative solver. Successful numerical applications of this iterative solver for the homogenous and heterogeneous models in 2D and 3D are presented where the significant reduction of computer memory and the improvement of computational efficiency are demonstrated by comparison with the direct solver. In the numerical experiments, we also show that the unequal directional sampling interval will weaken the advantage of this multigrid-preconditioned iterative solver in the computing speed or, even worse, could reduce its accuracy in some cases, which implies the need for a reasonable control of directional sampling interval in the discretization.
Fault tolerance in an inner-outer solver: A GVR-enabled case study
Zhang, Ziming; Chien, Andrew A.; Teranishi, Keita
2015-04-18
Resilience is a major challenge for large-scale systems. It is particularly important for iterative linear solvers, since they take much of the time of many scientific applications. We show that single bit flip errors in the Flexible GMRES iterative linear solver can lead to high computational overhead or even failure to converge to the right answer. Informed by these results, we design and evaluate several strategies for fault tolerance in both inner and outer solvers appropriate across a range of error rates. We implement them, extending Trilinos’ solver library with the Global View Resilience (GVR) programming model, which provides multi-streammore » snapshots, multi-version data structures with portable and rich error checking/recovery. Lastly, experimental results validate correct execution with low performance overhead under varied error conditions.« less
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carleton, James Brian; Parks, Michael L.
Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
NASA Astrophysics Data System (ADS)
Betté, Srinivas; Diaz, Julio C.; Jines, William R.; Steihaug, Trond
1986-11-01
A preconditioned residual-norm-reducing iterative solver is described. Based on a truncated form of the generalized-conjugate-gradient method for nonsymmetric systems of linear equations, the iterative scheme is very effective for linear systems generated in reservoir simulation of thermal oil recovery processes. As a consequence of employing an adaptive implicit finite-difference scheme to solve the model equations, the number of variables per cell-block varies dynamically over the grid. The data structure allows for 5- and 9-point operators in the areal model, 5-point in the cross-sectional model, and 7- and 11-point operators in the three-dimensional model. Block-diagonal-scaling of the linear system, done prior to iteration, is found to have a significant effect on the rate of convergence. Block-incomplete-LU-decomposition (BILU) and block-symmetric-Gauss-Seidel (BSGS) methods, which result in no fill-in, are used as preconditioning procedures. A full factorization is done on the well terms, and the cells are ordered in a manner which minimizes the fill-in in the well-column due to this factorization. The convergence criterion for the linear (inner) iteration is linked to that of the nonlinear (Newton) iteration, thereby enhancing the efficiency of the computation. The algorithm, with both BILU and BSGS preconditioners, is evaluated in the context of a variety of thermal simulation problems. The solver is robust and can be used with little or no user intervention.
The solution of linear systems of equations with a structural analysis code on the NAS CRAY-2
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Overman, Andrea L.
1988-01-01
Two methods for solving linear systems of equations on the NAS Cray-2 are described. One is a direct method; the other is an iterative method. Both methods exploit the architecture of the Cray-2, particularly the vectorization, and are aimed at structural analysis applications. To demonstrate and evaluate the methods, they were installed in a finite element structural analysis code denoted the Computational Structural Mechanics (CSM) Testbed. A description of the techniques used to integrate the two solvers into the Testbed is given. Storage schemes, memory requirements, operation counts, and reformatting procedures are discussed. Finally, results from the new methods are compared with results from the initial Testbed sparse Choleski equation solver for three structural analysis problems. The new direct solvers described achieve the highest computational rates of the methods compared. The new iterative methods are not able to achieve as high computation rates as the vectorized direct solvers but are best for well conditioned problems which require fewer iterations to converge to the solution.
On some Aitken-like acceleration of the Schwarz method
NASA Astrophysics Data System (ADS)
Garbey, M.; Tromeur-Dervout, D.
2002-12-01
In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel. computer using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Elia, M.; Edwards, H. C.; Hu, J.
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble whenmore » applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.« less
D'Elia, M.; Edwards, H. C.; Hu, J.; ...
2018-01-18
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble whenmore » applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.« less
Naff, Richard L.; Banta, Edward R.
2008-01-01
The preconditioned conjugate gradient with improved nonlinear control (PCGN) package provides addi-tional means by which the solution of nonlinear ground-water flow problems can be controlled as compared to existing solver packages for MODFLOW. Picard iteration is used to solve nonlinear ground-water flow equations by iteratively solving a linear approximation of the nonlinear equations. The linear solution is provided by means of the preconditioned conjugate gradient algorithm where preconditioning is provided by the modi-fied incomplete Cholesky algorithm. The incomplete Cholesky scheme incorporates two levels of fill, 0 and 1, in which the pivots can be modified so that the row sums of the preconditioning matrix and the original matrix are approximately equal. A relaxation factor is used to implement the modified pivots, which determines the degree of modification allowed. The effects of fill level and degree of pivot modification are briefly explored by means of a synthetic, heterogeneous finite-difference matrix; results are reported in the final section of this report. The preconditioned conjugate gradient method is coupled with Picard iteration so as to efficiently solve the nonlinear equations associated with many ground-water flow problems. The description of this coupling of the linear solver with Picard iteration is a primary concern of this document.
LDRD final report on massively-parallel linear programming : the parPCx system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar
2005-02-01
This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
Preconditioned conjugate-gradient methods for low-speed flow calculations
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing
1993-01-01
An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over-Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss-Seidel Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
Preconditioned Conjugate Gradient methods for low speed flow calculations
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing
1993-01-01
An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations are integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and the convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the lower-upper (L-U)-successive symmetric over-relaxation iterative scheme is more efficient than a preconditioner based on incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional line Gauss-Seidel relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.
2016-01-01
Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frinks, Neal T.
2016-01-01
Several improvements to the mixed-elementUSM3Ddiscretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations.more » Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.« less
LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators
NASA Astrophysics Data System (ADS)
Gonzalez, Juan; Núñez, Rafael C.
2009-07-01
We present LAPACKrc, a family of FPGA-based linear algebra solvers able to achieve more than 100x speedup per commodity processor on certain problems. LAPACKrc subsumes some of the LAPACK and ScaLAPACK functionalities, and it also incorporates sparse direct and iterative matrix solvers. Current LAPACKrc prototypes demonstrate between 40x-150x speedup compared against top-of-the-line hardware/software systems. A technology roadmap is in place to validate current performance of LAPACKrc in HPC applications, and to increase the computational throughput by factors of hundreds within the next few years.
Using parallel banded linear system solvers in generalized eigenvalue problems
NASA Technical Reports Server (NTRS)
Zhang, Hong; Moss, William F.
1993-01-01
Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
NASA Astrophysics Data System (ADS)
Raburn, Daniel Louis
We have developed a preconditioned, globalized Jacobian-free Newton-Krylov (JFNK) solver for calculating equilibria with magnetic islands. The solver has been developed in conjunction with the Princeton Iterative Equilibrium Solver (PIES) and includes two notable enhancements over a traditional JFNK scheme: (1) globalization of the algorithm by a sophisticated backtracking scheme, which optimizes between the Newton and steepest-descent directions; and, (2) adaptive preconditioning, wherein information regarding the system Jacobian is reused between Newton iterations to form a preconditioner for our GMRES-like linear solver. We have developed a formulation for calculating saturated neoclassical tearing modes (NTMs) which accounts for the incomplete loss of a bootstrap current due to gradients of multiple physical quantities. We have applied the coupled PIES-JFNK solver to calculate saturated island widths on several shots from the Tokamak Fusion Test Reactor (TFTR) and have found reasonable agreement with experimental measurement.
Oasis: A high-level/high-performance open source Navier-Stokes solver
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Valen-Sendstad, Kristian
2015-03-01
Oasis is a high-level/high-performance finite element Navier-Stokes solver written from scratch in Python using building blocks from the FEniCS project (fenicsproject.org). The solver is unstructured and targets large-scale applications in complex geometries on massively parallel clusters. Oasis utilizes MPI and interfaces, through FEniCS, to the linear algebra backend PETSc. Oasis advocates a high-level, programmable user interface through the creation of highly flexible Python modules for new problems. Through the high-level Python interface the user is placed in complete control of every aspect of the solver. A version of the solver, that is using piecewise linear elements for both velocity and pressure, is shown to reproduce very well the classical, spectral, turbulent channel simulations of Moser et al. (1999). The computational speed is strongly dominated by the iterative solvers provided by the linear algebra backend, which is arguably the best performance any similar implicit solver using PETSc may hope for. Higher order accuracy is also demonstrated and new solvers may be easily added within the same framework.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments.
Fisicaro, G; Genovese, L; Andreussi, O; Marzari, N; Goedecker, S
2016-01-07
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fisicaro, G., E-mail: giuseppe.fisicaro@unibas.ch; Goedecker, S.; Genovese, L.
2016-01-07
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and themore » linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.« less
Comparing direct and iterative equation solvers in a large structural analysis software system
NASA Technical Reports Server (NTRS)
Poole, E. L.
1991-01-01
Two direct Choleski equation solvers and two iterative preconditioned conjugate gradient (PCG) equation solvers used in a large structural analysis software system are described. The two direct solvers are implementations of the Choleski method for variable-band matrix storage and sparse matrix storage. The two iterative PCG solvers include the Jacobi conjugate gradient method and an incomplete Choleski conjugate gradient method. The performance of the direct and iterative solvers is compared by solving several representative structural analysis problems. Some key factors affecting the performance of the iterative solvers relative to the direct solvers are identified.
Iterative Methods to Solve Linear RF Fields in Hot Plasma
NASA Astrophysics Data System (ADS)
Spencer, Joseph; Svidzinski, Vladimir; Evstatiev, Evstati; Galkin, Sergei; Kim, Jin-Soo
2014-10-01
Most magnetic plasma confinement devices use radio frequency (RF) waves for current drive and/or heating. Numerical modeling of RF fields is an important part of performance analysis of such devices and a predictive tool aiding design and development of future devices. Prior attempts at this modeling have mostly used direct solvers to solve the formulated linear equations. Full wave modeling of RF fields in hot plasma with 3D nonuniformities is mostly prohibited, with memory demands of a direct solver placing a significant limitation on spatial resolution. Iterative methods can significantly increase spatial resolution. We explore the feasibility of using iterative methods in 3D full wave modeling. The linear wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel (which includes a nonlocal dielectric response) is calculated by integrating along test particle orbits. The wave equation is discretized using a finite difference approach. The initial guess is important in iterative methods, and we examine different initial guesses including the solution to the cold plasma wave equation. Work is supported by the U.S. DOE SBIR program.
NASA Astrophysics Data System (ADS)
Seo, Jongmin; Schiavazzi, Daniele; Marsden, Alison
2017-11-01
Cardiovascular simulations are increasingly used in clinical decision making, surgical planning, and disease diagnostics. Patient-specific modeling and simulation typically proceeds through a pipeline from anatomic model construction using medical image data to blood flow simulation and analysis. To provide confidence intervals on simulation predictions, we use an uncertainty quantification (UQ) framework to analyze the effects of numerous uncertainties that stem from clinical data acquisition, modeling, material properties, and boundary condition selection. However, UQ poses a computational challenge requiring multiple evaluations of the Navier-Stokes equations in complex 3-D models. To achieve efficiency in UQ problems with many function evaluations, we implement and compare a range of iterative linear solver and preconditioning techniques in our flow solver. We then discuss applications to patient-specific cardiovascular simulation and how the problem/boundary condition formulation in the solver affects the selection of the most efficient linear solver. Finally, we discuss performance improvements in the context of uncertainty propagation. Support from National Institute of Health (R01 EB018302) is greatly appreciated.
NASA Astrophysics Data System (ADS)
Huismann, Immo; Stiller, Jörg; Fröhlich, Jochen
2017-10-01
The paper proposes a novel factorization technique for static condensation of a spectral-element discretization matrix that yields a linear operation count of just 13N multiplications for the residual evaluation, where N is the total number of unknowns. In comparison to previous work it saves a factor larger than 3 and outpaces unfactored variants for all polynomial degrees. Using the new technique as a building block for a preconditioned conjugate gradient method yields linear scaling of the runtime with N which is demonstrated for polynomial degrees from 2 to 32. This makes the spectral-element method cost effective even for low polynomial degrees. Moreover, the dependence of the iterative solution on the element aspect ratio is addressed, showing only a slight increase in the number of iterations for aspect ratios up to 128. Hence, the solver is very robust for practical applications.
A stopping criterion for the iterative solution of partial differential equations
NASA Astrophysics Data System (ADS)
Rao, Kaustubh; Malan, Paul; Perot, J. Blair
2018-01-01
A stopping criterion for iterative solution methods is presented that accurately estimates the solution error using low computational overhead. The proposed criterion uses information from prior solution changes to estimate the error. When the solution changes are noisy or stagnating it reverts to a less accurate but more robust, low-cost singular value estimate to approximate the error given the residual. This estimator can also be applied to iterative linear matrix solvers such as Krylov subspace or multigrid methods. Examples of the stopping criterion's ability to accurately estimate the non-linear and linear solution error are provided for a number of different test cases in incompressible fluid dynamics.
Performance issues for iterative solvers in device simulation
NASA Technical Reports Server (NTRS)
Fan, Qing; Forsyth, P. A.; Mcmacken, J. R. F.; Tang, Wei-Pai
1994-01-01
Due to memory limitations, iterative methods have become the method of choice for large scale semiconductor device simulation. However, it is well known that these methods still suffer from reliability problems. The linear systems which appear in numerical simulation of semiconductor devices are notoriously ill-conditioned. In order to produce robust algorithms for practical problems, careful attention must be given to many implementation issues. This paper concentrates on strategies for developing robust preconditioners. In addition, effective data structures and convergence check issues are also discussed. These algorithms are compared with a standard direct sparse matrix solver on a variety of problems.
Hesford, Andrew J.; Chew, Weng C.
2010-01-01
The distorted Born iterative method (DBIM) computes iterative solutions to nonlinear inverse scattering problems through successive linear approximations. By decomposing the scattered field into a superposition of scattering by an inhomogeneous background and by a material perturbation, large or high-contrast variations in medium properties can be imaged through iterations that are each subject to the distorted Born approximation. However, the need to repeatedly compute forward solutions still imposes a very heavy computational burden. To ameliorate this problem, the multilevel fast multipole algorithm (MLFMA) has been applied as a forward solver within the DBIM. The MLFMA computes forward solutions in linear time for volumetric scatterers. The typically regular distribution and shape of scattering elements in the inverse scattering problem allow the method to take advantage of data redundancy and reduce the computational demands of the normally expensive MLFMA setup. Additional benefits are gained by employing Kaczmarz-like iterations, where partial measurements are used to accelerate convergence. Numerical results demonstrate both the efficiency of the forward solver and the successful application of the inverse method to imaging problems with dimensions in the neighborhood of ten wavelengths. PMID:20707438
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Yousef
2014-01-16
The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less
NASA Technical Reports Server (NTRS)
Jothiprasad, Giridhar; Mavriplis, Dimitri J.; Caughey, David A.; Bushnell, Dennis M. (Technical Monitor)
2002-01-01
The efficiency gains obtained using higher-order implicit Runge-Kutta schemes as compared with the second-order accurate backward difference schemes for the unsteady Navier-Stokes equations are investigated. Three different algorithms for solving the nonlinear system of equations arising at each timestep are presented. The first algorithm (NMG) is a pseudo-time-stepping scheme which employs a non-linear full approximation storage (FAS) agglomeration multigrid method to accelerate convergence. The other two algorithms are based on Inexact Newton's methods. The linear system arising at each Newton step is solved using iterative/Krylov techniques and left preconditioning is used to accelerate convergence of the linear solvers. One of the methods (LMG) uses Richardson's iterative scheme for solving the linear system at each Newton step while the other (PGMRES) uses the Generalized Minimal Residual method. Results demonstrating the relative superiority of these Newton's methods based schemes are presented. Efficiency gains as high as 10 are obtained by combining the higher-order time integration schemes with the more efficient nonlinear solvers.
Extending substructure based iterative solvers to multiple load and repeated analyses
NASA Technical Reports Server (NTRS)
Farhat, Charbel
1993-01-01
Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*
Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.
2014-01-01
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to minx∈ℝn ‖Ax − b‖2, where A ∈ ℝm × n with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
Low-memory iterative density fitting.
Grajciar, Lukáš
2015-07-30
A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. Iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with the linear scaling memory requirements only. Compared with the standard density fitting implementation, up to 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner at a cost of only 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
Improved Convergence and Robustness of USM3D Solutions on Mixed Element Grids (Invited)
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.
2015-01-01
Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Scheme (HANIS), has been developed and implemented. It provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier Stokes (RANS) equations and a nonlinear control of the solution update. Two variants of the new methodology are assessed on four benchmark cases, namely, a zero-pressure gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the baseline solver technology.
Adaptive Discrete Hypergraph Matching.
Yan, Junchi; Li, Changsheng; Li, Yin; Cao, Guitao
2018-02-01
This paper addresses the problem of hypergraph matching using higher-order affinity information. We propose a solver that iteratively updates the solution in the discrete domain by linear assignment approximation. The proposed method is guaranteed to converge to a stationary discrete solution and avoids the annealing procedure and ad-hoc post binarization step that are required in several previous methods. Specifically, we start with a simple iterative discrete gradient assignment solver. This solver can be trapped in an -circle sequence under moderate conditions, where is the order of the graph matching problem. We then devise an adaptive relaxation mechanism to jump out this degenerating case and show that the resulting new path will converge to a fixed solution in the discrete domain. The proposed method is tested on both synthetic and real-world benchmarks. The experimental results corroborate the efficacy of our method.
NASA Astrophysics Data System (ADS)
Safari, A.; Sharifi, M. A.; Amjadiparvar, B.
2010-05-01
The GRACE mission has substantiated the low-low satellite-to-satellite tracking (LL-SST) concept. The LL-SST configuration can be combined with the previously realized high-low SST concept in the CHAMP mission to provide a much higher accuracy. The line of sight (LOS) acceleration difference between the GRACE satellite pair is the mostly used observable for mapping the global gravity field of the Earth in terms of spherical harmonic coefficients. In this paper, mathematical formulae for LOS acceleration difference observations have been derived and the corresponding linear system of equations has been set up for spherical harmonic up to degree and order 120. The total number of unknowns is 14641. Such a linear equation system can be solved with iterative solvers or direct solvers. However, the runtime of direct methods or that of iterative solvers without a suitable preconditioner increases tremendously. This is the reason why we need a more sophisticated method to solve the linear system of problems with a large number of unknowns. Multiplicative variant of the Schwarz alternating algorithm is a domain decomposition method, which allows it to split the normal matrix of the system into several smaller overlaped submatrices. In each iteration step the multiplicative variant of the Schwarz alternating algorithm solves linear systems with the matrices obtained from the splitting successively. It reduces both runtime and memory requirements drastically. In this paper we propose the Multiplicative Schwarz Alternating Algorithm (MSAA) for solving the large linear system of gravity field recovery. The proposed algorithm has been tested on the International Association of Geodesy (IAG)-simulated data of the GRACE mission. The achieved results indicate the validity and efficiency of the proposed algorithm in solving the linear system of equations from accuracy and runtime points of view. Keywords: Gravity field recovery, Multiplicative Schwarz Alternating Algorithm, Low-Low Satellite-to-Satellite Tracking
Code Samples Used for Complexity and Control
NASA Astrophysics Data System (ADS)
Ivancevic, Vladimir G.; Reid, Darryn J.
2015-11-01
The following sections are included: * MathematicaⓇ Code * Generic Chaotic Simulator * Vector Differential Operators * NLS Explorer * 2C++ Code * C++ Lambda Functions for Real Calculus * Accelerometer Data Processor * Simple Predictor-Corrector Integrator * Solving the BVP with the Shooting Method * Linear Hyperbolic PDE Solver * Linear Elliptic PDE Solver * Method of Lines for a Set of the NLS Equations * C# Code * Iterative Equation Solver * Simulated Annealing: A Function Minimum * Simple Nonlinear Dynamics * Nonlinear Pendulum Simulator * Lagrangian Dynamics Simulator * Complex-Valued Crowd Attractor Dynamics * Freeform Fortran Code * Lorenz Attractor Simulator * Complex Lorenz Attractor * Simple SGE Soliton * Complex Signal Presentation * Gaussian Wave Packet * Hermitian Matrices * Euclidean L2-Norm * Vector/Matrix Operations * Plain C-Code: Levenberg-Marquardt Optimizer * Free Basic Code: 2D Crowd Dynamics with 3000 Agents
A Flexible CUDA LU-based Solver for Small, Batched Linear Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Gawande, Nitin A.; Villa, Oreste
This chapter presents the implementation of a batched CUDA solver based on LU factorization for small linear systems. This solver may be used in applications such as reactive flow transport models, which apply the Newton-Raphson technique to linearize and iteratively solve the sets of non linear equations that represent the reactions for ten of thousands to millions of physical locations. The implementation exploits somewhat counterintuitive GPGPU programming techniques: it assigns the solution of a matrix (representing a system) to a single CUDA thread, does not exploit shared memory and employs dynamic memory allocation on the GPUs. These techniques enable ourmore » implementation to simultaneously solve sets of systems with over 100 equations and to employ LU decomposition with complete pivoting, providing the higher numerical accuracy required by certain applications. Other currently available solutions for batched linear solvers are limited by size and only support partial pivoting, although they may result faster in certain conditions. We discuss the code of our implementation and present a comparison with the other implementations, discussing the various tradeoffs in terms of performance and flexibility. This work will enable developers that need batched linear solvers to choose whichever implementation is more appropriate to the features and the requirements of their applications, and even to implement dynamic switching approaches that can choose the best implementation depending on the input data.« less
A Fast Solver for Implicit Integration of the Vlasov--Poisson System in the Eulerian Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, C. Kristopher; Hauck, Cory D.
In this paper, we present a domain decomposition algorithm to accelerate the solution of Eulerian-type discretizations of the linear, steady-state Vlasov equation. The steady-state solver then forms a key component in the implementation of fully implicit or nearly fully implicit temporal integrators for the nonlinear Vlasov--Poisson system. The solver relies on a particular decomposition of phase space that enables the use of sweeping techniques commonly used in radiation transport applications. The original linear system for the phase space unknowns is then replaced by a smaller linear system involving only unknowns on the boundary between subdomains, which can then be solvedmore » efficiently with Krylov methods such as GMRES. Steady-state solves are combined to form an implicit Runge--Kutta time integrator, and the Vlasov equation is coupled self-consistently to the Poisson equation via a linearized procedure or a nonlinear fixed-point method for the electric field. Finally, numerical results for standard test problems demonstrate the efficiency of the domain decomposition approach when compared to the direct application of an iterative solver to the original linear system.« less
A Fast Solver for Implicit Integration of the Vlasov--Poisson System in the Eulerian Framework
Garrett, C. Kristopher; Hauck, Cory D.
2018-04-05
In this paper, we present a domain decomposition algorithm to accelerate the solution of Eulerian-type discretizations of the linear, steady-state Vlasov equation. The steady-state solver then forms a key component in the implementation of fully implicit or nearly fully implicit temporal integrators for the nonlinear Vlasov--Poisson system. The solver relies on a particular decomposition of phase space that enables the use of sweeping techniques commonly used in radiation transport applications. The original linear system for the phase space unknowns is then replaced by a smaller linear system involving only unknowns on the boundary between subdomains, which can then be solvedmore » efficiently with Krylov methods such as GMRES. Steady-state solves are combined to form an implicit Runge--Kutta time integrator, and the Vlasov equation is coupled self-consistently to the Poisson equation via a linearized procedure or a nonlinear fixed-point method for the electric field. Finally, numerical results for standard test problems demonstrate the efficiency of the domain decomposition approach when compared to the direct application of an iterative solver to the original linear system.« less
Domain decomposition methods for the parallel computation of reacting flows
NASA Technical Reports Server (NTRS)
Keyes, David E.
1988-01-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Hydrodynamics of suspensions of passive and active rigid particles: a rigid multiblob approach
Usabiaga, Florencio Balboa; Kallemov, Bakytzhan; Delmotte, Blaise; ...
2016-01-12
We develop a rigid multiblob method for numerically solving the mobility problem for suspensions of passive and active rigid particles of complex shape in Stokes flow in unconfined, partially confined, and fully confined geometries. As in a number of existing methods, we discretize rigid bodies using a collection of minimally resolved spherical blobs constrained to move as a rigid body, to arrive at a potentially large linear system of equations for the unknown Lagrange multipliers and rigid-body motions. Here we develop a block-diagonal preconditioner for this linear system and show that a standard Krylov solver converges in a modest numbermore » of iterations that is essentially independent of the number of particles. Key to the efficiency of the method is a technique for fast computation of the product of the blob-blob mobility matrix and a vector. For unbounded suspensions, we rely on existing analytical expressions for the Rotne-Prager-Yamakawa tensor combined with a fast multipole method (FMM) to obtain linear scaling in the number of particles. For suspensions sedimented against a single no-slip boundary, we use a direct summation on a graphical processing unit (GPU), which gives quadratic asymptotic scaling with the number of particles. For fully confined domains, such as periodic suspensions or suspensions confined in slit and square channels, we extend a recently developed rigid-body immersed boundary method by B. Kallemov, A. P. S. Bhalla, B. E. Griffith, and A. Donev (Commun. Appl. Math. Comput. Sci. 11 (2016), no. 1, 79-141) to suspensions of freely moving passive or active rigid particles at zero Reynolds number. We demonstrate that the iterative solver for the coupled fluid and rigid-body equations converges in a bounded number of iterations regardless of the system size. In our approach, each iteration only requires a few cycles of a geometric multigrid solver for the Poisson equation, and an application of the block-diagonal preconditioner, leading to linear scaling with the number of particles. We optimize a number of parameters in the iterative solvers and apply our method to a variety of benchmark problems to carefully assess the accuracy of the rigid multiblob approach as a function of the resolution. We also model the dynamics of colloidal particles studied in recent experiments, such as passive boomerangs in a slit channel, as well as a pair of non-Brownian active nanorods sedimented against a wall.« less
Thyra Abstract Interface Package
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartlett, Roscoe A.
2005-09-01
Thrya primarily defines a set of abstract C++ class interfaces needed for the development of abstract numerical atgorithms (ANAs) such as iterative linear solvers, transient solvers all the way up to optimization. At the foundation of these interfaces are abstract C++ classes for vectors, vector spaces, linear operators and multi-vectors. Also included in the Thyra package is C++ code for creating concrete vector, vector space, linear operator, and multi-vector subclasses as well as other utilities to aid in the development of ANAs. Currently, very general and efficient concrete subclass implementations exist for serial and SPMD in-core vectors and multi-vectors. Codemore » also currently exists for testing objects and providing composite objects such as product vectors.« less
NASA Technical Reports Server (NTRS)
Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.
1998-01-01
This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tver-gaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmarks Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by- element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
NASA Astrophysics Data System (ADS)
Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo
2016-07-01
Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
The Mixed Finite Element Multigrid Method for Stokes Equations
Muzhinji, K.; Shateyi, S.; Motsa, S. S.
2015-01-01
The stable finite element discretization of the Stokes problem produces a symmetric indefinite system of linear algebraic equations. A variety of iterative solvers have been proposed for such systems in an attempt to construct efficient, fast, and robust solution techniques. This paper investigates one of such iterative solvers, the geometric multigrid solver, to find the approximate solution of the indefinite systems. The main ingredient of the multigrid method is the choice of an appropriate smoothing strategy. This study considers the application of different smoothers and compares their effects in the overall performance of the multigrid solver. We study the multigrid method with the following smoothers: distributed Gauss Seidel, inexact Uzawa, preconditioned MINRES, and Braess-Sarazin type smoothers. A comparative study of the smoothers shows that the Braess-Sarazin smoothers enhance good performance of the multigrid method. We study the problem in a two-dimensional domain using stable Hood-Taylor Q 2-Q 1 pair of finite rectangular elements. We also give the main theoretical convergence results. We present the numerical results to demonstrate the efficiency and robustness of the multigrid method and confirm the theoretical results. PMID:25945361
NASA Astrophysics Data System (ADS)
Lu, Benzhuo; Cheng, Xiaolin; Huang, Jingfang; McCammon, J. Andrew
2010-06-01
A Fortran program package is introduced for rapid evaluation of the electrostatic potentials and forces in biomolecular systems modeled by the linearized Poisson-Boltzmann equation. The numerical solver utilizes a well-conditioned boundary integral equation (BIE) formulation, a node-patch discretization scheme, a Krylov subspace iterative solver package with reverse communication protocols, and an adaptive new version of fast multipole method in which the exponential expansions are used to diagonalize the multipole-to-local translations. The program and its full description, as well as several closely related libraries and utility tools are available at http://lsec.cc.ac.cn/~lubz/afmpb.html and a mirror site at http://mccammon.ucsd.edu/. This paper is a brief summary of the program: the algorithms, the implementation and the usage. Program summaryProgram title: AFMPB: Adaptive fast multipole Poisson-Boltzmann solver Catalogue identifier: AEGB_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGB_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL 2.0 No. of lines in distributed program, including test data, etc.: 453 649 No. of bytes in distributed program, including test data, etc.: 8 764 754 Distribution format: tar.gz Programming language: Fortran Computer: Any Operating system: Any RAM: Depends on the size of the discretized biomolecular system Classification: 3 External routines: Pre- and post-processing tools are required for generating the boundary elements and for visualization. Users can use MSMS ( http://www.scripps.edu/~sanner/html/msms_home.html) for pre-processing, and VMD ( http://www.ks.uiuc.edu/Research/vmd/) for visualization. Sub-programs included: An iterative Krylov subspace solvers package from SPARSKIT by Yousef Saad ( http://www-users.cs.umn.edu/~saad/software/SPARSKIT/sparskit.html), and the fast multipole methods subroutines from FMMSuite ( http://www.fastmultipole.org/). Nature of problem: Numerical solution of the linearized Poisson-Boltzmann equation that describes electrostatic interactions of molecular systems in ionic solutions. Solution method: A novel node-patch scheme is used to discretize the well-conditioned boundary integral equation formulation of the linearized Poisson-Boltzmann equation. Various Krylov subspace solvers can be subsequently applied to solve the resulting linear system, with a bounded number of iterations independent of the number of discretized unknowns. The matrix-vector multiplication at each iteration is accelerated by the adaptive new versions of fast multipole methods. The AFMPB solver requires other stand-alone pre-processing tools for boundary mesh generation, post-processing tools for data analysis and visualization, and can be conveniently coupled with different time stepping methods for dynamics simulation. Restrictions: Only three or six significant digits options are provided in this version. Unusual features: Most of the codes are in Fortran77 style. Memory allocation functions from Fortran90 and above are used in a few subroutines. Additional comments: The current version of the codes is designed and written for single core/processor desktop machines. Check http://lsec.cc.ac.cn/~lubz/afmpb.html and http://mccammon.ucsd.edu/ for updates and changes. Running time: The running time varies with the number of discretized elements ( N) in the system and their distributions. In most cases, it scales linearly as a function of N.
Acceleration of GPU-based Krylov solvers via data transfer reduction
Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...
2015-04-08
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphicsmore » processing units, and in particular the Biconjugate Gradient Stabilized solver that significant improvement can be achieved by reformulating the method to reduce data-communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing units accelerated Krylov subspace iterative methods.« less
Shape reanalysis and sensitivities utilizing preconditioned iterative boundary solvers
NASA Technical Reports Server (NTRS)
Guru Prasad, K.; Kane, J. H.
1992-01-01
The computational advantages associated with the utilization of preconditined iterative equation solvers are quantified for the reanalysis of perturbed shapes using continuum structural boundary element analysis (BEA). Both single- and multi-zone three-dimensional problems are examined. Significant reductions in computer time are obtained by making use of previously computed solution vectors and preconditioners in subsequent analyses. The effectiveness of this technique is demonstrated for the computation of shape response sensitivities required in shape optimization. Computer times and accuracies achieved using the preconditioned iterative solvers are compared with those obtained via direct solvers and implicit differentiation of the boundary integral equations. It is concluded that this approach employing preconditioned iterative equation solvers in reanalysis and sensitivity analysis can be competitive with if not superior to those involving direct solvers.
NASA Astrophysics Data System (ADS)
Han, B.; Li, Y.
2016-12-01
We present a three-dimensional (3D) forward and inverse modeling code for marine controlled-source electromagnetic (CSEM) surveys in anisotropic media. The forward solution is based on a primary/secondary field approach, in which secondary fields are solved using a staggered finite-volume (FV) method and primary fields are solved for 1D isotropic background models analytically. It is shown that it is rather straightforward to extend the isotopic 3D FV algorithm to a triaxial anisotropic one, while additional coefficients are required to account for full tensor conductivity. To solve the linear system resulting from FV discretization of Maxwell' s equations, both iterative Krylov solvers (e.g. BiCGSTAB) and direct solvers (e.g. MUMPS) have been implemented, makes the code flexible for different computing platforms and different problems. For iterative soloutions, the linear system in terms of electromagnetic potentials (A-Phi) is used to precondition the original linear system, transforming the discretized Curl-Curl equations to discretized Laplace-like equations, thus much more favorable numerical properties can be obtained. Numerical experiments suggest that this A-Phi preconditioner can dramatically improve the convergence rate of an iterative solver and high accuracy can be achieved without divergence correction even for low frequencies. To efficiently calculate the sensitivities, i.e. the derivatives of CSEM data with respect to tensor conductivity, the adjoint method is employed. For inverse modeling, triaxial anisotropy is taken into account. Since the number of model parameters to be resolved of triaxial anisotropic medias is twice or thrice that of isotropic medias, the data-space version of the Gauss-Newton (GN) minimization method is preferred due to its lower computational cost compared with the traditional model-space GN method. We demonstrate the effectiveness of the code with synthetic examples.
NASA Technical Reports Server (NTRS)
Turc, Catalin; Anand, Akash; Bruno, Oscar; Chaubell, Julian
2011-01-01
We present a computational methodology (a novel Nystrom approach based on use of a non-overlapping patch technique and Chebyshev discretizations) for efficient solution of problems of acoustic and electromagnetic scattering by open surfaces. Our integral equation formulations (1) Incorporate, as ansatz, the singular nature of open-surface integral-equation solutions, and (2) For the Electric Field Integral Equation (EFIE), use analytical regularizes that effectively reduce the number of iterations required by iterative linear-algebra solution based on Krylov-subspace iterative solvers.
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme
Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...
2016-11-07
Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.
Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
An iterative solver for the 3D Helmholtz equation
NASA Astrophysics Data System (ADS)
Belonosov, Mikhail; Dmitriev, Maxim; Kostin, Victor; Neklyudov, Dmitry; Tcheverda, Vladimir
2017-09-01
We develop a frequency-domain iterative solver for numerical simulation of acoustic waves in 3D heterogeneous media. It is based on the application of a unique preconditioner to the Helmholtz equation that ensures convergence for Krylov subspace iteration methods. Effective inversion of the preconditioner involves the Fast Fourier Transform (FFT) and numerical solution of a series of boundary value problems for ordinary differential equations. Matrix-by-vector multiplication for iterative inversion of the preconditioned matrix involves inversion of the preconditioner and pointwise multiplication of grid functions. Our solver has been verified by benchmarking against exact solutions and a time-domain solver.
Eigenvalue Solvers for Modeling Nuclear Reactors on Leadership Class Machines
Slaybaugh, R. N.; Ramirez-Zweiger, M.; Pandya, Tara; ...
2018-02-20
In this paper, three complementary methods have been implemented in the code Denovo that accelerate neutral particle transport calculations with methods that use leadership-class computers fully and effectively: a multigroup block (MG) Krylov solver, a Rayleigh quotient iteration (RQI) eigenvalue solver, and a multigrid in energy (MGE) preconditioner. The MG Krylov solver converges more quickly than Gauss Seidel and enables energy decomposition such that Denovo can scale to hundreds of thousands of cores. RQI should converge in fewer iterations than power iteration (PI) for large and challenging problems. RQI creates shifted systems that would not be tractable without the MGmore » Krylov solver. It also creates ill-conditioned matrices. The MGE preconditioner reduces iteration count significantly when used with RQI and takes advantage of the new energy decomposition such that it can scale efficiently. Each individual method has been described before, but this is the first time they have been demonstrated to work together effectively. The combination of solvers enables the RQI eigenvalue solver to work better than the other available solvers for large reactors problems on leadership-class machines. Using these methods together, RQI converged in fewer iterations and in less time than PI for a full pressurized water reactor core. These solvers also performed better than an Arnoldi eigenvalue solver for a reactor benchmark problem when energy decomposition is needed. The MG Krylov, MGE preconditioner, and RQI solver combination also scales well in energy. Finally, this solver set is a strong choice for very large and challenging problems.« less
Eigenvalue Solvers for Modeling Nuclear Reactors on Leadership Class Machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slaybaugh, R. N.; Ramirez-Zweiger, M.; Pandya, Tara
In this paper, three complementary methods have been implemented in the code Denovo that accelerate neutral particle transport calculations with methods that use leadership-class computers fully and effectively: a multigroup block (MG) Krylov solver, a Rayleigh quotient iteration (RQI) eigenvalue solver, and a multigrid in energy (MGE) preconditioner. The MG Krylov solver converges more quickly than Gauss Seidel and enables energy decomposition such that Denovo can scale to hundreds of thousands of cores. RQI should converge in fewer iterations than power iteration (PI) for large and challenging problems. RQI creates shifted systems that would not be tractable without the MGmore » Krylov solver. It also creates ill-conditioned matrices. The MGE preconditioner reduces iteration count significantly when used with RQI and takes advantage of the new energy decomposition such that it can scale efficiently. Each individual method has been described before, but this is the first time they have been demonstrated to work together effectively. The combination of solvers enables the RQI eigenvalue solver to work better than the other available solvers for large reactors problems on leadership-class machines. Using these methods together, RQI converged in fewer iterations and in less time than PI for a full pressurized water reactor core. These solvers also performed better than an Arnoldi eigenvalue solver for a reactor benchmark problem when energy decomposition is needed. The MG Krylov, MGE preconditioner, and RQI solver combination also scales well in energy. Finally, this solver set is a strong choice for very large and challenging problems.« less
Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses
NASA Astrophysics Data System (ADS)
Novak, Roman; Vencelj, Matja¿
2009-12-01
The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here a real-time capability of the event-by-event correction method, which at the core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding window induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.
NONLINEAR MULTIGRID SOLVER EXPLOITING AMGe COARSE SPACES WITH APPROXIMATION PROPERTIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christensen, Max La Cour; Villa, Umberto E.; Engsig-Karup, Allan P.
The paper introduces a nonlinear multigrid solver for mixed nite element discretizations based on the Full Approximation Scheme (FAS) and element-based Algebraic Multigrid (AMGe). The main motivation to use FAS for unstruc- tured problems is the guaranteed approximation property of the AMGe coarse spaces that were developed recently at Lawrence Livermore National Laboratory. These give the ability to derive stable and accurate coarse nonlinear discretization problems. The previous attempts (including ones with the original AMGe method, [5, 11]), were less successful due to lack of such good approximation properties of the coarse spaces. With coarse spaces with approximation properties, ourmore » FAS approach on un- structured meshes should be as powerful/successful as FAS on geometrically re ned meshes. For comparison, Newton's method and Picard iterations with an inner state-of-the-art linear solver is compared to FAS on a nonlinear saddle point problem with applications to porous media ow. It is demonstrated that FAS is faster than Newton's method and Picard iterations for the experiments considered here. Due to the guaranteed approximation properties of our AMGe, the coarse spaces are very accurate, providing a solver with the potential for mesh-independent convergence on general unstructured meshes.« less
Till, Andrew T.; Warsa, James S.; Morel, Jim E.
2018-06-15
The thermal radiative transfer (TRT) equations comprise a radiation equation coupled to the material internal energy equation. Linearization of these equations produces effective, thermally-redistributed scattering through absorption-reemission. In this paper, we investigate the effectiveness and efficiency of Linear-Multi-Frequency-Grey (LMFG) acceleration that has been reformulated for use as a preconditioner to Krylov iterative solution methods. We introduce two general frameworks, the scalar flux formulation (SFF) and the absorption rate formulation (ARF), and investigate their iterative properties in the absence and presence of true scattering. SFF has a group-dependent state size but may be formulated without inner iterations in the presence ofmore » scattering, while ARF has a group-independent state size but requires inner iterations when scattering is present. We compare and evaluate the computational cost and efficiency of LMFG applied to these two formulations using a direct solver for the preconditioners. Finally, this work is novel because the use of LMFG for the radiation transport equation, in conjunction with Krylov methods, involves special considerations not required for radiation diffusion.« less
A survey of packages for large linear systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Milne, Brent
2000-02-11
This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
ML 3.0 smoothed aggregation user's guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen
2004-05-01
ML is a multigrid preconditioning package intended to solve linear systems of equations Az = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package ormore » to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the AZTEC 2.1 and AZTECOO iterative package [15]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and non-symmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.« less
ML 3.1 smoothed aggregation user's guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen
2004-10-01
ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package ormore » to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the Aztec 2.1 and AztecOO iterative package [16]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.« less
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Sankar, Lakshmi N.; Hixon, Duane
1992-01-01
The development of efficient iterative solution methods for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations is discussed. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes, and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. In this work, another approach based on the classical conjugate gradient method, known as the Generalized Minimum Residual (GMRES) algorithm is investigated. The GMRES algorithm has been used in the past by a number of researchers for solving steady viscous and inviscid flow problems. Here, we investigate the suitability of this algorithm for solving the system of non-linear equations that arise in unsteady Navier-Stokes solvers at each time step.
Use of general purpose graphics processing units with MODFLOW
Hughes, Joseph D.; White, Jeremy T.
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
Application of Nearly Linear Solvers to Electric Power System Computation
NASA Astrophysics Data System (ADS)
Grant, Lisa L.
To meet the future needs of the electric power system, improvements need to be made in the areas of power system algorithms, simulation, and modeling, specifically to achieve a time frame that is useful to industry. If power system time-domain simulations could run in real-time, then system operators would have situational awareness to implement online control and avoid cascading failures, significantly improving power system reliability. Several power system applications rely on the solution of a very large linear system. As the demands on power systems continue to grow, there is a greater computational complexity involved in solving these large linear systems within reasonable time. This project expands on the current work in fast linear solvers, developed for solving symmetric and diagonally dominant linear systems, in order to produce power system specific methods that can be solved in nearly-linear run times. The work explores a new theoretical method that is based on ideas in graph theory and combinatorics. The technique builds a chain of progressively smaller approximate systems with preconditioners based on the system's low stretch spanning tree. The method is compared to traditional linear solvers and shown to reduce the time and iterations required for an accurate solution, especially as the system size increases. A simulation validation is performed, comparing the solution capabilities of the chain method to LU factorization, which is the standard linear solver for power flow. The chain method was successfully demonstrated to produce accurate solutions for power flow simulation on a number of IEEE test cases, and a discussion on how to further improve the method's speed and accuracy is included.
CUDA GPU based full-Stokes finite difference modelling of glaciers
NASA Astrophysics Data System (ADS)
Brædstrup, C. F.; Egholm, D. L.
2012-04-01
Many have stressed the limitations of using the shallow shelf and shallow ice approximations when modelling ice streams or surging glaciers. Using a full-stokes approach requires either large amounts of computer power or time and is therefore seldom an option for most glaciologists. Recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists. Our full-stokes ice sheet model implements a Red-Black Gauss-Seidel iterative linear solver to solve the full stokes equations. This technique has proven very effective when applied to the stokes equation in geodynamics problems, and should therefore also preform well in glaciological flow probems. The Gauss-Seidel iterator is known to be robust but several other linear solvers have a much faster convergence. To aid convergence, the solver uses a multigrid approach where values are interpolated and extrapolated between different grid resolutions to minimize the short wavelength errors efficiently. This reduces the iteration count by several orders of magnitude. The run-time is further reduced by using the GPGPU technology where each card has up to 448 cores. Researchers utilizing the GPGPU technique in other areas have reported between 2 - 11 times speedup compared to multicore CPU implementations on similar problems. The goal of these initial investigations into the possible usage of GPGPU technology in glacial modelling is to apply the enhanced resolution of a full-stokes solver to ice streams and surging glaciers. This is a area of growing interest because ice streams are the main drainage conjugates for large ice sheets. It is therefore crucial to understand this streaming behavior and it's impact up-ice.
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Three-dimensional unstructured grid Euler computations using a fully-implicit, upwind method
NASA Technical Reports Server (NTRS)
Whitaker, David L.
1993-01-01
A method has been developed to solve the Euler equations on a three-dimensional unstructured grid composed of tetrahedra. The method uses an upwind flow solver with a linearized, backward-Euler time integration scheme. Each time step results in a sparse linear system of equations which is solved by an iterative, sparse matrix solver. Local-time stepping, switched evolution relaxation (SER), preconditioning and reuse of the Jacobian are employed to accelerate the convergence rate. Implicit boundary conditions were found to be extremely important for fast convergence. Numerical experiments have shown that convergence rates comparable to that of a multigrid, central-difference scheme are achievable on the same mesh. Results are presented for several grids about an ONERA M6 wing.
NASA Technical Reports Server (NTRS)
Datta, Anubhav; Johnson, Wayne R.
2009-01-01
This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil
2012-01-01
The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
NASA Astrophysics Data System (ADS)
Sanan, Patrick; May, Dave A.; Schenk, Olaf; Bollhöffer, Matthias
2017-04-01
Geodynamics simulations typically involve the repeated solution of saddle-point systems arising from the Stokes equations. These computations often dominate the time to solution. Direct solvers are known for their robustness and ``black box'' properties, yet exhibit superlinear memory requirements and time to solution. More complex multilevel-preconditioned iterative solvers have been very successful for large problems, yet their use can require more effort from the practitioner in terms of setting up a solver and choosing its parameters. We champion an intermediate approach, based on leveraging the power of modern incomplete factorization techniques for indefinite symmetric matrices. These provide an interesting alternative in situations in between the regimes where direct solvers are an obvious choice and those where complex, scalable, iterative solvers are an obvious choice. That is, much like their relatives for definite systems, ILU/ICC-preconditioned Krylov methods and ILU/ICC-smoothed multigrid methods, the approaches demonstrated here provide a useful addition to the solver toolkit. We present results with a simple, PETSc-based, open-source Q2-Q1 (Taylor-Hood) finite element discretization, in 2 and 3 dimensions, with the Stokes and Lamé (linear elasticity) saddle point systems. Attention is paid to cases in which full-operator incomplete factorization gives an improvement in time to solution over direct solution methods (which may not even be feasible due to memory limitations), without the complication of more complex (or at least, less-automatic) preconditioners or smoothers. As an important factor in the relevance of these tools is their availability in portable software, we also describe open-source PETSc interfaces to the factorization routines.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.
1990-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.
1992-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number od operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Parallel Symmetric Eigenvalue Problem Solvers
2015-05-01
get research, tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. v TABLE OF...32 4.6.2 Choice of the Ritz shifts . . . . . . . . . . . . . . . . . . . . 37 4.7 Relationship between...pencil. I will conclude with a discussion of the relationship between Trace- Min and simultaneous iteration. If both methods solve the linear systems
Generalized conjugate-gradient methods for the Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing
1991-01-01
A generalized conjugate-gradient method is used to solve the two-dimensional, compressible Navier-Stokes equations of fluid flow. The equations are discretized with an implicit, upwind finite-volume formulation. Preconditioning techniques are incorporated into the new solver to accelerate convergence of the overall iterative method. The superiority of the new solver is demonstrated by comparisons with a conventional line Gauss-Siedel Relaxation solver. Computational test results for transonic flow (trailing edge flow in a transonic turbine cascade) and hypersonic flow (M = 6.0 shock-on-shock phenoena on a cylindrical leading edge) are presented. When applied to the transonic cascade case, the new solver is 4.4 times faster in terms of number of iterations and 3.1 times faster in terms of CPU time than the Relaxation solver. For the hypersonic shock case, the new solver is 3.0 times faster in terms of number of iterations and 2.2 times faster in terms of CPU time than the Relaxation solver.
NASA Astrophysics Data System (ADS)
Chacon, L.; Finn, J. M.; Knoll, D. A.
2000-10-01
Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
Nonlinear Krylov and moving nodes in the method of lines
NASA Astrophysics Data System (ADS)
Miller, Keith
2005-11-01
We report on some successes and problem areas in the Method of Lines from our work with moving node finite element methods. First, we report on our "nonlinear Krylov accelerator" for the modified Newton's method on the nonlinear equations of our stiff ODE solver. Since 1990 it has been robust, simple, cheap, and automatic on all our moving node computations. We publicize further trials with it here because it should be of great general usefulness to all those solving evolutionary equations. Second, we discuss the need for reliable automatic choice of spatially variable time steps. Third, we discuss the need for robust and efficient iterative solvers for the difficult linearized equations (Jx=b) of our stiff ODE solver. Here, the 1997 thesis of Zulu Xaba has made significant progress.
Application of Conjugate Gradient methods to tidal simulation
Barragy, E.; Carey, G.F.; Walters, R.A.
1993-01-01
A harmonic decomposition technique is applied to the shallow water equations to yield a complex, nonsymmetric, nonlinear, Helmholtz type problem for the sea surface and an accompanying complex, nonlinear diagonal problem for the velocities. The equation for the sea surface is linearized using successive approximation and then discretized with linear, triangular finite elements. The study focuses on applying iterative methods to solve the resulting complex linear systems. The comparative evaluation includes both standard iterative methods for the real subsystems and complex versions of the well known Bi-Conjugate Gradient and Bi-Conjugate Gradient Squared methods. Several Incomplete LU type preconditioners are discussed, and the effects of node ordering, rejection strategy, domain geometry and Coriolis parameter (affecting asymmetry) are investigated. Implementation details for the complex case are discussed. Performance studies are presented and comparisons made with a frontal solver. ?? 1993.
Multi-GPU Accelerated Admittance Method for High-Resolution Human Exposure Evaluation.
Xiong, Zubiao; Feng, Shi; Kautz, Richard; Chandra, Sandeep; Altunyurt, Nevin; Chen, Ji
2015-12-01
A multi-graphics processing unit (GPU) accelerated admittance method solver is presented for solving the induced electric field in high-resolution anatomical models of human body when exposed to external low-frequency magnetic fields. In the solver, the anatomical model is discretized as a three-dimensional network of admittances. The conjugate orthogonal conjugate gradient (COCG) iterative algorithm is employed to take advantage of the symmetric property of the complex-valued linear system of equations. Compared against the widely used biconjugate gradient stabilized method, the COCG algorithm can reduce the solving time by 3.5 times and reduce the storage requirement by about 40%. The iterative algorithm is then accelerated further by using multiple NVIDIA GPUs. The computations and data transfers between GPUs are overlapped in time by using asynchronous concurrent execution design. The communication overhead is well hidden so that the acceleration is nearly linear with the number of GPU cards. Numerical examples show that our GPU implementation running on four NVIDIA Tesla K20c cards can reach 90 times faster than the CPU implementation running on eight CPU cores (two Intel Xeon E5-2603 processors). The implemented solver is able to solve large dimensional problems efficiently. A whole adult body discretized in 1-mm resolution can be solved in just several minutes. The high efficiency achieved makes it practical to investigate human exposure involving a large number of cases with a high resolution that meets the requirements of international dosimetry guidelines.
Efficient Iterative Methods Applied to the Solution of Transonic Flows
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.
1996-02-01
We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
NASA Technical Reports Server (NTRS)
Ortega, J. M.
1986-01-01
Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blanford, M.
1997-12-31
Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less
Brown, Angus M
2006-04-01
The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.
Parallel iterative solution for h and p approximations of the shallow water equations
Barragy, E.J.; Walters, R.A.
1998-01-01
A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
Non-linear eigensolver-based alternative to traditional SCF methods
NASA Astrophysics Data System (ADS)
Gavin, Brendan; Polizzi, Eric
2013-03-01
The self-consistent iterative procedure in Density Functional Theory calculations is revisited using a new, highly efficient and robust algorithm for solving the non-linear eigenvector problem (i.e. H(X)X = EX;) of the Kohn-Sham equations. This new scheme is derived from a generalization of the FEAST eigenvalue algorithm, and provides a fundamental and practical numerical solution for addressing the non-linearity of the Hamiltonian with the occupied eigenvectors. In contrast to SCF techniques, the traditional outer iterations are replaced by subspace iterations that are intrinsic to the FEAST algorithm, while the non-linearity is handled at the level of a projected reduced system which is orders of magnitude smaller than the original one. Using a series of numerical examples, it will be shown that our approach can outperform the traditional SCF mixing techniques such as Pulay-DIIS by providing a high converge rate and by converging to the correct solution regardless of the choice of the initial guess. We also discuss a practical implementation of the technique that can be achieved effectively using the FEAST solver package. This research is supported by NSF under Grant #ECCS-0846457 and Intel Corporation.
1984-04-01
numerical solution, of sstem ot stiff Wh-f Cr ODs. Fro- qontl. a substantial portia of the total computationskwok and cooap required! to solve stiff...exep, possl- bly, foreciadalms of problem. That is% a syste of linewat o nonlinear algebrac equa- tion mumt be solved at auk step of the numerical ...onjugate gradient method [431 is a mall-know ezuze, have prove to be particularly -2- efecti for solving the linear stwem that &ise in the numerical
Efficient iterative methods applied to the solution of transonic flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.
1996-02-01
We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less
Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting
Carlberg, Kevin; Ray, Jaideep; van Bloemen Waanders, Bart
2015-02-14
Implicit numerical integration of nonlinear ODEs requires solving a system of nonlinear algebraic equations at each time step. Each of these systems is often solved by a Newton-like method, which incurs a sequence of linear-system solves. Most model-reduction techniques for nonlinear ODEs exploit knowledge of system's spatial behavior to reduce the computational complexity of each linear-system solve. However, the number of linear-system solves for the reduced-order simulation often remains roughly the same as that for the full-order simulation. We propose exploiting knowledge of the model's temporal behavior to (1) forecast the unknown variable of the reduced-order system of nonlinear equationsmore » at future time steps, and (2) use this forecast as an initial guess for the Newton-like solver during the reduced-order-model simulation. To compute the forecast, we propose using the Gappy POD technique. As a result, the goal is to generate an accurate initial guess so that the Newton solver requires many fewer iterations to converge, thereby decreasing the number of linear-system solves in the reduced-order-model simulation.« less
A second order discontinuous Galerkin fast sweeping method for Eikonal equations
NASA Astrophysics Data System (ADS)
Li, Fengyan; Shu, Chi-Wang; Zhang, Yong-Tao; Zhao, Hongkai
2008-09-01
In this paper, we construct a second order fast sweeping method with a discontinuous Galerkin (DG) local solver for computing viscosity solutions of a class of static Hamilton-Jacobi equations, namely the Eikonal equations. Our piecewise linear DG local solver is built on a DG method developed recently [Y. Cheng, C.-W. Shu, A discontinuous Galerkin finite element method for directly solving the Hamilton-Jacobi equations, Journal of Computational Physics 223 (2007) 398-415] for the time-dependent Hamilton-Jacobi equations. The causality property of Eikonal equations is incorporated into the design of this solver. The resulting local nonlinear system in the Gauss-Seidel iterations is a simple quadratic system and can be solved explicitly. The compactness of the DG method and the fast sweeping strategy lead to fast convergence of the new scheme for Eikonal equations. Extensive numerical examples verify efficiency, convergence and second order accuracy of the proposed method.
Complex wet-environments in electronic-structure calculations
NASA Astrophysics Data System (ADS)
Fisicaro, Giuseppe; Genovese, Luigi; Andreussi, Oliviero; Marzari, Nicola; Goedecker, Stefan
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of an applied electrochemical potentials, including complex electrostatic screening coming from the solvent. In the present work we present a solver to handle both the Generalized Poisson and the Poisson-Boltzmann equation. A preconditioned conjugate gradient (PCG) method has been implemented for the Generalized Poisson and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations. On the other hand, a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. The algorithms take advantage of a preconditioning procedure based on the BigDFT Poisson solver for the standard Poisson equation. They exhibit very high accuracy and parallel efficiency, and allow different boundary conditions, including surfaces. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and it will be released as a independent program, suitable for integration in other codes. We present test calculations for large proteins to demonstrate efficiency and performances. This work was done within the PASC and NCCR MARVEL projects. Computer resources were provided by the Swiss National Supercomputing Centre (CSCS) under Project ID s499. LG acknowledges also support from the EXTMOS EU project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laboure, Vincent M., E-mail: vincent.laboure@tamu.edu; McClarren, Ryan G., E-mail: rgm@tamu.edu; Hauck, Cory D., E-mail: hauckc@ornl.gov
2016-09-15
In this work, we provide a fully-implicit implementation of the time-dependent, filtered spherical harmonics (FP{sub N}) equations for non-linear, thermal radiative transfer. We investigate local filtering strategies and analyze the effect of the filter on the conditioning of the system, showing in particular that the filter improves the convergence properties of the iterative solver. We also investigate numerically the rigorous error estimates derived in the linear setting, to determine whether they hold also for the non-linear case. Finally, we simulate a standard test problem on an unstructured mesh and make comparisons with implicit Monte Carlo (IMC) calculations.
MILAMIN 2 - Fast MATLAB FEM solver
NASA Astrophysics Data System (ADS)
Dabrowski, Marcin; Krotkiewski, Marcin; Schmid, Daniel W.
2013-04-01
MILAMIN is a free and efficient MATLAB-based two-dimensional FEM solver utilizing unstructured meshes [Dabrowski et al., G-cubed (2008)]. The code consists of steady-state thermal diffusion and incompressible Stokes flow solvers implemented in approximately 200 lines of native MATLAB code. The brevity makes the code easily customizable. An important quality of MILAMIN is speed - it can handle millions of nodes within minutes on one CPU core of a standard desktop computer, and is faster than many commercial solutions. The new MILAMIN 2 allows three-dimensional modeling. It is designed as a set of functional modules that can be used as building blocks for efficient FEM simulations using MATLAB. The utilities are largely implemented as native MATLAB functions. For performance critical parts we use MUTILS - a suite of compiled MEX functions optimized for shared memory multi-core computers. The most important features of MILAMIN 2 are: 1. Modular approach to defining, tracking, and discretizing the geometry of the model 2. Interfaces to external mesh generators (e.g., Triangle, Fade2d, T3D) and mesh utilities (e.g., element type conversion, fast point location, boundary extraction) 3. Efficient computation of the stiffness matrix for a wide range of element types, anisotropic materials and three-dimensional problems 4. Fast global matrix assembly using a dedicated MEX function 5. Automatic integration rules 6. Flexible prescription (spatial, temporal, and field functions) and efficient application of Dirichlet, Neuman, and periodic boundary conditions 7. Treatment of transient and non-linear problems 8. Various iterative and multi-level solution strategies 9. Post-processing tools (e.g., numerical integration) 10. Visualization primitives using MATLAB, and VTK export functions We provide a large number of examples that show how to implement a custom FEM solver using the MILAMIN 2 framework. The examples are MATLAB scripts of increasing complexity that address a given technical topic (e.g., creating meshes, reordering nodes, applying boundary conditions), a given numerical topic (e.g., using various solution strategies, non-linear iterations), or that present a fully-developed solver designed to address a scientific topic (e.g., performing Stokes flow simulations in synthetic porous medium). References: Dabrowski, M., M. Krotkiewski, and D. W. Schmid MILAMIN: MATLAB-based finite element method solver for large problems, Geochem. Geophys. Geosyst., 9, Q04030, 2008
IETI – Isogeometric Tearing and Interconnecting
Kleiss, Stefan K.; Pechstein, Clemens; Jüttler, Bert; Tomar, Satyendra
2012-01-01
Finite Element Tearing and Interconnecting (FETI) methods are a powerful approach to designing solvers for large-scale problems in computational mechanics. The numerical simulation problem is subdivided into a number of independent sub-problems, which are then coupled in appropriate ways. NURBS- (Non-Uniform Rational B-spline) based isogeometric analysis (IGA) applied to complex geometries requires to represent the computational domain as a collection of several NURBS geometries. Since there is a natural decomposition of the computational domain into several subdomains, NURBS-based IGA is particularly well suited for using FETI methods. This paper proposes the new IsogEometric Tearing and Interconnecting (IETI) method, which combines the advanced solver design of FETI with the exact geometry representation of IGA. We describe the IETI framework for two classes of simple model problems (Poisson and linearized elasticity) and discuss the coupling of the subdomains along interfaces (both for matching interfaces and for interfaces with T-joints, i.e. hanging nodes). Special attention is paid to the construction of a suitable preconditioner for the iterative linear solver used for the interface problem. We report several computational experiments to demonstrate the performance of the proposed IETI method. PMID:24511167
Performance of a parallel thermal-hydraulics code TEMPEST
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fann, G.I.; Trent, D.S.
The authors describe the parallelization of the Tempest thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scalingmore » performance of 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) are presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on an 128 processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.« less
Use of direct and iterative solvers for estimation of SNP effects in genome-wide selection
2010-01-01
The aim of this study was to compare iterative and direct solvers for estimation of marker effects in genomic selection. One iterative and two direct methods were used: Gauss-Seidel with Residual Update, Cholesky Decomposition and Gentleman-Givens rotations. For resembling different scenarios with respect to number of markers and of genotyped animals, a simulated data set divided into 25 subsets was used. Number of markers ranged from 1,200 to 5,925 and number of animals ranged from 1,200 to 5,865. Methods were also applied to real data comprising 3081 individuals genotyped for 45181 SNPs. Results from simulated data showed that the iterative solver was substantially faster than direct methods for larger numbers of markers. Use of a direct solver may allow for computing (co)variances of SNP effects. When applied to real data, performance of the iterative method varied substantially, depending on the level of ill-conditioning of the coefficient matrix. From results with real data, Gentleman-Givens rotations would be the method of choice in this particular application as it provided an exact solution within a fairly reasonable time frame (less than two hours). It would indeed be the preferred method whenever computer resources allow its use. PMID:21637627
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinath Vadlamani; Scott Kruger; Travis Austin
Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems.more » For the Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD which is a one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.« less
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...
2011-01-01
With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners,more » compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent predication capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters – platforms for which performance prediction is particularly challenging. Finally, we provide predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.« less
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.
2005-01-01
A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
Woodward, Carol S.; Gardner, David J.; Evans, Katherine J.
2015-01-01
Efficient solutions of global climate models require effectively handling disparate length and time scales. Implicit solution approaches allow time integration of the physical system with a step size governed by accuracy of the processes of interest rather than by stability of the fastest time scales present. Implicit approaches, however, require the solution of nonlinear systems within each time step. Usually, a Newton's method is applied to solve these systems. Each iteration of the Newton's method, in turn, requires the solution of a linear model of the nonlinear system. This model employs the Jacobian of the problem-defining nonlinear residual, but thismore » Jacobian can be costly to form. If a Krylov linear solver is used for the solution of the linear system, the action of the Jacobian matrix on a given vector is required. In the case of spectral element methods, the Jacobian is not calculated but only implemented through matrix-vector products. The matrix-vector multiply can also be approximated by a finite difference approximation which may introduce inaccuracy in the overall nonlinear solver. In this paper, we review the advantages and disadvantages of finite difference approximations of these matrix-vector products for climate dynamics within the spectral element shallow water dynamical core of the Community Atmosphere Model.« less
Fast secant methods for the iterative solution of large nonsymmetric linear systems
NASA Technical Reports Server (NTRS)
Deuflhard, Peter; Freund, Roland; Walter, Artur
1990-01-01
A family of secant methods based on general rank-1 updates was revisited in view of the construction of iterative solvers for large non-Hermitian linear systems. As it turns out, both Broyden's good and bad update techniques play a special role, but should be associated with two different line search principles. For Broyden's bad update technique, a minimum residual principle is natural, thus making it theoretically comparable with a series of well known algorithms like GMRES. Broyden's good update technique, however, is shown to be naturally linked with a minimum next correction principle, which asymptotically mimics a minimum error principle. The two minimization principles differ significantly for sufficiently large system dimension. Numerical experiments on discretized partial differential equations of convection diffusion type in 2-D with integral layers give a first impression of the possible power of the derived good Broyden variant.
Matrix decomposition graphics processing unit solver for Poisson image editing
NASA Astrophysics Data System (ADS)
Lei, Zhao; Wei, Li
2012-10-01
In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computational and memory intensive task which makes it not suitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to settle the problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS will take full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (combines both the direct and iterative techniques) and has two-level architecture. These enable MDGS to generate identical solutions with those of the common Poisson methods and achieve high convergence rate in most cases. This approach is advantageous in terms of parallelizability, enabling real-time image processing, low memory-taken and extensive applications.
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation of concerns and single responsibility concepts go through implementation of the solver. As a forward modelling engine a modern scalable solver extrEMe, based on contracting integral equation approach, is used. Iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for (regularized) inverse problem solution, and adjoint source approach is used to calculate efficiently the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize an inverse domain the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to HPC Piz Daint (6th supercomputer in the world) demonstrate practically linear scalability of the code up to thousands of nodes.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
Solution Methods for 3D Tomographic Inversion Using A Highly Non-Linear Ray Tracer
NASA Astrophysics Data System (ADS)
Hipp, J. R.; Ballard, S.; Young, C. J.; Chang, M.
2008-12-01
To develop 3D velocity models to improve nuclear explosion monitoring capability, we have developed a 3D tomographic modeling system that traces rays using an implementation of the Um and Thurber ray pseudo- bending approach, with full enforcement of Snell's Law in 3D at the major discontinuities. Due to the highly non-linear nature of the ray tracer, however, we are forced to substantially damp the inversion in order to converge on a reasonable model. Unfortunately the amount of damping is not known a priori and can significantly extend the number of calls of the computationally expensive ray-tracer and the least squares matrix solver. If the damping term is too small the solution step-size produces either an un-realistic model velocity change or places the solution in or near a local minimum from which extrication is nearly impossible. If the damping term is too large, convergence can be very slow or premature convergence can occur. Standard approaches involve running inversions with a suite of damping parameters to find the best model. A better solution methodology is to take advantage of existing non-linear solution techniques such as Levenberg-Marquardt (LM) or quasi-newton iterative solvers. In particular, the LM algorithm was specifically designed to find the minimum of a multi-variate function that is expressed as the sum of squares of non-linear real-valued functions. It has become a standard technique for solving non-linear least squared problems, and is widely adopted in a broad spectrum of disciplines, including the geosciences. At each iteration, the LM approach dynamically varies the level of damping to optimize convergence. When the current estimate of the solution is far from the ultimate solution LM behaves as a steepest decent method, but transitions to Gauss- Newton behavior, with near quadratic convergence, as the estimate approaches the final solution. We show typical linear solution techniques and how they can lead to local minima if the damping is set too low. We also describe the LM technique and show how it automatically determines the appropriate damping factor as it iteratively converges on the best solution. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under Contract DE-AC04- 94AL85000.
Implicit solvers for unstructured meshes
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Mavriplis, Dimitri J.
1991-01-01
Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spotz, William F.
PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less
A Kernel-free Boundary Integral Method for Elliptic Boundary Value Problems ⋆
Ying, Wenjun; Henriquez, Craig S.
2013-01-01
This paper presents a class of kernel-free boundary integral (KFBI) methods for general elliptic boundary value problems (BVPs). The boundary integral equations reformulated from the BVPs are solved iteratively with the GMRES method. During the iteration, the boundary and volume integrals involving Green's functions are approximated by structured grid-based numerical solutions, which avoids the need to know the analytical expressions of Green's functions. The KFBI method assumes that the larger regular domain, which embeds the original complex domain, can be easily partitioned into a hierarchy of structured grids so that fast elliptic solvers such as the fast Fourier transform (FFT) based Poisson/Helmholtz solvers or those based on geometric multigrid iterations are applicable. The structured grid-based solutions are obtained with standard finite difference method (FDM) or finite element method (FEM), where the right hand side of the resulting linear system is appropriately modified at irregular grid nodes to recover the formal accuracy of the underlying numerical scheme. Numerical results demonstrating the efficiency and accuracy of the KFBI methods are presented. It is observed that the number of GM-RES iterations used by the method for solving isotropic and moderately anisotropic BVPs is independent of the sizes of the grids that are employed to approximate the boundary and volume integrals. With the standard second-order FEMs and FDMs, the KFBI method shows a second-order convergence rate in accuracy for all of the tested Dirichlet/Neumann BVPs when the anisotropy of the diffusion tensor is not too strong. PMID:23519600
Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Widlund, Olof B.
2015-06-09
The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less
Inlet Spillage Drag Predictions Using the AIRPLANE Code
NASA Technical Reports Server (NTRS)
Thomas, Scott D.; Won, Mark A.; Cliff, Susan E.
1999-01-01
AIRPLANE (Jameson/Baker) is a steady inviscid unstructured Euler flow solver. It has been validated on many HSR geometries. It is implemented as MESHPLANE, an unstructured mesh generator, and FLOPLANE, an iterative flow solver. The surface description from an Intergraph CAD system goes into MESHPLANE as collections of polygonal curves to generate the 3D mesh. The flow solver uses a multistage time stepping scheme with residual averaging to approach steady state, but R is not time accurate. The flow solver was ported from Cray to IBM SP2 by Wu-Sun Cheng (IBM); it could only be run on 4 CPUs at a time because of memory limitations. Meshes for the four cases had about 655,000 points in the flow field, about 3.9 million tetrahedra, about 77,500 points on the surface. The flow solver took about 23 wall seconds per iteration when using 4 CPUs. It took about eight and a half wall hours to run 1,300 iterations at a time (the queue limit is 10 hours). A revised version of FLOPLANE (Thomas) was used on up to 64 CPUs to finish up some calculations at the end. We had to turn on more communication when using more processors to eliminate noise that was contaminating the flow field; this added about 50% to the elapsed wall time per iteration when using 64 CPUs. This study involved computing lift and drag for a wing/body/nacelle configuration at Mach 0.9 and 4 degrees pitch. Four cases were considered, corresponding to four nacelle mass flow conditions.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy.
Zelyak, O; Fallone, B G; St-Aubin, J
2017-12-14
Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy
NASA Astrophysics Data System (ADS)
Zelyak, O.; Fallone, B. G.; St-Aubin, J.
2018-01-01
Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Corrigendum to "Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy".
Zelyak, Oleksandr; Fallone, B Gino; St-Aubin, Joel
2018-03-12
Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation. © 2018 Institute of Physics and Engineering in Medicine.
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj
2006-01-01
A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
NASA Astrophysics Data System (ADS)
Cools, S.; Vanroose, W.
2016-03-01
This paper improves the convergence and robustness of a multigrid-based solver for the cross sections of the driven Schrödinger equation. Adding a Coupled Channel Correction Step (CCCS) after each multigrid (MG) V-cycle efficiently removes the errors that remain after the V-cycle sweep. The combined iterative solution scheme (MG-CCCS) is shown to feature significantly improved convergence rates over the classical MG method at energies where bound states dominate the solution, resulting in a fast and scalable solution method for the complex-valued Schrödinger break-up problem for any energy regime. The proposed solver displays optimal scaling; a solution is found in a time that is linear in the number of unknowns. The method is validated on a 2D Temkin-Poet model problem, and convergence results both as a solver and preconditioner are provided to support the O (N) scalability of the method. This paper extends the applicability of the complex contour approach for far field map computation (Cools et al. (2014) [10]).
Template-Based 3D Reconstruction of Non-rigid Deformable Object from Monocular Video
NASA Astrophysics Data System (ADS)
Liu, Yang; Peng, Xiaodong; Zhou, Wugen; Liu, Bo; Gerndt, Andreas
2018-06-01
In this paper, we propose a template-based 3D surface reconstruction system of non-rigid deformable objects from monocular video sequence. Firstly, we generate a semi-dense template of the target object with structure from motion method using a subsequence video. This video can be captured by rigid moving camera orienting the static target object or by a static camera observing the rigid moving target object. Then, with the reference template mesh as input and based on the framework of classical template-based methods, we solve an energy minimization problem to get the correspondence between the template and every frame to get the time-varying mesh to present the deformation of objects. The energy terms combine photometric cost, temporal and spatial smoothness cost as well as as-rigid-as-possible cost which can enable elastic deformation. In this paper, an easy and controllable solution to generate the semi-dense template for complex objects is presented. Besides, we use an effective iterative Schur based linear solver for the energy minimization problem. The experimental evaluation presents qualitative deformation objects reconstruction results with real sequences. Compare against the results with other templates as input, the reconstructions based on our template have more accurate and detailed results for certain regions. The experimental results show that the linear solver we used performs better efficiency compared to traditional conjugate gradient based solver.
Finite element analysis of periodic transonic flow problems
NASA Technical Reports Server (NTRS)
Fix, G. J.
1978-01-01
Flow about an oscillating thin airfoil in a transonic stream was considered. It was assumed that the flow field can be decomposed into a mean flow plus a periodic perturbation. On the surface of the airfoil the usual Neumman conditions are imposed. Two computer programs were written, both using linear basis functions over triangles for the finite element space. The first program uses a banded Gaussian elimination solver to solve the matrix problem, while the second uses an iterative technique, namely SOR. The only results obtained are for an oscillating flat plate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gearhart, Jared Lee; Adair, Kristin Lynn; Durfee, Justin David.
When developing linear programming models, issues such as budget limitations, customer requirements, or licensing may preclude the use of commercial linear programming solvers. In such cases, one option is to use an open-source linear programming solver. A survey of linear programming tools was conducted to identify potential open-source solvers. From this survey, four open-source solvers were tested using a collection of linear programming test problems and the results were compared to IBM ILOG CPLEX Optimizer (CPLEX) [1], an industry standard. The solvers considered were: COIN-OR Linear Programming (CLP) [2], [3], GNU Linear Programming Kit (GLPK) [4], lp_solve [5] and Modularmore » In-core Nonlinear Optimization System (MINOS) [6]. As no open-source solver outperforms CPLEX, this study demonstrates the power of commercial linear programming software. CLP was found to be the top performing open-source solver considered in terms of capability and speed. GLPK also performed well but cannot match the speed of CLP or CPLEX. lp_solve and MINOS were considerably slower and encountered issues when solving several test problems.« less
Conservative tightly-coupled simulations of stochastic multiscale systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taverniers, Søren; Pigarov, Alexander Y.; Tartakovsky, Daniel M., E-mail: dmt@ucsd.edu
2016-05-15
Multiphysics problems often involve components whose macroscopic dynamics is driven by microscopic random fluctuations. The fidelity of simulations of such systems depends on their ability to propagate these random fluctuations throughout a computational domain, including subdomains represented by deterministic solvers. When the constituent processes take place in nonoverlapping subdomains, system behavior can be modeled via a domain-decomposition approach that couples separate components at the interfaces between these subdomains. Its coupling algorithm has to maintain a stable and efficient numerical time integration even at high noise strength. We propose a conservative domain-decomposition algorithm in which tight coupling is achieved by employingmore » either Picard's or Newton's iterative method. Coupled diffusion equations, one of which has a Gaussian white-noise source term, provide a computational testbed for analysis of these two coupling strategies. Fully-converged (“implicit”) coupling with Newton's method typically outperforms its Picard counterpart, especially at high noise levels. This is because the number of Newton iterations scales linearly with the amplitude of the Gaussian noise, while the number of Picard iterations can scale superlinearly. At large time intervals between two subsequent inter-solver communications, the solution error for single-iteration (“explicit”) Picard's coupling can be several orders of magnitude higher than that for implicit coupling. Increasing the explicit coupling's communication frequency reduces this difference, but the resulting increase in computational cost can make it less efficient than implicit coupling at similar levels of solution error, depending on the communication frequency of the latter and the noise strength. This trend carries over into higher dimensions, although at high noise strength explicit coupling may be the only computationally viable option.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Kuo -Ling; Mehrotra, Sanjay
We present a homogeneous algorithm equipped with a modified potential function for the monotone complementarity problem. We show that this potential function is reduced by at least a constant amount if a scaled Lipschitz condition (SLC) is satisfied. A practical algorithm based on this potential function is implemented in a software package named iOptimize. The implementation in iOptimize maintains global linear and polynomial time convergence properties, while achieving practical performance. It either successfully solves the problem, or concludes that the SLC is not satisfied. When compared with the mature software package MOSEK (barrier solver version 6.0.0.106), iOptimize solves convex quadraticmore » programming problems, convex quadratically constrained quadratic programming problems, and general convex programming problems in fewer iterations. Moreover, several problems for which MOSEK fails are solved to optimality. In addition, we also find that iOptimize detects infeasibility more reliably than the general nonlinear solvers Ipopt (version 3.9.2) and Knitro (version 8.0).« less
Wang, Wansheng; Chen, Long; Zhou, Jie
2015-01-01
A postprocessing technique for mixed finite element methods for the Cahn-Hilliard equation is developed and analyzed. Once the mixed finite element approximations have been computed at a fixed time on the coarser mesh, the approximations are postprocessed by solving two decoupled Poisson equations in an enriched finite element space (either on a finer grid or a higher-order space) for which many fast Poisson solvers can be applied. The nonlinear iteration is only applied to a much smaller size problem and the computational cost using Newton and direct solvers is negligible compared with the cost of the linear problem. The analysis presented here shows that this technique remains the optimal rate of convergence for both the concentration and the chemical potential approximations. The corresponding error estimate obtained in our paper, especially the negative norm error estimates, are non-trivial and different with the existing results in the literatures. PMID:27110063
An interior-point method-based solver for simulation of aircraft parts riveting
NASA Astrophysics Data System (ADS)
Stefanova, Maria; Yakunin, Sergey; Petukhova, Margarita; Lupuleac, Sergey; Kokkolaras, Michael
2018-05-01
The particularities of the aircraft parts riveting process simulation necessitate the solution of a large amount of contact problems. A primal-dual interior-point method-based solver is proposed for solving such problems efficiently. The proposed method features a worst case polynomial complexity bound ? on the number of iterations, where n is the dimension of the problem and ε is a threshold related to desired accuracy. In practice, the convergence is often faster than this worst case bound, which makes the method applicable to large-scale problems. The computational challenge is solving the system of linear equations because the associated matrix is ill conditioned. To that end, the authors introduce a preconditioner and a strategy for determining effective initial guesses based on the physics of the problem. Numerical results are compared with ones obtained using the Goldfarb-Idnani algorithm. The results demonstrate the efficiency of the proposed method.
Strongly Coupled Fluid-Body Dynamics in the Immersed Boundary Projection Method
NASA Astrophysics Data System (ADS)
Wang, Chengjie; Eldredge, Jeff D.
2014-11-01
A computational algorithm is developed to simulate dynamically coupled interaction between fluid and rigid bodies. The basic computational framework is built upon a multi-domain immersed boundary method library, whirl, developed in previous work. In this library, the Navier-Stokes equations for incompressible flow are solved on a uniform Cartesian grid by the vorticity-based immersed boundary projection method of Colonius and Taira. A solver for the dynamics of rigid-body systems is also included. The fluid and rigid-body solvers are strongly coupled with an iterative approach based on the block Gauss-Seidel method. Interfacial force, with its intimate connection with the Lagrange multipliers used in the fluid solver, is used as the primary iteration variable. Relaxation, developed from a stability analysis of the iterative scheme, is used to achieve convergence in only 2-4 iterations per time step. Several two- and three-dimensional numerical tests are conducted to validate and demonstrate the method, including flapping of flexible wings, self-excited oscillations of a system of linked plates and three-dimensional propulsion of flexible fluked tail. This work has been supported by AFOSR, under Award FA9550-11-1-0098.
Tezaur, I. K.; Perego, M.; Salinger, A. G.; ...
2015-04-27
This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, alongmore » with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems is demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.« less
On the implementation of an accurate and efficient solver for convection-diffusion equations
NASA Astrophysics Data System (ADS)
Wu, Chin-Tien
In this dissertation, we examine several different aspects of computing the numerical solution of the convection-diffusion equation. The solution of this equation often exhibits sharp gradients due to Dirichlet outflow boundaries or discontinuities in boundary conditions. Because of the singular-perturbed nature of the equation, numerical solutions often have severe oscillations when grid sizes are not small enough to resolve sharp gradients. To overcome such difficulties, the streamline diffusion discretization method can be used to obtain an accurate approximate solution in regions where the solution is smooth. To increase accuracy of the solution in the regions containing layers, adaptive mesh refinement and mesh movement based on a posteriori error estimations can be employed. An error-adapted mesh refinement strategy based on a posteriori error estimations is also proposed to resolve layers. For solving the sparse linear systems that arise from discretization, goemetric multigrid (MG) and algebraic multigrid (AMG) are compared. In addition, both methods are also used as preconditioners for Krylov subspace methods. We derive some convergence results for MG with line Gauss-Seidel smoothers and bilinear interpolation. Finally, while considering adaptive mesh refinement as an integral part of the solution process, it is natural to set a stopping tolerance for the iterative linear solvers on each mesh stage so that the difference between the approximate solution obtained from iterative methods and the finite element solution is bounded by an a posteriori error bound. Here, we present two stopping criteria. The first is based on a residual-type a posteriori error estimator developed by Verfurth. The second is based on an a posteriori error estimator, using local solutions, developed by Kay and Silvester. Our numerical results show the refined mesh obtained from the iterative solution which satisfies the second criteria is similar to the refined mesh obtained from the finite element solution.
Iterative Addition of Kinetic Effects to Cold Plasma RF Wave Solvers
NASA Astrophysics Data System (ADS)
Green, David; Berry, Lee; RF-SciDAC Collaboration
2017-10-01
The hot nature of fusion plasmas requires a wave vector dependent conductivity tensor for accurate calculation of wave heating and current drive. Traditional methods for calculating the linear, kinetic full-wave plasma response rely on a spectral method such that the wave vector dependent conductivity fits naturally within the numerical method. These methods have seen much success for application to the well-confined core plasma of tokamaks. However, quantitative prediction of high power RF antenna designs for fusion applications has meant a requirement of resolving the geometric details of the antenna and other plasma facing surfaces for which the Fourier spectral method is ill-suited. An approach to enabling the addition of kinetic effects to the more versatile finite-difference and finite-element cold-plasma full-wave solvers was presented by where an operator-split iterative method was outlined. Here we expand on this approach, examine convergence and present a simplified kinetic current estimator for rapidly updating the right-hand side of the wave equation with kinetic corrections. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Computational efficiency improvements for image colorization
NASA Astrophysics Data System (ADS)
Yu, Chao; Sharma, Gaurav; Aly, Hussein
2013-03-01
We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarseto- fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
Analysis Tools for CFD Multigrid Solvers
NASA Technical Reports Server (NTRS)
Mineck, Raymond E.; Thomas, James L.; Diskin, Boris
2004-01-01
Analysis tools are needed to guide the development and evaluate the performance of multigrid solvers for the fluid flow equations. Classical analysis tools, such as local mode analysis, often fail to accurately predict performance. Two-grid analysis tools, herein referred to as Idealized Coarse Grid and Idealized Relaxation iterations, have been developed and evaluated within a pilot multigrid solver. These new tools are applicable to general systems of equations and/or discretizations and point to problem areas within an existing multigrid solver. Idealized Relaxation and Idealized Coarse Grid are applied in developing textbook-efficient multigrid solvers for incompressible stagnation flow problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.
We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousandsmore » degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.« less
Numerical simulations of microwave heating of liquids: enhancements using Krylov subspace methods
NASA Astrophysics Data System (ADS)
Lollchund, M. R.; Dookhitram, K.; Sunhaloo, M. S.; Boojhawon, R.
2013-04-01
In this paper, we compare the performances of three iterative solvers for large sparse linear systems arising in the numerical computations of incompressible Navier-Stokes (NS) equations. These equations are employed mainly in the simulation of microwave heating of liquids. The emphasis of this work is on the application of Krylov projection techniques such as Generalized Minimal Residual (GMRES) to solve the Pressure Poisson Equations that result from discretisation of the NS equations. The performance of the GMRES method is compared with the traditional Gauss-Seidel (GS) and point successive over relaxation (PSOR) techniques through their application to simulate the dynamics of water housed inside a vertical cylindrical vessel which is subjected to microwave radiation. It is found that as the mesh size increases, GMRES gives the fastest convergence rate in terms of computational times and number of iterations.
An accurate, fast, and scalable solver for high-frequency wave propagation
NASA Astrophysics Data System (ADS)
Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.
2017-12-01
In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straight-forward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand-sides.
Input-output-controlled nonlinear equation solvers
NASA Technical Reports Server (NTRS)
Padovan, Joseph
1988-01-01
To upgrade the efficiency and stability of the successive substitution (SS) and Newton-Raphson (NR) schemes, the concept of input-output-controlled solvers (IOCS) is introduced. By employing the formal properties of the constrained version of the SS and NR schemes, the IOCS algorithm can handle indefiniteness of the system Jacobian, can maintain iterate monotonicity, and provide for separate control of load incrementation and iterate excursions, as well as having other features. To illustrate the algorithmic properties, the results for several benchmark examples are presented. These define the associated numerical efficiency and stability of the IOCS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stimpson, Shane; Collins, Benjamin; Kochunas, Brendan
The MPACT code, being developed collaboratively by the University of Michigan and Oak Ridge National Laboratory, is the primary deterministic neutron transport solver being deployed within the Virtual Environment for Reactor Applications (VERA) as part of the Consortium for Advanced Simulation of Light Water Reactors (CASL). In many applications of the MPACT code, transport-corrected scattering has proven to be an obstacle in terms of stability, and considerable effort has been made to try to resolve the convergence issues that arise from it. Most of the convergence problems seem related to the transport-corrected cross sections, particularly when used in the 2Dmore » method of characteristics (MOC) solver, which is the focus of this work. Here in this paper, the stability and performance of the 2-D MOC solver in MPACT is evaluated for two iteration schemes: Gauss-Seidel and Jacobi. With the Gauss-Seidel approach, as the MOC solver loops over groups, it uses the flux solution from the previous group to construct the inscatter source for the next group. Alternatively, the Jacobi approach uses only the fluxes from the previous outer iteration to determine the inscatter source for each group. Consequently for the Jacobi iteration, the loop over groups can be moved from the outermost loop$-$as is the case with the Gauss-Seidel sweeper$-$to the innermost loop, allowing for a substantial increase in efficiency by minimizing the overhead of retrieving segment, region, and surface index information from the ray tracing data. Several test problems are assessed: (1) Babcock & Wilcox 1810 Core I, (2) Dimple S01A-Sq, (3) VERA Progression Problem 5a, and (4) VERA Problem 2a. The Jacobi iteration exhibits better stability than Gauss-Seidel, allowing for converged solutions to be obtained over a much wider range of iteration control parameters. Additionally, the MOC solve time with the Jacobi approach is roughly 2.0-2.5× faster per sweep. While the performance and stability of the Jacobi iteration are substantially improved compared to the Gauss-Seidel iteration, it does yield a roughly 8$-$10% increase in the overall memory requirement.« less
Stimpson, Shane; Collins, Benjamin; Kochunas, Brendan
2017-03-10
The MPACT code, being developed collaboratively by the University of Michigan and Oak Ridge National Laboratory, is the primary deterministic neutron transport solver being deployed within the Virtual Environment for Reactor Applications (VERA) as part of the Consortium for Advanced Simulation of Light Water Reactors (CASL). In many applications of the MPACT code, transport-corrected scattering has proven to be an obstacle in terms of stability, and considerable effort has been made to try to resolve the convergence issues that arise from it. Most of the convergence problems seem related to the transport-corrected cross sections, particularly when used in the 2Dmore » method of characteristics (MOC) solver, which is the focus of this work. Here in this paper, the stability and performance of the 2-D MOC solver in MPACT is evaluated for two iteration schemes: Gauss-Seidel and Jacobi. With the Gauss-Seidel approach, as the MOC solver loops over groups, it uses the flux solution from the previous group to construct the inscatter source for the next group. Alternatively, the Jacobi approach uses only the fluxes from the previous outer iteration to determine the inscatter source for each group. Consequently for the Jacobi iteration, the loop over groups can be moved from the outermost loop$-$as is the case with the Gauss-Seidel sweeper$-$to the innermost loop, allowing for a substantial increase in efficiency by minimizing the overhead of retrieving segment, region, and surface index information from the ray tracing data. Several test problems are assessed: (1) Babcock & Wilcox 1810 Core I, (2) Dimple S01A-Sq, (3) VERA Progression Problem 5a, and (4) VERA Problem 2a. The Jacobi iteration exhibits better stability than Gauss-Seidel, allowing for converged solutions to be obtained over a much wider range of iteration control parameters. Additionally, the MOC solve time with the Jacobi approach is roughly 2.0-2.5× faster per sweep. While the performance and stability of the Jacobi iteration are substantially improved compared to the Gauss-Seidel iteration, it does yield a roughly 8$-$10% increase in the overall memory requirement.« less
NASA Astrophysics Data System (ADS)
Southworth, Benjamin Scott
PART I: One of the most fascinating questions to humans has long been whether life exists outside of our planet. To our knowledge, water is a fundamental building block of life, which makes liquid water on other bodies in the universe a topic of great interest. In fact, there are large bodies of water right here in our solar system, underneath the icy crust of moons around Saturn and Jupiter. The NASA-ESA Cassini Mission spent two decades studying the Saturnian system. One of the many exciting discoveries was a "plume" on the south pole of Enceladus, emitting hundreds of kg/s of water vapor and frozen water-ice particles from Enceladus' subsurface ocean. It has since been determined that Enceladus likely has a global liquid water ocean separating its rocky core from icy surface, with conditions that are relatively favorable to support life. The plume is of particular interest because it gives direct access to ocean particles from space, by flying through the plume. Recently, evidence has been found for similar geological activity occurring on Jupiter's moon Europa, long considered one of the most likely candidate bodies to support life in our solar system. Here, a model for plume-particle dynamics is developed based on studies of the Enceladus plume and data from the Cassini Cosmic Dust Analyzer. A C++, OpenMP/MPI parallel software package is then built to run large scale simulations of dust plumes on planetary satellites. In the case of Enceladus, data from simulations and the Cassini mission provide insight into the structure of emissions on the surface, the total mass production of the plume, and the distribution of particles being emitted. Each of these are fundamental to understanding the plume and, for Europa and Enceladus, simulation data provide important results for the planning of future missions to these icy moons. In particular, this work has contributed to the Europa Clipper mission and proposed Enceladus Life Finder. PART II: Solving large, sparse linear systems arises often in the modeling of biological and physical phenomenon, data analysis through graphs and networks, and other scientific applications. This work focuses primarily on linear systems resulting from the discretization of partial differential equations (PDEs). Because solving linear systems is the bottleneck of many large simulation codes, there is a rich field of research in developing "fast" solvers, with the ultimate goal being a method that solves an n x n linear system in O(n) operations. One of the most effective classes of solvers is algebraic multigrid (AMG), which is a multilevel iterative method based on projecting the problem into progressively smaller spaces, and scales like O(n) or O(nlog n) for certain classes of problems. The field of AMG is well-developed for symmetric positive definite matrices, and is typically most effective on linear systems resulting from the discretization of scalar elliptic PDEs, such as the heat equation. Systems of PDEs can add additional difficulties, but the underlying linear algebraic theory is consistent and, in many cases, an elliptic system of PDEs can be handled well by AMG with appropriate modifications of the solver. Solving general, nonsymmetric linear systems remains the wild west of AMG (and other fast solvers), lacking significant results in convergence theory as well as robust methods. Here, we develop new theoretical motivation and practical variations of AMG to solve nonsymmetric linear systems, often resulting from the discretization of hyperbolic PDEs. In particular, multilevel convergence of AMG for nonsymmetric systems is proven for the first time. A new nonsymmetric AMG solver is also developed based on an approximate ideal restriction, referred to as AIR, which is able to solve advection-dominated, hyperbolic-type problems that are outside the scope of existing AMG solvers and other fast iterative methods. AIR demonstrates scalable convergence on unstructured meshes, in multiple dimensions, and with high-order finite elements, expanding the applicability of AMG to a new class of problems.
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu
2006-04-17
TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Tsuji, Paul H.
Here, this study explores the performance and scaling of a GMRES Krylov method employed as a smoother for an algebraic multigrid (AMG) preconditioned Newton- Krylov solution approach applied to a fully-implicit variational multiscale (VMS) nite element (FE) resistive magnetohydrodynamics (MHD) formulation. In this context a Newton iteration is used for the nonlinear system and a Krylov (GMRES) method is employed for the linear subsystems. The efficiency of this approach is critically dependent on the scalability and performance of the AMG preconditioner for the linear solutions and the performance of the smoothers play a critical role. Krylov smoothers are considered inmore » an attempt to reduce the time and memory requirements of existing robust smoothers based on additive Schwarz domain decomposition (DD) with incomplete LU factorization solves on each subdomain. Three time dependent resistive MHD test cases are considered to evaluate the method. The results demonstrate that the GMRES smoother can be faster due to a decrease in the preconditioner setup time and a reduction in outer GMRESR solver iterations, and requires less memory (typically 35% less memory for global GMRES smoother) than the DD ILU smoother.« less
Parameter investigation with line-implicit lower-upper symmetric Gauss-Seidel on 3D stretched grids
NASA Astrophysics Data System (ADS)
Otero, Evelyn; Eliasson, Peter
2015-03-01
An implicit lower-upper symmetric Gauss-Seidel (LU-SGS) solver has been implemented as a multigrid smoother combined with a line-implicit method as an acceleration technique for Reynolds-averaged Navier-Stokes (RANS) simulation on stretched meshes. The computational fluid dynamics code concerned is Edge, an edge-based finite volume Navier-Stokes flow solver for structured and unstructured grids. The paper focuses on the investigation of the parameters related to our novel line-implicit LU-SGS solver for convergence acceleration on 3D RANS meshes. The LU-SGS parameters are defined as the Courant-Friedrichs-Lewy number, the left-hand side dissipation, and the convergence of iterative solution of the linear problem arising from the linearisation of the implicit scheme. The influence of these parameters on the overall convergence is presented and default values are defined for maximum convergence acceleration. The optimised settings are applied to 3D RANS computations for comparison with explicit and line-implicit Runge-Kutta smoothing. For most of the cases, a computing time acceleration of the order of 2 is found depending on the mesh type, namely the boundary layer and the magnitude of residual reduction.
Verification of continuum drift kinetic equation solvers in NIMROD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Held, E. D.; Ji, J.-Y.; Kruger, S. E.
Verification of continuum solutions to the electron and ion drift kinetic equations (DKEs) in NIMROD [C. R. Sovinec et al., J. Comp. Phys. 195, 355 (2004)] is demonstrated through comparison with several neoclassical transport codes, most notably NEO [E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 54, 015015 (2012)]. The DKE solutions use NIMROD's spatial representation, 2D finite-elements in the poloidal plane and a 1D Fourier expansion in toroidal angle. For 2D velocity space, a novel 1D expansion in finite elements is applied for the pitch angle dependence and a collocation grid is used for the normalized speedmore » coordinate. The full, linearized Coulomb collision operator is kept and shown to be important for obtaining quantitative results. Bootstrap currents, parallel ion flows, and radial particle and heat fluxes show quantitative agreement between NIMROD and NEO for a variety of tokamak equilibria. In addition, velocity space distribution function contours for ions and electrons show nearly identical detailed structure and agree quantitatively. A Θ-centered, implicit time discretization and a block-preconditioned, iterative linear algebra solver provide efficient electron and ion DKE solutions that ultimately will be used to obtain closures for NIMROD's evolving fluid model.« less
NASA Technical Reports Server (NTRS)
Jameson, A.
1975-01-01
The use of a fast elliptic solver in combination with relaxation is presented as an effective way to accelerate the convergence of transonic flow calculations, particularly when a marching scheme can be used to treat the supersonic zone in the relaxation process.
NASA Astrophysics Data System (ADS)
Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.
2018-04-01
The late research, linear matrices of vector finite element in two dimensional(2-D) magnetotelluric (MT) responses modeling was solved by non-sparse direct solver in TE mode. Nevertheless, there is some weakness which have to be improved especially accuracy in the low frequency (10-3 Hz-10-5 Hz) which is not achieved yet and high cost computation in dense mesh. In this work, the solver which is used is sparse direct solver instead of non-sparse direct solverto overcome the weaknesses of solving linear matrices of vector finite element metod using non-sparse direct solver. Sparse direct solver will be advantageous in solving linear matrices of vector finite element method because of the matrix properties which is symmetrical and sparse. The validation of sparse direct solver in solving linear matrices of vector finite element has been done for a homogen half-space model and vertical contact model by analytical solution. Thevalidation result of sparse direct solver in solving linear matrices of vector finite element shows that sparse direct solver is more stable than non-sparse direct solver in computing linear problem of vector finite element method especially in low frequency. In the end, the accuracy of 2D MT responses modelling in low frequency (10-3 Hz-10-5 Hz) has been reached out under the efficient allocation memory of array and less computational time consuming.
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation
NASA Astrophysics Data System (ADS)
Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia
2013-08-01
Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.
Numerical Simulation of Illumination and Thermal Conditions at the Lunar Poles Using LOLA DTMs
NASA Technical Reports Server (NTRS)
Glaser, P.; Glaser, D.; Oberst, J.; Neumann, G. A.; Mazarico, E.; Siegler, M. A.
2017-01-01
We are interested in illumination conditions and the temperature distribution within the upper two meters of regolith near the lunar poles. Here, areas exist receiving almost constant illumination near areas in permanent shadow, which were identified as potential exploration sites for future missions. For our study a numerical simulation of the illumination and thermal environment for lunar near-polar regions is needed. Our study is based on high-resolution, twenty meters per pixel and 400 x 400 km large polar Digital Terrain Models (DTMs), which were derived from Lunar Orbiter Laser Altimeter (LOLA) data. Illumination conditions were simulated by synthetically illuminating the LOLA DTMs using the horizon method considering the Sun as an extended source. We model polar illumination for the central 50 x 50 km subset and use it as an input at each time-step (2 h) to evaluate the heating of the lunar surface and subsequent conduction in the sub-surface. At surface level we balance the incoming insolation with the subsurface conduction and radiation into space, whereas in the sub-surface we consider conduction with an additional constant radiogenic heat source at the bottom of our two-meter layer. Density is modeled as depth-dependent, the specific heat parameter as temperature-dependent and the thermal conductivity as depth- and temperature-dependent. We implemented a fully implicit finite-volume method in space and backward Euler scheme in time to solve the one-dimensional heat equation at each pixel in our 50 x 50 km DTM. Due to the non-linear dependencies of the parameters mentioned above, Newton's method is employed as the non-linear solver together with the Gauss-Seidel method as the iterative linear solver in each Newton iteration. The software is written in OpenCL and runs in parallel on the GPU cores, which allows for fast computation of large areas and long time scales.
YORP torques with 1D thermal model
NASA Astrophysics Data System (ADS)
Breiter, S.; Bartczak, P.; Czekaj, M.
2010-11-01
A numerical model of the Yarkovsky-O'Keefe-Radzievskii-Paddack (YORP) effect for objects defined in terms of a triangular mesh is described. The algorithm requires that each surface triangle can be handled independently, which implies the use of a 1D thermal model. Insolation of each triangle is determined by an optimized ray-triangle intersection search. Surface temperature is modelled with a spectral approach; imposing a quasi-periodic solution we replace heat conduction equation by the Helmholtz equation. Non-linear boundary conditions are handled by an iterative, fast Fourier transform based solver. The results resolve the question of the YORP effect in rotation rate independence on conductivity within the non-linear 1D thermal model regardless of the accuracy issues and homogeneity assumptions. A seasonal YORP effect in attitude is revealed for objects moving on elliptic orbits when a non-linear thermal model is used.
Mang, Andreas; Ruthotto, Lars
2017-01-01
We present an efficient solver for diffeomorphic image registration problems in the framework of Large Deformations Diffeomorphic Metric Mappings (LDDMM). We use an optimal control formulation, in which the velocity field of a hyperbolic PDE needs to be found such that the distance between the final state of the system (the transformed/transported template image) and the observation (the reference image) is minimized. Our solver supports both stationary and non-stationary (i.e., transient or time-dependent) velocity fields. As transformation models, we consider both the transport equation (assuming intensities are preserved during the deformation) and the continuity equation (assuming mass-preservation). We consider the reduced form of the optimal control problem and solve the resulting unconstrained optimization problem using a discretize-then-optimize approach. A key contribution is the elimination of the PDE constraint using a Lagrangian hyperbolic PDE solver. Lagrangian methods rely on the concept of characteristic curves. We approximate these curves using a fourth-order Runge-Kutta method. We also present an efficient algorithm for computing the derivatives of the final state of the system with respect to the velocity field. This allows us to use fast Gauss-Newton based methods. We present quickly converging iterative linear solvers using spectral preconditioners that render the overall optimization efficient and scalable. Our method is embedded into the image registration framework FAIR and, thus, supports the most commonly used similarity measures and regularization functionals. We demonstrate the potential of our new approach using several synthetic and real world test problems with up to 14.7 million degrees of freedom.
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess
2011-01-01
More efficient versions of an interpolation method, called kriging, have been introduced in order to reduce its traditionally high computational cost. Written in C++, these approaches were tested on both synthetic and real data. Kriging is a best unbiased linear estimator and suitable for interpolation of scattered data points. Kriging has long been used in the geostatistic and mining communities, but is now being researched for use in the image fusion of remotely sensed data. This allows a combination of data from various locations to be used to fill in any missing data from any single location. To arrive at the faster algorithms, sparse SYMMLQ iterative solver, covariance tapering, Fast Multipole Methods (FMM), and nearest neighbor searching techniques were used. These implementations were used when the coefficient matrix in the linear system is symmetric, but not necessarily positive-definite.
Construction, classification and parametrization of complex Hadamard matrices
NASA Astrophysics Data System (ADS)
Szöllősi, Ferenc
To improve the design of nuclear systems, high-fidelity neutron fluxes are required. Leadership-class machines provide platforms on which very large problems can be solved. Computing such fluxes efficiently requires numerical methods with good convergence properties and algorithms that can scale to hundreds of thousands of cores. Many 3-D deterministic transport codes are decomposable in space and angle only, limiting them to tens of thousands of cores. Most codes rely on methods such as Gauss Seidel for fixed source problems and power iteration for eigenvalue problems, which can be slow to converge for challenging problems like those with highly scattering materials or high dominance ratios. Three methods have been added to the 3-D SN transport code Denovo that are designed to improve convergence and enable the full use of cutting-edge computers. The first is a multigroup Krylov solver that converges more quickly than Gauss Seidel and parallelizes the code in energy such that Denovo can use hundreds of thousand of cores effectively. The second is Rayleigh quotient iteration (RQI), an old method applied in a new context. This eigenvalue solver finds the dominant eigenvalue in a mathematically optimal way and should converge in fewer iterations than power iteration. RQI creates energy-block-dense equations that the new Krylov solver treats efficiently. However, RQI can have convergence problems because it creates poorly conditioned systems. This can be overcome with preconditioning. The third method is a multigrid-in-energy preconditioner. The preconditioner takes advantage of the new energy decomposition because the grids are in energy rather than space or angle. The preconditioner greatly reduces iteration count for many problem types and scales well in energy. It also allows RQI to be successful for problems it could not solve otherwise. The methods added to Denovo accomplish the goals of this work. They converge in fewer iterations than traditional methods and enable the use of hundreds of thousands of cores. Each method can be used individually, with the multigroup Krylov solver and multigrid-in-energy preconditioner being particularly successful on their own. The largest benefit, though, comes from using these methods in concert.
Huang, Kuo -Ling; Mehrotra, Sanjay
2016-11-08
We present a homogeneous algorithm equipped with a modified potential function for the monotone complementarity problem. We show that this potential function is reduced by at least a constant amount if a scaled Lipschitz condition (SLC) is satisfied. A practical algorithm based on this potential function is implemented in a software package named iOptimize. The implementation in iOptimize maintains global linear and polynomial time convergence properties, while achieving practical performance. It either successfully solves the problem, or concludes that the SLC is not satisfied. When compared with the mature software package MOSEK (barrier solver version 6.0.0.106), iOptimize solves convex quadraticmore » programming problems, convex quadratically constrained quadratic programming problems, and general convex programming problems in fewer iterations. Moreover, several problems for which MOSEK fails are solved to optimality. In addition, we also find that iOptimize detects infeasibility more reliably than the general nonlinear solvers Ipopt (version 3.9.2) and Knitro (version 8.0).« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, Mitchell T.; Johnson, Seth R.; Prokopenko, Andrey V.
With the development of a Fortran Interface to Trilinos, ForTrilinos, modelers using modern Fortran will beable to provide their codes the capability to use solvers and other capabilities on exascale machines via astraightforward infrastructure that accesses Trilinos. This document outlines what Fortrilinos does andexplains briefly how it works. We show it provides a general access to packages via an entry point and usesan xml file from fortran code. With the first release, ForTrilinos will enable Teuchos to take xml parameterlists from Fortran code and set up data structures. It will provide access to linear solvers and eigensolvers.Several examples are providedmore » to illustrate the capabilities in practice. We explain what the user shouldhave already with their code and what Trilinos provides and returns to the Fortran code. We provideinformation about the build process for ForTrilinos, with a practical example. In future releases, nonlinearsolvers, time iteration, advanced preconditioning techniques, and inversion of control (IoC), to enablecallbacks to Fortran routines, will be available.« less
Ringe, Stefan; Oberhofer, Harald; Hille, Christoph; Matera, Sebastian; Reuter, Karsten
2016-08-09
The size-modified Poisson-Boltzmann (MPB) equation is an efficient implicit solvation model which also captures electrolytic solvent effects. It combines an account of the dielectric solvent response with a mean-field description of solvated finite-sized ions. We present a general solution scheme for the MPB equation based on a fast function-space-oriented Newton method and a Green's function preconditioned iterative linear solver. In contrast to popular multigrid solvers, this approach allows us to fully exploit specialized integration grids and optimized integration schemes. We describe a corresponding numerically efficient implementation for the full-potential density-functional theory (DFT) code FHI-aims. We show that together with an additional Stern layer correction the DFT+MPB approach can describe the mean activity coefficient of a KCl aqueous solution over a wide range of concentrations. The high sensitivity of the calculated activity coefficient on the employed ionic parameters thereby suggests to use extensively tabulated experimental activity coefficients of salt solutions for a systematic parametrization protocol.
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models
Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.
2016-01-01
Computational techniques for solving systems of equations arising in gastric electrophysiology have not been studied for efficient solution process. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and mesh resolution required to capture geometric and functional features necessitates efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased workload considerably. Preconditioning based on block Jacobi and algebraic multigrid solvers were found to have the best overall solution times and least iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
NASA Astrophysics Data System (ADS)
Li, Meng; Gu, Xian-Ming; Huang, Chengming; Fei, Mingfa; Zhang, Guoyu
2018-04-01
In this paper, a fast linearized conservative finite element method is studied for solving the strongly coupled nonlinear fractional Schrödinger equations. We prove that the scheme preserves both the mass and energy, which are defined by virtue of some recursion relationships. Using the Sobolev inequalities and then employing the mathematical induction, the discrete scheme is proved to be unconditionally convergent in the sense of L2-norm and H α / 2-norm, which means that there are no any constraints on the grid ratios. Then, the prior bound of the discrete solution in L2-norm and L∞-norm are also obtained. Moreover, we propose an iterative algorithm, by which the coefficient matrix is independent of the time level, and thus it leads to Toeplitz-like linear systems that can be efficiently solved by Krylov subspace solvers with circulant preconditioners. This method can reduce the memory requirement of the proposed linearized finite element scheme from O (M2) to O (M) and the computational complexity from O (M3) to O (Mlog M) in each iterative step, where M is the number of grid nodes. Finally, numerical results are carried out to verify the correction of the theoretical analysis, simulate the collision of two solitary waves, and show the utility of the fast numerical solution techniques.
NASA Astrophysics Data System (ADS)
Huang, Xiaomeng; Tang, Qiang; Tseng, Yuheng; Hu, Yong; Baker, Allison H.; Bryan, Frank O.; Dennis, John; Fu, Haohuan; Yang, Guangwen
2016-11-01
In the Community Earth System Model (CESM), the ocean model is computationally expensive for high-resolution grids and is often the least scalable component for high-resolution production experiments. The major bottleneck is that the barotropic solver scales poorly at high core counts. We design a new barotropic solver to accelerate the high-resolution ocean simulation. The novel solver adopts a Chebyshev-type iterative method to reduce the global communication cost in conjunction with an effective block preconditioner to further reduce the iterations. The algorithm and its computational complexity are theoretically analyzed and compared with other existing methods. We confirm the significant reduction of the global communication time with a competitive convergence rate using a series of idealized tests. Numerical experiments using the CESM 0.1° global ocean model show that the proposed approach results in a factor of 1.7 speed-up over the original method with no loss of accuracy, achieving 10.5 simulated years per wall-clock day on 16 875 cores.
An installed nacelle design code using a multiblock Euler solver. Volume 1: Theory document
NASA Technical Reports Server (NTRS)
Chen, H. C.
1992-01-01
An efficient multiblock Euler design code was developed for designing a nacelle installed on geometrically complex airplane configurations. This approach employed a design driver based on a direct iterative surface curvature method developed at LaRC. A general multiblock Euler flow solver was used for computing flow around complex geometries. The flow solver used a finite-volume formulation with explicit time-stepping to solve the Euler Equations. It used a multiblock version of the multigrid method to accelerate the convergence of the calculations. The design driver successively updated the surface geometry to reduce the difference between the computed and target pressure distributions. In the flow solver, the change in surface geometry was simulated by applying surface transpiration boundary conditions to avoid repeated grid generation during design iterations. Smoothness of the designed surface was ensured by alternate application of streamwise and circumferential smoothings. The capability and efficiency of the code was demonstrated through the design of both an isolated nacelle and an installed nacelle at various flow conditions. Information on the execution of the computer program is provided in volume 2.
An incremental strategy for calculating consistent discrete CFD sensitivity derivatives
NASA Technical Reports Server (NTRS)
Korivi, Vamshi Mohan; Taylor, Arthur C., III; Newman, Perry A.; Hou, Gene W.; Jones, Henry E.
1992-01-01
In this preliminary study involving advanced computational fluid dynamic (CFD) codes, an incremental formulation, also known as the 'delta' or 'correction' form, is presented for solving the very large sparse systems of linear equations which are associated with aerodynamic sensitivity analysis. For typical problems in 2D, a direct solution method can be applied to these linear equations which are associated with aerodynamic sensitivity analysis. For typical problems in 2D, a direct solution method can be applied to these linear equations in either the standard or the incremental form, in which case the two are equivalent. Iterative methods appear to be needed for future 3D applications; however, because direct solver methods require much more computer memory than is currently available. Iterative methods for solving these equations in the standard form result in certain difficulties, such as ill-conditioning of the coefficient matrix, which can be overcome when these equations are cast in the incremental form; these and other benefits are discussed. The methodology is successfully implemented and tested in 2D using an upwind, cell-centered, finite volume formulation applied to the thin-layer Navier-Stokes equations. Results are presented for two laminar sample problems: (1) transonic flow through a double-throat nozzle; and (2) flow over an isolated airfoil.
Studies of numerical algorithms for gyrokinetics and the effects of shaping on plasma turbulence
NASA Astrophysics Data System (ADS)
Belli, Emily Ann
Advanced numerical algorithms for gyrokinetic simulations are explored for more effective studies of plasma turbulent transport. The gyrokinetic equations describe the dynamics of particles in 5-dimensional phase space, averaging over the fast gyromotion, and provide a foundation for studying plasma microturbulence in fusion devices and in astrophysical plasmas. Several algorithms for Eulerian/continuum gyrokinetic solvers are compared. An iterative implicit scheme based on numerical approximations of the plasma response is developed. This method reduces the long time needed to set-up implicit arrays, yet still has larger time step advantages similar to a fully implicit method. Various model preconditioners and iteration schemes, including Krylov-based solvers, are explored. An Alternating Direction Implicit algorithm is also studied and is surprisingly found to yield a severe stability restriction on the time step. Overall, an iterative Krylov algorithm might be the best approach for extensions of core tokamak gyrokinetic simulations to edge kinetic formulations and may be particularly useful for studies of large-scale ExB shear effects. The effects of flux surface shape on the gyrokinetic stability and transport of tokamak plasmas are studied using the nonlinear GS2 gyrokinetic code with analytic equilibria based on interpolations of representative JET-like shapes. High shaping is found to be a stabilizing influence on both the linear ITG instability and nonlinear ITG turbulence. A scaling of the heat flux with elongation of chi ˜ kappa-1.5 or kappa-2 (depending on the triangularity) is observed, which is consistent with previous gyrofluid simulations. Thus, the GS2 turbulence simulations are explaining a significant fraction, but not all, of the empirical elongation scaling. The remainder of the scaling may come from (1) the edge boundary conditions for core turbulence, and (2) the larger Dimits nonlinear critical temperature gradient shift due to the enhancement of zonal flows with shaping, which is observed with the GS2 simulations. Finally, a local linear trial function-based gyrokinetic code is developed to aid in fast scoping studies of gyrokinetic linear stability. This code is successfully benchmarked with the full GS2 code in the collisionless, electrostatic limit, as well as in the more general electromagnetic description with higher-order Hermite basis functions.
NASA Astrophysics Data System (ADS)
Arndt, S.; Merkel, P.; Monticello, D. A.; Reiman, A. H.
1999-04-01
Fixed- and free-boundary equilibria for Wendelstein 7-X (W7-X) [W. Lotz et al., Plasma Physics and Controlled Nuclear Fusion Research 1990 (Proc. 13th Int. Conf. Washington, DC, 1990), (International Atomic Energy Agency, Vienna, 1991), Vol. 2, p. 603] configurations are calculated using the Princeton Iterative Equilibrium Solver (PIES) [A. H. Reiman et al., Comput. Phys. Commun., 43, 157 (1986)] to deal with magnetic islands and stochastic regions. Usually, these W7-X configurations require a large number of iterations for PIES convergence. Here, two methods have been successfully tested in an attempt to decrease the number of iterations needed for convergence. First, periodic sequences of different blending parameters are used. Second, the initial guess is vastly improved by using results of the Variational Moments Equilibrium Code (VMEC) [S. P. Hirshmann et al., Phys. Fluids 26, 3553 (1983)]. Use of these two methods have allowed verification of the Hamada condition and tendency of "self-healing" of islands has been observed.
Brown, A M
2001-06-01
The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.
Preconditioned augmented Lagrangian formulation for nearly incompressible cardiac mechanics.
Campos, Joventino Oliveira; Dos Santos, Rodrigo Weber; Sundnes, Joakim; Rocha, Bernardo Martins
2018-04-01
Computational modeling of the heart is a subject of substantial medical and scientific interest, which may contribute to increase the understanding of several phenomena associated with cardiac physiological and pathological states. Modeling the mechanics of the heart have led to considerable insights, but it still represents a complex and a demanding computational problem, especially in a strongly coupled electromechanical setting. Passive cardiac tissue is commonly modeled as hyperelastic and is characterized by quasi-incompressible, orthotropic, and nonlinear material behavior. These factors are known to be very challenging for the numerical solution of the model. The near-incompressibility is known to cause numerical issues such as the well-known locking phenomenon and ill-conditioning of the stiffness matrix. In this work, the augmented Lagrangian method is used to handle the nearly incompressible condition. This approach can potentially improve computational performance by reducing the condition number of the stiffness matrix and thereby improving the convergence of iterative solvers. We also improve the performance of iterative solvers by the use of an algebraic multigrid preconditioner. Numerical results of the augmented Lagrangian method combined with a preconditioned iterative solver for a cardiac mechanics benchmark suite are presented to show its improved performance. Copyright © 2017 John Wiley & Sons, Ltd.
Performance evaluation of OpenFOAM on many-core architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz
In this article application of Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries for solving engineering problems on many-core architectures is presented. Objective of this article is to present scalability of OpenFOAM on parallel platforms solving real engineering problems of fluid dynamics. Scalability test of OpenFOAM is performed using various hardware and different implementation of standard PCG and PBiCG Krylov iterative methods. Speed up of various implementations of linear solvers using GPU and MIC accelerators are presented in this paper. Numerical experiments of 3D lid-driven cavity flow for several cases with various number of cells are presented.
Implicit solvers for unstructured meshes
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Mavriplis, Dimitri J.
1991-01-01
Implicit methods for unstructured mesh computations are developed and tested. The approximate system which arises from the Newton-linearization of the nonlinear evolution operator is solved by using the preconditioned generalized minimum residual technique. These different preconditioners are investigated: the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over-relaxation (SSOR). The preconditioners have been optimized to have good vectorization properties. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also investigated. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
From 2D to 3D modelling in long term tectonics: Modelling challenges and HPC solutions (Invited)
NASA Astrophysics Data System (ADS)
Le Pourhiet, L.; May, D.
2013-12-01
Over the last decades, 3D thermo-mechanical codes have been made available to the long term tectonics community either as open source (Underworld, Gale) or more limited access (Fantom, Elvis3D, Douar, LaMem etc ...). However, to date, few published results using these methods have included the coupling between crustal and lithospheric dynamics at large strain. The fact that these computations are computational expensive is not the primary reason for the relatively slow development of 3D modeling in the long term tectonics community, as compare to the rapid development observed within the mantle dynamic community, or in the short-term tectonics field. Long term tectonics problems have specific issues not found in either of these two field, including; large strain (not an issue for short-term), the inclusion of free surface and the occurence of large viscosity contrasts. The first issue is typically eliminated using a combined marker-ALE method instead of fully lagrangian method, however, the marker-ALE approach can pose some algorithmic challenges in a massively parallel environment. The two last issues are more problematic because they affect the convergence of the linear/non-linear solver and the memory cost. Two options have been tested so far, using low order element and solving with a sparse direct solver, or using higher order stable elements together with a multi-grid solver. The first options, is simpler to code and to use but reaches its limit at around 80^3 low order elements. The second option requires more operations but allows using iterative solver on extremely large computers. In this presentation, I will describe the design philosophy and highlight results obtained using a code from the second-class method. The presentation will be oriented from an end-user point of view, using an application from 3D continental break up to illustrate key concepts. The description will proceed point by point from implementing physics into the code, to dealing with specific issues related to solving the discrete system of non linear equations.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue
NASA Astrophysics Data System (ADS)
Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.
2017-11-01
The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
NASA Astrophysics Data System (ADS)
Debreu, Laurent; Neveu, Emilie; Simon, Ehouarn; Le Dimet, Francois Xavier; Vidard, Arthur
2014-05-01
In order to lower the computational cost of the variational data assimilation process, we investigate the use of multigrid methods to solve the associated optimal control system. On a linear advection equation, we study the impact of the regularization term on the optimal control and the impact of discretization errors on the efficiency of the coarse grid correction step. We show that even if the optimal control problem leads to the solution of an elliptic system, numerical errors introduced by the discretization can alter the success of the multigrid methods. The view of the multigrid iteration as a preconditioner for a Krylov optimization method leads to a more robust algorithm. A scale dependent weighting of the multigrid preconditioner and the usual background error covariance matrix based preconditioner is proposed and brings significant improvements. [1] Laurent Debreu, Emilie Neveu, Ehouarn Simon, François-Xavier Le Dimet and Arthur Vidard, 2014: Multigrid solvers and multigrid preconditioners for the solution of variational data assimilation problems, submitted to QJRMS, http://hal.inria.fr/hal-00874643 [2] Emilie Neveu, Laurent Debreu and François-Xavier Le Dimet, 2011: Multigrid methods and data assimilation - Convergence study and first experiments on non-linear equations, ARIMA, 14, 63-80, http://intranet.inria.fr/international/arima/014/014005.html
Efficient convolutional sparse coding
Wohlberg, Brendt
2017-06-20
Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
NASA Astrophysics Data System (ADS)
Reimer, Ashton S.; Cheviakov, Alexei F.
2013-03-01
A Matlab-based finite-difference numerical solver for the Poisson equation for a rectangle and a disk in two dimensions, and a spherical domain in three dimensions, is presented. The solver is optimized for handling an arbitrary combination of Dirichlet and Neumann boundary conditions, and allows for full user control of mesh refinement. The solver routines utilize effective and parallelized sparse vector and matrix operations. Computations exhibit high speeds, numerical stability with respect to mesh size and mesh refinement, and acceptable error values even on desktop computers. Catalogue identifier: AENQ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AENQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License v3.0 No. of lines in distributed program, including test data, etc.: 102793 No. of bytes in distributed program, including test data, etc.: 369378 Distribution format: tar.gz Programming language: Matlab 2010a. Computer: PC, Macintosh. Operating system: Windows, OSX, Linux. RAM: 8 GB (8, 589, 934, 592 bytes) Classification: 4.3. Nature of problem: To solve the Poisson problem in a standard domain with “patchy surface”-type (strongly heterogeneous) Neumann/Dirichlet boundary conditions. Solution method: Finite difference with mesh refinement. Restrictions: Spherical domain in 3D; rectangular domain or a disk in 2D. Unusual features: Choice between mldivide/iterative solver for the solution of large system of linear algebraic equations that arise. Full user control of Neumann/Dirichlet boundary conditions and mesh refinement. Running time: Depending on the number of points taken and the geometry of the domain, the routine may take from less than a second to several hours to execute.
Accelerating Subsurface Transport Simulation on Heterogeneous Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Gawande, Nitin A.; Tumeo, Antonino
Reactive transport numerical models simulate chemical and microbiological reactions that occur along a flowpath. These models have to compute reactions for a large number of locations. They solve the set of ordinary differential equations (ODEs) that describes the reaction for each location through the Newton-Raphson technique. This technique involves computing a Jacobian matrix and a residual vector for each set of equation, and then solving iteratively the linearized system by performing Gaussian Elimination and LU decomposition until convergence. STOMP, a well known subsurface flow simulation tool, employs matrices with sizes in the order of 100x100 elements and, for numerical accuracy,more » LU factorization with full pivoting instead of the faster partial pivoting. Modern high performance computing systems are heterogeneous machines whose nodes integrate both CPUs and GPUs, exposing unprecedented amounts of parallelism. To exploit all their computational power, applications must use both the types of processing elements. For the case of subsurface flow simulation, this mainly requires implementing efficient batched LU-based solvers and identifying efficient solutions for enabling load balancing among the different processors of the system. In this paper we discuss two approaches that allows scaling STOMP's performance on heterogeneous clusters. We initially identify the challenges in implementing batched LU-based solvers for small matrices on GPUs, and propose an implementation that fulfills STOMP's requirements. We compare this implementation to other existing solutions. Then, we combine the batched GPU solver with an OpenMP-based CPU solver, and present an adaptive load balancer that dynamically distributes the linear systems to solve between the two components inside a node. We show how these approaches, integrated into the full application, provide speed ups from 6 to 7 times on large problems, executed on up to 16 nodes of a cluster with two AMD Opteron 6272 and a Tesla M2090 per node.« less
NASA Astrophysics Data System (ADS)
Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.
2013-09-01
Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.
Effects of high-frequency damping on iterative convergence of implicit viscous solver
NASA Astrophysics Data System (ADS)
Nishikawa, Hiroaki; Nakashima, Yoshitaka; Watanabe, Norihiko
2017-11-01
This paper discusses effects of high-frequency damping on iterative convergence of an implicit defect-correction solver for viscous problems. The study targets a finite-volume discretization with a one parameter family of damped viscous schemes. The parameter α controls high-frequency damping: zero damping with α = 0, and larger damping for larger α (> 0). Convergence rates are predicted for a model diffusion equation by a Fourier analysis over a practical range of α. It is shown that the convergence rate attains its minimum at α = 1 on regular quadrilateral grids, and deteriorates for larger values of α. A similar behavior is observed for regular triangular grids. In both quadrilateral and triangular grids, the solver is predicted to diverge for α smaller than approximately 0.5. Numerical results are shown for the diffusion equation and the Navier-Stokes equations on regular and irregular grids. The study suggests that α = 1 and 4/3 are suitable values for robust and efficient computations, and α = 4 / 3 is recommended for the diffusion equation, which achieves higher-order accuracy on regular quadrilateral grids. Finally, a Jacobian-Free Newton-Krylov solver with the implicit solver (a low-order Jacobian approximately inverted by a multi-color Gauss-Seidel relaxation scheme) used as a variable preconditioner is recommended for practical computations, which provides robust and efficient convergence for a wide range of α.
NASA Astrophysics Data System (ADS)
Guda, A. A.; Guda, S. A.; Soldatov, M. A.; Lomachenko, K. A.; Bugaev, A. L.; Lamberti, C.; Gawelda, W.; Bressler, C.; Smolentsev, G.; Soldatov, A. V.; Joly, Y.
2016-05-01
Finite difference method (FDM) implemented in the FDMNES software [Phys. Rev. B, 2001, 63, 125120] was revised. Thorough analysis shows, that the calculated diagonal in the FDM matrix consists of about 96% zero elements. Thus a sparse solver would be more suitable for the problem instead of traditional Gaussian elimination for the diagonal neighbourhood. We have tried several iterative sparse solvers and the direct one MUMPS solver with METIS ordering turned out to be the best. Compared to the Gaussian solver present method is up to 40 times faster and allows XANES simulations for complex systems already on personal computers. We show applicability of the software for metal-organic [Fe(bpy)3]2+ complex both for low spin and high spin states populated after laser excitation.
Large-scale 3-D EM modelling with a Block Low-Rank multifrontal direct solver
NASA Astrophysics Data System (ADS)
Shantsev, Daniil V.; Jaysaval, Piyoosh; de la Kethulle de Ryhove, Sébastien; Amestoy, Patrick R.; Buttari, Alfredo; L'Excellent, Jean-Yves; Mary, Theo
2017-06-01
We put forward the idea of using a Block Low-Rank (BLR) multifrontal direct solver to efficiently solve the linear systems of equations arising from a finite-difference discretization of the frequency-domain Maxwell equations for 3-D electromagnetic (EM) problems. The solver uses a low-rank representation for the off-diagonal blocks of the intermediate dense matrices arising in the multifrontal method to reduce the computational load. A numerical threshold, the so-called BLR threshold, controlling the accuracy of low-rank representations was optimized by balancing errors in the computed EM fields against savings in floating point operations (flops). Simulations were carried out over large-scale 3-D resistivity models representing typical scenarios for marine controlled-source EM surveys, and in particular the SEG SEAM model which contains an irregular salt body. The flop count, size of factor matrices and elapsed run time for matrix factorization are reduced dramatically by using BLR representations and can go down to, respectively, 10, 30 and 40 per cent of their full-rank values for our largest system with N = 20.6 million unknowns. The reductions are almost independent of the number of MPI tasks and threads at least up to 90 × 10 = 900 cores. The BLR savings increase for larger systems, which reduces the factorization flop complexity from O(N2) for the full-rank solver to O(Nm) with m = 1.4-1.6. The BLR savings are significantly larger for deep-water environments that exclude the highly resistive air layer from the computational domain. A study in a scenario where simulations are required at multiple source locations shows that the BLR solver can become competitive in comparison to iterative solvers as an engine for 3-D controlled-source electromagnetic Gauss-Newton inversion that requires forward modelling for a few thousand right-hand sides.
Mathematical and Numerical Aspects of the Adaptive Fast Multipole Poisson-Boltzmann Solver
Zhang, Bo; Lu, Benzhuo; Cheng, Xiaolin; ...
2013-01-01
This paper summarizes the mathematical and numerical theories and computational elements of the adaptive fast multipole Poisson-Boltzmann (AFMPB) solver. We introduce and discuss the following components in order: the Poisson-Boltzmann model, boundary integral equation reformulation, surface mesh generation, the nodepatch discretization approach, Krylov iterative methods, the new version of fast multipole methods (FMMs), and a dynamic prioritization technique for scheduling parallel operations. For each component, we also remark on feasible approaches for further improvements in efficiency, accuracy and applicability of the AFMPB solver to large-scale long-time molecular dynamics simulations. Lastly, the potential of the solver is demonstrated with preliminary numericalmore » results.« less
Upwind relaxation methods for the Navier-Stokes equations using inner iterations
NASA Technical Reports Server (NTRS)
Taylor, Arthur C., III; Ng, Wing-Fai; Walters, Robert W.
1992-01-01
A subsonic and a supersonic problem are respectively treated by an upwind line-relaxation algorithm for the Navier-Stokes equations using inner iterations to accelerate steady-state solution convergence and thereby minimize CPU time. While the ability of the inner iterative procedure to mimic the quadratic convergence of the direct solver method is attested to in both test problems, some of the nonquadratic inner iterative results are noted to have been more efficient than the quadratic. In the more successful, supersonic test case, inner iteration required only about 65 percent of the line-relaxation method-entailed CPU time.
NASA Astrophysics Data System (ADS)
Chaillat, Stéphanie; Desiderio, Luca; Ciarlet, Patrick
2017-12-01
In this work, we study the accuracy and efficiency of hierarchical matrix (H-matrix) based fast methods for solving dense linear systems arising from the discretization of the 3D elastodynamic Green's tensors. It is well known in the literature that standard H-matrix based methods, although very efficient tools for asymptotically smooth kernels, are not optimal for oscillatory kernels. H2-matrix and directional approaches have been proposed to overcome this problem. However the implementation of such methods is much more involved than the standard H-matrix representation. The central questions we address are twofold. (i) What is the frequency-range in which the H-matrix format is an efficient representation for 3D elastodynamic problems? (ii) What can be expected of such an approach to model problems in mechanical engineering? We show that even though the method is not optimal (in the sense that more involved representations can lead to faster algorithms) an efficient solver can be easily developed. The capabilities of the method are illustrated on numerical examples using the Boundary Element Method.
NASA Astrophysics Data System (ADS)
Weiss, Chester J.
2013-08-01
An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
Hybrid discrete ordinates and characteristics method for solving the linear Boltzmann equation
NASA Astrophysics Data System (ADS)
Yi, Ce
With the ability of computer hardware and software increasing rapidly, deterministic methods to solve the linear Boltzmann equation (LBE) have attracted some attention for computational applications in both the nuclear engineering and medical physics fields. Among various deterministic methods, the discrete ordinates method (SN) and the method of characteristics (MOC) are two of the most widely used methods. The SN method is the traditional approach to solve the LBE for its stability and efficiency. While the MOC has some advantages in treating complicated geometries. However, in 3-D problems requiring a dense discretization grid in phase space (i.e., a large number of spatial meshes, directions, or energy groups), both methods could suffer from the need for large amounts of memory and computation time. In our study, we developed a new hybrid algorithm by combing the two methods into one code, TITAN. The hybrid approach is specifically designed for application to problems containing low scattering regions. A new serial 3-D time-independent transport code has been developed. Under the hybrid approach, the preferred method can be applied in different regions (blocks) within the same problem model. Since the characteristics method is numerically more efficient in low scattering media, the hybrid approach uses a block-oriented characteristics solver in low scattering regions, and a block-oriented SN solver in the remainder of the physical model. In the TITAN code, a physical problem model is divided into a number of coarse meshes (blocks) in Cartesian geometry. Either the characteristics solver or the SN solver can be chosen to solve the LBE within a coarse mesh. A coarse mesh can be filled with fine meshes or characteristic rays depending on the solver assigned to the coarse mesh. Furthermore, with its object-oriented programming paradigm and layered code structure, TITAN allows different individual spatial meshing schemes and angular quadrature sets for each coarse mesh. Two quadrature types (level-symmetric and Legendre-Chebyshev quadrature) along with the ordinate splitting techniques (rectangular splitting and PN-TN splitting) are implemented. In the S N solver, we apply a memory-efficient 'front-line' style paradigm to handle the fine mesh interface fluxes. In the characteristics solver, we have developed a novel 'backward' ray-tracing approach, in which a bi-linear interpolation procedure is used on the incoming boundaries of a coarse mesh. A CPU-efficient scattering kernel is shared in both solvers within the source iteration scheme. Angular and spatial projection techniques are developed to transfer the angular fluxes on the interfaces of coarse meshes with different discretization grids. The performance of the hybrid algorithm is tested in a number of benchmark problems in both nuclear engineering and medical physics fields. Among them are the Kobayashi benchmark problems and a computational tomography (CT) device model. We also developed an extra sweep procedure with the fictitious quadrature technique to calculate angular fluxes along directions of interest. The technique is applied in a single photon emission computed tomography (SPECT) phantom model to simulate the SPECT projection images. The accuracy and efficiency of the TITAN code are demonstrated in these benchmarks along with its scalability. A modified version of the characteristics solver is integrated in the PENTRAN code and tested within the parallel engine of PENTRAN. The limitations on the hybrid algorithm are also studied.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
NASA Astrophysics Data System (ADS)
Kifonidis, K.; Müller, E.
2012-08-01
Aims: We describe and study a family of new multigrid iterative solvers for the multidimensional, implicitly discretized equations of hydrodynamics. Schemes of this class are free of the Courant-Friedrichs-Lewy condition. They are intended for simulations in which widely differing wave propagation timescales are present. A preferred solver in this class is identified. Applications to some simple stiff test problems that are governed by the compressible Euler equations, are presented to evaluate the convergence behavior, and the stability properties of this solver. Algorithmic areas are determined where further work is required to make the method sufficiently efficient and robust for future application to difficult astrophysical flow problems. Methods: The basic equations are formulated and discretized on non-orthogonal, structured curvilinear meshes. Roe's approximate Riemann solver and a second-order accurate reconstruction scheme are used for spatial discretization. Implicit Runge-Kutta (ESDIRK) schemes are employed for temporal discretization. The resulting discrete equations are solved with a full-coarsening, non-linear multigrid method. Smoothing is performed with multistage-implicit smoothers. These are applied here to the time-dependent equations by means of dual time stepping. Results: For steady-state problems, our results show that the efficiency of the present approach is comparable to the best implicit solvers for conservative discretizations of the compressible Euler equations that can be found in the literature. The use of red-black as opposed to symmetric Gauss-Seidel iteration in the multistage-smoother is found to have only a minor impact on multigrid convergence. This should enable scalable parallelization without having to seriously compromise the method's algorithmic efficiency. For time-dependent test problems, our results reveal that the multigrid convergence rate degrades with increasing Courant numbers (i.e. time step sizes). Beyond a Courant number of nine thousand, even complete multigrid breakdown is observed. Local Fourier analysis indicates that the degradation of the convergence rate is associated with the coarse-grid correction algorithm. An implicit scheme for the Euler equations that makes use of the present method was, nevertheless, able to outperform a standard explicit scheme on a time-dependent problem with a Courant number of order 1000. Conclusions: For steady-state problems, the described approach enables the construction of parallelizable, efficient, and robust implicit hydrodynamics solvers. The applicability of the method to time-dependent problems is presently restricted to cases with moderately high Courant numbers. This is due to an insufficient coarse-grid correction of the employed multigrid algorithm for large time steps. Further research will be required to help us to understand and overcome the observed multigrid convergence difficulties for time-dependent problems.
Solving Upwind-Biased Discretizations: Defect-Correction Iterations
NASA Technical Reports Server (NTRS)
Diskin, Boris; Thomas, James L.
1999-01-01
This paper considers defect-correction solvers for a second order upwind-biased discretization of the 2D convection equation. The following important features are reported: (1) The asymptotic convergence rate is about 0.5 per defect-correction iteration. (2) If the operators involved in defect-correction iterations have different approximation order, then the initial convergence rates may be very slow. The number of iterations required to get into the asymptotic convergence regime might grow on fine grids as a negative power of h. In the case of a second order target operator and a first order driver operator, this number of iterations is roughly proportional to h-1/3. (3) If both the operators have the second approximation order, the defect-correction solver demonstrates the asymptotic convergence rate after three iterations at most. The same three iterations are required to converge algebraic error below the truncation error level. A novel comprehensive half-space Fourier mode analysis (which, by the way, can take into account the influence of discretized outflow boundary conditions as well) for the defect-correction method is developed. This analysis explains many phenomena observed in solving non-elliptic equations and provides a close prediction of the actual solution behavior. It predicts the convergence rate for each iteration and the asymptotic convergence rate. As a result of this analysis, a new very efficient adaptive multigrid algorithm solving the discrete problem to within a given accuracy is proposed. Numerical simulations confirm the accuracy of the analysis and the efficiency of the proposed algorithm. The results of the numerical tests are reported.
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness
NASA Astrophysics Data System (ADS)
Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.
2018-03-01
This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.
Convergence Speed of a Dynamical System for Sparse Recovery
NASA Astrophysics Data System (ADS)
Balavoine, Aurele; Rozell, Christopher J.; Romberg, Justin
2013-09-01
This paper studies the convergence rate of a continuous-time dynamical system for L1-minimization, known as the Locally Competitive Algorithm (LCA). Solving L1-minimization} problems efficiently and rapidly is of great interest to the signal processing community, as these programs have been shown to recover sparse solutions to underdetermined systems of linear equations and come with strong performance guarantees. The LCA under study differs from the typical L1 solver in that it operates in continuous time: instead of being specified by discrete iterations, it evolves according to a system of nonlinear ordinary differential equations. The LCA is constructed from simple components, giving it the potential to be implemented as a large-scale analog circuit. The goal of this paper is to give guarantees on the convergence time of the LCA system. To do so, we analyze how the LCA evolves as it is recovering a sparse signal from underdetermined measurements. We show that under appropriate conditions on the measurement matrix and the problem parameters, the path the LCA follows can be described as a sequence of linear differential equations, each with a small number of active variables. This allows us to relate the convergence time of the system to the restricted isometry constant of the matrix. Interesting parallels to sparse-recovery digital solvers emerge from this study. Our analysis covers both the noisy and noiseless settings and is supported by simulation results.
The semi-discrete Galerkin finite element modelling of compressible viscous flow past an airfoil
NASA Technical Reports Server (NTRS)
Meade, Andrew J., Jr.
1992-01-01
A method is developed to solve the two-dimensional, steady, compressible, turbulent boundary-layer equations and is coupled to an existing Euler solver for attached transonic airfoil analysis problems. The boundary-layer formulation utilizes the semi-discrete Galerkin (SDG) method to model the spatial variable normal to the surface with linear finite elements and the time-like variable with finite differences. A Dorodnitsyn transformed system of equations is used to bound the infinite spatial domain thereby permitting the use of a uniform finite element grid which provides high resolution near the wall and automatically follows boundary-layer growth. The second-order accurate Crank-Nicholson scheme is applied along with a linearization method to take advantage of the parabolic nature of the boundary-layer equations and generate a non-iterative marching routine. The SDG code can be applied to any smoothly-connected airfoil shape without modification and can be coupled to any inviscid flow solver. In this analysis, a direct viscous-inviscid interaction is accomplished between the Euler and boundary-layer codes, through the application of a transpiration velocity boundary condition. Results are presented for compressible turbulent flow past NACA 0012 and RAE 2822 airfoils at various freestream Mach numbers, Reynolds numbers, and angles of attack. All results show good agreement with experiment, and the coupled code proved to be a computationally-efficient and accurate airfoil analysis tool.
NASA Astrophysics Data System (ADS)
Jahandari, H.; Farquharson, C. G.
2017-11-01
Unstructured grids enable representing arbitrary structures more accurately and with fewer cells compared to regular structured grids. These grids also allow more efficient refinements compared to rectilinear meshes. In this study, tetrahedral grids are used for the inversion of magnetotelluric (MT) data, which allows for the direct inclusion of topography in the model, for constraining an inversion using a wireframe-based geological model and for local refinement at the observation stations. A minimum-structure method with an iterative model-space Gauss-Newton algorithm for optimization is used. An iterative solver is employed for solving the normal system of equations at each Gauss-Newton step and the sensitivity matrix-vector products that are required by this solver are calculated using pseudo-forward problems. This method alleviates the need to explicitly form the Hessian or Jacobian matrices which significantly reduces the required computation memory. Forward problems are formulated using an edge-based finite-element approach and a sparse direct solver is used for the solutions. This solver allows saving and re-using the factorization of matrices for similar pseudo-forward problems within a Gauss-Newton iteration which greatly minimizes the computation time. Two examples are presented to show the capability of the algorithm: the first example uses a benchmark model while the second example represents a realistic geological setting with topography and a sulphide deposit. The data that are inverted are the full-tensor impedance and the magnetic transfer function vector. The inversions sufficiently recovered the models and reproduced the data, which shows the effectiveness of unstructured grids for complex and realistic MT inversion scenarios. The first example is also used to demonstrate the computational efficiency of the presented model-space method by comparison with its data-space counterpart.
D4Z - a new renumbering for iterative solution of ground-water flow and solute- transport equations
Kipp, K.L.; Russell, T.F.; Otto, J.S.
1992-01-01
D4 zig-zag (D4Z) is a new renumbering scheme for producing a reduced matrix to be solved by an incomplete LU preconditioned, restarted conjugate-gradient iterative solver. By renumbering alternate diagonals in a zig-zag fashion, a very low sensitivity of convergence rate to renumbering direction is obtained. For two demonstration problems involving groundwater flow and solute transport, iteration counts are related to condition numbers and spectra of the reduced matrices.
Large Deviations and Quasipotential for Finite State Mean Field Interacting Particle Systems
2014-05-01
The conclusion then follows by applying Lemma 4.4.2. 132 119 4.4.1 Iterative solver: The widest neighborhood structure We employ Gauss - Seidel ...nearest neighborhood structure described in Section 4.4.2. We use Gauss - Seidel iterative method for our numerical experiments. The Gauss - Seidel ...x ∈ Bh, M x ∈ Sh\\Bh, where M ∈ (V,∞) is a very large number, so that the iteration (4.5.1) converges quickly. For simplicity, we restrict our
Performance of Nonlinear Finite-Difference Poisson-Boltzmann Solvers
Cai, Qin; Hsieh, Meng-Juei; Wang, Jun; Luo, Ray
2014-01-01
We implemented and optimized seven finite-difference solvers for the full nonlinear Poisson-Boltzmann equation in biomolecular applications, including four relaxation methods, one conjugate gradient method, and two inexact Newton methods. The performance of the seven solvers was extensively evaluated with a large number of nucleic acids and proteins. Worth noting is the inexact Newton method in our analysis. We investigated the role of linear solvers in its performance by incorporating the incomplete Cholesky conjugate gradient and the geometric multigrid into its inner linear loop. We tailored and optimized both linear solvers for faster convergence rate. In addition, we explored strategies to optimize the successive over-relaxation method to reduce its convergence failures without too much sacrifice in its convergence rate. Specifically we attempted to adaptively change the relaxation parameter and to utilize the damping strategy from the inexact Newton method to improve the successive over-relaxation method. Our analysis shows that the nonlinear methods accompanied with a functional-assisted strategy, such as the conjugate gradient method and the inexact Newton method, can guarantee convergence in the tested molecules. Especially the inexact Newton method exhibits impressive performance when it is combined with highly efficient linear solvers that are tailored for its special requirement. PMID:24723843
A new iterative scheme for solving the discrete Smoluchowski equation
NASA Astrophysics Data System (ADS)
Smith, Alastair J.; Wells, Clive G.; Kraft, Markus
2018-01-01
This paper introduces a new iterative scheme for solving the discrete Smoluchowski equation and explores the numerical convergence properties of the method for a range of kernels admitting analytical solutions, in addition to some more physically realistic kernels typically used in kinetics applications. The solver is extended to spatially dependent problems with non-uniform velocities and its performance investigated in detail.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arndt, S.; Merkel, P.; Monticello, D.A.
Fixed- and free-boundary equilibria for Wendelstein 7-X (W7-X) [W. Lotz {ital et al.}, {ital Plasma Physics and Controlled Nuclear Fusion Research 1990} (Proc. 13th Int. Conf. Washington, DC, 1990), (International Atomic Energy Agency, Vienna, 1991), Vol. 2, p. 603] configurations are calculated using the Princeton Iterative Equilibrium Solver (PIES) [A. H. Reiman {ital et al.}, Comput. Phys. Commun., {bold 43}, 157 (1986)] to deal with magnetic islands and stochastic regions. Usually, these W7-X configurations require a large number of iterations for PIES convergence. Here, two methods have been successfully tested in an attempt to decrease the number of iterations neededmore » for convergence. First, periodic sequences of different blending parameters are used. Second, the initial guess is vastly improved by using results of the Variational Moments Equilibrium Code (VMEC) [S. P. Hirshmann {ital et al.}, Phys. Fluids {bold 26}, 3553 (1983)]. Use of these two methods have allowed verification of the Hamada condition and tendency of {open_quotes}self-healing{close_quotes} of islands has been observed. {copyright} {ital 1999 American Institute of Physics.}« less
Near-optimal matrix recovery from random linear measurements.
Romanov, Elad; Gavish, Matan
2018-06-25
In matrix recovery from random linear measurements, one is interested in recovering an unknown M-by-N matrix [Formula: see text] from [Formula: see text] measurements [Formula: see text], where each [Formula: see text] is an M-by-N measurement matrix with i.i.d. random entries, [Formula: see text] We present a matrix recovery algorithm, based on approximate message passing, which iteratively applies an optimal singular-value shrinker-a nonconvex nonlinearity tailored specifically for matrix estimation. Our algorithm typically converges exponentially fast, offering a significant speedup over previously suggested matrix recovery algorithms, such as iterative solvers for nuclear norm minimization (NNM). It is well known that there is a recovery tradeoff between the information content of the object [Formula: see text] to be recovered (specifically, its matrix rank r) and the number of linear measurements n from which recovery is to be attempted. The precise tradeoff between r and n, beyond which recovery by a given algorithm becomes possible, traces the so-called phase transition curve of that algorithm in the [Formula: see text] plane. The phase transition curve of our algorithm is noticeably better than that of NNM. Interestingly, it is close to the information-theoretic lower bound for the minimal number of measurements needed for matrix recovery, making it not only state of the art in terms of convergence rate, but also near optimal in terms of the matrices it successfully recovers. Copyright © 2018 the Author(s). Published by PNAS.
Regularization and computational methods for precise solution of perturbed orbit transfer problems
NASA Astrophysics Data System (ADS)
Woollands, Robyn Michele
The author has developed a suite of algorithms for solving the perturbed Lambert's problem in celestial mechanics. These algorithms have been implemented as a parallel computation tool that has broad applicability. This tool is composed of four component algorithms and each provides unique benefits for solving a particular type of orbit transfer problem. The first one utilizes a Keplerian solver (a-iteration) for solving the unperturbed Lambert's problem. This algorithm not only provides a "warm start" for solving the perturbed problem but is also used to identify which of several perturbed solvers is best suited for the job. The second algorithm solves the perturbed Lambert's problem using a variant of the modified Chebyshev-Picard iteration initial value solver that solves two-point boundary value problems. This method converges over about one third of an orbit and does not require a Newton-type shooting method and thus no state transition matrix needs to be computed. The third algorithm makes use of regularization of the differential equations through the Kustaanheimo-Stiefel transformation and extends the domain of convergence over which the modified Chebyshev-Picard iteration two-point boundary value solver will converge, from about one third of an orbit to almost a full orbit. This algorithm also does not require a Newton-type shooting method. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver to solve the perturbed two-impulse Lambert problem over multiple revolutions. The method of particular solutions is a shooting method but differs from the Newton-type shooting methods in that it does not require integration of the state transition matrix. The mathematical developments that underlie these four algorithms are derived in the chapters of this dissertation. For each of the algorithms, some orbit transfer test cases are included to provide insight on accuracy and efficiency of these individual algorithms. Following this discussion, the combined parallel algorithm, known as the unified Lambert tool, is presented and an explanation is given as to how it automatically selects which of the three perturbed solvers to compute the perturbed solution for a particular orbit transfer. The unified Lambert tool may be used to determine a single orbit transfer or for generating of an extremal field map. A case study is presented for a mission that is required to rendezvous with two pieces of orbit debris (spent rocket boosters). The unified Lambert tool software developed in this dissertation is already being utilized by several industrial partners and we are confident that it will play a significant role in practical applications, including solution of Lambert problems that arise in the current applications focused on enhanced space situational awareness.
Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton
2018-03-13
The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10 9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
Higher Order, Hybrid BEM/FEM Methods Applied to Antenna Modeling
NASA Technical Reports Server (NTRS)
Fink, P. W.; Wilton, D. R.; Dobbins, J. A.
2002-01-01
In this presentation, the authors address topics relevant to higher order modeling using hybrid BEM/FEM formulations. The first of these is the limitation on convergence rates imposed by geometric modeling errors in the analysis of scattering by a dielectric sphere. The second topic is the application of an Incomplete LU Threshold (ILUT) preconditioner to solve the linear system resulting from the BEM/FEM formulation. The final tOpic is the application of the higher order BEM/FEM formulation to antenna modeling problems. The authors have previously presented work on the benefits of higher order modeling. To achieve these benefits, special attention is required in the integration of singular and near-singular terms arising in the surface integral equation. Several methods for handling these terms have been presented. It is also well known that achieving he high rates of convergence afforded by higher order bases may als'o require the employment of higher order geometry models. A number of publications have described the use of quadratic elements to model curved surfaces. The authors have shown in an EFIE formulation, applied to scattering by a PEC .sphere, that quadratic order elements may be insufficient to prevent the domination of modeling errors. In fact, on a PEC sphere with radius r = 0.58 Lambda(sub 0), a quartic order geometry representation was required to obtain a convergence benefi.t from quadratic bases when compared to the convergence rate achieved with linear bases. Initial trials indicate that, for a dielectric sphere of the same radius, - requirements on the geometry model are not as severe as for the PEC sphere. The authors will present convergence results for higher order bases as a function of the geometry model order in the hybrid BEM/FEM formulation applied to dielectric spheres. It is well known that the system matrix resulting from the hybrid BEM/FEM formulation is ill -conditioned. For many real applications, a good preconditioner is required to obtain usable convergence from an iterative solver. The authors have examined the use of an Incomplete LU Threshold (ILUT) preconditioner . to solver linear systems stemming from higher order BEM/FEM formulations in 2D scattering problems. Although the resulting preconditioner provided aD excellent approximation to the system inverse, its size in terms of non-zero entries represented only a modest improvement when compared with the fill-in associated with a sparse direct solver. Furthermore, the fill-in of the preconditioner could not be substantially reduced without the occurrence of instabilities. In addition to the results for these 2D problems, the authors will present iterative solution data from the application of the ILUT preconditioner to 3D problems.
Flutter and Forced Response Analyses of Cascades using a Two-Dimensional Linearized Euler Solver
NASA Technical Reports Server (NTRS)
Reddy, T. S. R.; Srivastava, R.; Mehmed, O.
1999-01-01
Flutter and forced response analyses for a cascade of blades in subsonic and transonic flow is presented. The structural model for each blade is a typical section with bending and torsion degrees of freedom. The unsteady aerodynamic forces due to bending and torsion motions. and due to a vortical gust disturbance are obtained by solving unsteady linearized Euler equations. The unsteady linearized equations are obtained by linearizing the unsteady nonlinear equations about the steady flow. The predicted unsteady aerodynamic forces include the effect of steady aerodynamic loading due to airfoil shape, thickness and angle of attack. The aeroelastic equations are solved in the frequency domain by coupling the un- steady aerodynamic forces to the aeroelastic solver MISER. The present unsteady aerodynamic solver showed good correlation with published results for both flutter and forced response predictions. Further improvements are required to use the unsteady aerodynamic solver in a design cycle.
USM3D Unstructured Grid Solutions for CAWAPI at NASA LaRC
NASA Technical Reports Server (NTRS)
Lamar, John E.; Abdol-Hamid, Khaled S.
2007-01-01
In support the Cranked Arrow Wing Aerodynamic Project International (CAWAPI) to improve the Technology Readiness Level of flow solvers by comparing results with measured F-16XL-1 flight data, NASA Langley employed the TetrUSS unstructured grid solver, USM3D, to obtain solutions for all seven flight conditions of interest. A newly available solver version that incorporates a number of turbulence models, including the two-equation linear and non-linear k-epsilon, was used in this study. As a first test, a choice was made to utilize only a single grid resolution with the solver for the simulation of the different flight conditions. Comparisons are presented with three turbulence models in USM3D, flight data for surface pressure, boundary-layer profiles, and skin-friction results, as well as limited predictions from other solvers. A result of these comparisons is that the USM3D solver can be used in an engineering environment to predict flow physics on a complex configuration at flight Reynolds numbers with a two-equation linear k-epsilon turbulence model.
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
Efficient solution of the simplified P N equations
Hamilton, Steven P.; Evans, Thomas M.
2014-12-23
We show new solver strategies for the multigroup SPN equations for nuclear reactor analysis. By forming the complete matrix over space, moments, and energy a robust set of solution strategies may be applied. Moreover, power iteration, shifted power iteration, Rayleigh quotient iteration, Arnoldi's method, and a generalized Davidson method, each using algebraic and physics-based multigrid preconditioners, have been compared on C5G7 MOX test problem as well as an operational PWR model. These results show that the most ecient approach is the generalized Davidson method, that is 30-40 times faster than traditional power iteration and 6-10 times faster than Arnoldi's method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; Stanier, Adam John
Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
Menu-Driven Solver Of Linear-Programming Problems
NASA Technical Reports Server (NTRS)
Viterna, L. A.; Ferencz, D.
1992-01-01
Program assists inexperienced user in formulating linear-programming problems. A Linear Program Solver (ALPS) computer program is full-featured LP analysis program. Solves plain linear-programming problems as well as more-complicated mixed-integer and pure-integer programs. Also contains efficient technique for solution of purely binary linear-programming problems. Written entirely in IBM's APL2/PC software, Version 1.01. Packed program contains licensed material, property of IBM (copyright 1988, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Gregory H.
2003-08-06
In this paper we present a general iterative method for the solution of the Riemann problem for hyperbolic systems of PDEs. The method is based on the multiple shooting method for free boundary value problems. We demonstrate the method by solving one-dimensional Riemann problems for hyperelastic solid mechanics. Even for conditions representative of routine laboratory conditions and military ballistics, dramatic differences are seen between the exact and approximate Riemann solution. The greatest discrepancy arises from misallocation of energy between compressional and thermal modes by the approximate solver, resulting in nonphysical entropy and temperature estimates. Several pathological conditions arise in commonmore » practice, and modifications to the method to handle these are discussed. These include points where genuine nonlinearity is lost, degeneracies, and eigenvector deficiencies that occur upon melting.« less
Blade design and analysis using a modified Euler solver
NASA Technical Reports Server (NTRS)
Leonard, O.; Vandenbraembussche, R. A.
1991-01-01
An iterative method for blade design based on Euler solver and described in an earlier paper is used to design compressor and turbine blades providing shock free transonic flows. The method shows a rapid convergence, and indicates how much the flow is sensitive to small modifications of the blade geometry, that the classical iterative use of analysis methods might not be able to define. The relationship between the required Mach number distribution and the resulting geometry is discussed. Examples show how geometrical constraints imposed upon the blade shape can be respected by using free geometrical parameters or by relaxing the required Mach number distribution. The same code is used both for the design of the required geometry and for the off-design calculations. Examples illustrate the difficulty of designing blade shapes with optimal performance also outside of the design point.
NASA Astrophysics Data System (ADS)
Klein, Andreas; Gerlach, Gerald
1998-09-01
This paper deals with the simulation of the fluid-structure interaction phenomena in micropumps. The proposed solution approach is based on external coupling of two different solvers, which are considered here as `black boxes'. Therefore, no specific intervention is necessary into the program code, and solvers can be exchanged arbitrarily. For the realization of the external iteration loop, two algorithms are considered: the relaxation-based Gauss-Seidel method and the computationally more extensive Newton method. It is demonstrated in terms of a simplified test case, that for rather weak coupling, the Gauss-Seidel method is sufficient. However, by simply changing the considered fluid from air to water, the two physical domains become strongly coupled, and the Gauss-Seidel method fails to converge in this case. The Newton iteration scheme must be used instead.
A Parallel Fast Sweeping Method for the Eikonal Equation
NASA Astrophysics Data System (ADS)
Baker, B.
2017-12-01
Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods are resistant to becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods however are only realized by successive computation of predicted travel times in, potentially, strongly heterogeneous media. To this end this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver will use the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required the high-accuracy finite difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite different stencil. By performing a topological sort of the DAG sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver will also address scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2014-02-28
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2015-01-01
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
An efficient spectral crystal plasticity solver for GPU architectures
NASA Astrophysics Data System (ADS)
Malahe, Michael
2018-03-01
We present a spectral crystal plasticity (CP) solver for graphics processing unit (GPU) architectures that achieves a tenfold increase in efficiency over prior GPU solvers. The approach makes use of a database containing a spectral decomposition of CP simulations performed using a conventional iterative solver over a parameter space of crystal orientations and applied velocity gradients. The key improvements in efficiency come from reducing global memory transactions, exposing more instruction-level parallelism, reducing integer instructions and performing fast range reductions on trigonometric arguments. The scheme also makes more efficient use of memory than prior work, allowing for larger problems to be solved on a single GPU. We illustrate these improvements with a simulation of 390 million crystal grains on a consumer-grade GPU, which executes at a rate of 2.72 s per strain step.
Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.
Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray
2017-07-11
Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have emerged as widely used in modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPU) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that the single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
Fast immersed interface Poisson solver for 3D unbounded problems around arbitrary geometries
NASA Astrophysics Data System (ADS)
Gillis, T.; Winckelmans, G.; Chatelain, P.
2018-02-01
We present a fast and efficient Fourier-based solver for the Poisson problem around an arbitrary geometry in an unbounded 3D domain. This solver merges two rewarding approaches, the lattice Green's function method and the immersed interface method, using the Sherman-Morrison-Woodbury decomposition formula. The method is intended to be second order up to the boundary. This is verified on two potential flow benchmarks. We also further analyse the iterative process and the convergence behavior of the proposed algorithm. The method is applicable to a wide range of problems involving a Poisson equation around inner bodies, which goes well beyond the present validation on potential flows.
NASA Astrophysics Data System (ADS)
Aviat, Félix; Lagardère, Louis; Piquemal, Jean-Philip
2017-10-01
In a recent paper [F. Aviat et al., J. Chem. Theory Comput. 13, 180-190 (2017)], we proposed the Truncated Conjugate Gradient (TCG) approach to compute the polarization energy and forces in polarizable molecular simulations. The method consists in truncating the conjugate gradient algorithm at a fixed predetermined order leading to a fixed computational cost and can thus be considered "non-iterative." This gives the possibility to derive analytical forces avoiding the usual energy conservation (i.e., drifts) issues occurring with iterative approaches. A key point concerns the evaluation of the analytical gradients, which is more complex than that with a usual solver. In this paper, after reviewing the present state of the art of polarization solvers, we detail a viable strategy for the efficient implementation of the TCG calculation. The complete cost of the approach is then measured as it is tested using a multi-time step scheme and compared to timings using usual iterative approaches. We show that the TCG methods are more efficient than traditional techniques, making it a method of choice for future long molecular dynamics simulations using polarizable force fields where energy conservation matters. We detail the various steps required for the implementation of the complete method by software developers.
Aviat, Félix; Lagardère, Louis; Piquemal, Jean-Philip
2017-10-28
In a recent paper [F. Aviat et al., J. Chem. Theory Comput. 13, 180-190 (2017)], we proposed the Truncated Conjugate Gradient (TCG) approach to compute the polarization energy and forces in polarizable molecular simulations. The method consists in truncating the conjugate gradient algorithm at a fixed predetermined order leading to a fixed computational cost and can thus be considered "non-iterative." This gives the possibility to derive analytical forces avoiding the usual energy conservation (i.e., drifts) issues occurring with iterative approaches. A key point concerns the evaluation of the analytical gradients, which is more complex than that with a usual solver. In this paper, after reviewing the present state of the art of polarization solvers, we detail a viable strategy for the efficient implementation of the TCG calculation. The complete cost of the approach is then measured as it is tested using a multi-time step scheme and compared to timings using usual iterative approaches. We show that the TCG methods are more efficient than traditional techniques, making it a method of choice for future long molecular dynamics simulations using polarizable force fields where energy conservation matters. We detail the various steps required for the implementation of the complete method by software developers.
Memory sparing, fast scattering formalism for rigorous diffraction modeling
NASA Astrophysics Data System (ADS)
Iff, W.; Kämpfe, T.; Jourlin, Y.; Tishchenko, A. V.
2017-07-01
The basics and algorithmic steps of a novel scattering formalism suited for memory sparing and fast electromagnetic calculations are presented. The formalism, called ‘S-vector algorithm’ (by analogy with the known scattering-matrix algorithm), allows the calculation of the collective scattering spectra of individual layered micro-structured scattering objects. A rigorous method of linear complexity is applied to model the scattering at individual layers; here the generalized source method (GSM) resorting to Fourier harmonics as basis functions is used as one possible method of linear complexity. The concatenation of the individual scattering events can be achieved sequentially or in parallel, both having pros and cons. The present development will largely concentrate on a consecutive approach based on the multiple reflection series. The latter will be reformulated into an implicit formalism which will be associated with an iterative solver, resulting in improved convergence. The examples will first refer to 1D grating diffraction for the sake of simplicity and intelligibility, with a final 2D application example.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petra, C.; Gavrea, B.; Anitescu, M.
2009-01-01
The present work aims at comparing the performance of several quadratic programming (QP) solvers for simulating large-scale frictional rigid-body systems. Traditional time-stepping schemes for simulation of multibody systems are formulated as linear complementarity problems (LCPs) with copositive matrices. Such LCPs are generally solved by means of Lemke-type algorithms and solvers such as the PATH solver proved to be robust. However, for large systems, the PATH solver or any other pivotal algorithm becomes unpractical from a computational point of view. The convex relaxation proposed by one of the authors allows the formulation of the integration step as a QPD, for whichmore » a wide variety of state-of-the-art solvers are available. In what follows we report the results obtained solving that subproblem when using the QP solvers MOSEK, OOQP, TRON, and BLMVM. OOQP is presented with both the symmetric indefinite solver MA27 and our Cholesky reformulation using the CHOLMOD package. We investigate computational performance and address the correctness of the results from a modeling point of view. We conclude that the OOQP solver, particularly with the CHOLMOD linear algebra solver, has predictable performance and memory use patterns and is far more competitive for these problems than are the other solvers.« less
Implicit time accurate simulation of unsteady flow
NASA Astrophysics Data System (ADS)
van Buuren, René; Kuerten, Hans; Geurts, Bernard J.
2001-03-01
Implicit time integration was studied in the context of unsteady shock-boundary layer interaction flow. With an explicit second-order Runge-Kutta scheme, a reference solution to compare with the implicit second-order Crank-Nicolson scheme was determined. The time step in the explicit scheme is restricted by both temporal accuracy as well as stability requirements, whereas in the A-stable implicit scheme, the time step has to obey temporal resolution requirements and numerical convergence conditions. The non-linear discrete equations for each time step are solved iteratively by adding a pseudo-time derivative. The quasi-Newton approach is adopted and the linear systems that arise are approximately solved with a symmetric block Gauss-Seidel solver. As a guiding principle for properly setting numerical time integration parameters that yield an efficient time accurate capturing of the solution, the global error caused by the temporal integration is compared with the error resulting from the spatial discretization. Focus is on the sensitivity of properties of the solution in relation to the time step. Numerical simulations show that the time step needed for acceptable accuracy can be considerably larger than the explicit stability time step; typical ratios range from 20 to 80. At large time steps, convergence problems that are closely related to a highly complex structure of the basins of attraction of the iterative method may occur. Copyright
Three-dimensional inverse modelling of damped elastic wave propagation in the Fourier domain
NASA Astrophysics Data System (ADS)
Petrov, Petr V.; Newman, Gregory A.
2014-09-01
3-D full waveform inversion (FWI) of seismic wavefields is routinely implemented with explicit time-stepping simulators. A clear advantage of explicit time stepping is the avoidance of solving large-scale implicit linear systems that arise with frequency domain formulations. However, FWI using explicit time stepping may require a very fine time step and (as a consequence) significant computational resources and run times. If the computational challenges of wavefield simulation can be effectively handled, an FWI scheme implemented within the frequency domain utilizing only a few frequencies, offers a cost effective alternative to FWI in the time domain. We have therefore implemented a 3-D FWI scheme for elastic wave propagation in the Fourier domain. To overcome the computational bottleneck in wavefield simulation, we have exploited an efficient Krylov iterative solver for the elastic wave equations approximated with second and fourth order finite differences. The solver does not exploit multilevel preconditioning for wavefield simulation, but is coupled efficiently to the inversion iteration workflow to reduce computational cost. The workflow is best described as a series of sequential inversion experiments, where in the case of seismic reflection acquisition geometries, the data has been laddered such that we first image highly damped data, followed by data where damping is systemically reduced. The key to our modelling approach is its ability to take advantage of solver efficiency when the elastic wavefields are damped. As the inversion experiment progresses, damping is significantly reduced, effectively simulating non-damped wavefields in the Fourier domain. While the cost of the forward simulation increases as damping is reduced, this is counterbalanced by the cost of the outer inversion iteration, which is reduced because of a better starting model obtained from the larger damped wavefield used in the previous inversion experiment. For cross-well data, it is also possible to launch a successful inversion experiment without laddering the damping constants. With this type of acquisition geometry, the solver is still quite effective using a small fixed damping constant. To avoid cycle skipping, we also employ a multiscale imaging approach, in which frequency content of the data is also laddered (with the data now including both reflection and cross-well data acquisition geometries). Thus the inversion process is launched using low frequency data to first recover the long spatial wavelength of the image. With this image as a new starting model, adding higher frequency data refines and enhances the resolution of the image. FWI using laddered frequencies with an efficient damping schemed enables reconstructing elastic attributes of the subsurface at a resolution that approaches half the smallest wavelength utilized to image the subsurface. We show the possibility of effectively carrying out such reconstructions using two to six frequencies, depending upon the application. Using the proposed FWI scheme, massively parallel computing resources are essential for reasonable execution times.
Efficiency and flexibility using implicit methods within atmosphere dycores
NASA Astrophysics Data System (ADS)
Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.
2016-12-01
A suite of explicit and implicit methods are evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about the same efficiency as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is speed up for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional based on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core, of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability. We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weston, Brian T.
This dissertation focuses on the development of a fully-implicit, high-order compressible ow solver with phase change. The work is motivated by laser-induced phase change applications, particularly by the need to develop large-scale multi-physics simulations of the selective laser melting (SLM) process in metal additive manufacturing (3D printing). Simulations of the SLM process require precise tracking of multi-material solid-liquid-gas interfaces, due to laser-induced melting/ solidi cation and evaporation/condensation of metal powder in an ambient gas. These rapid density variations and phase change processes tightly couple the governing equations, requiring a fully compressible framework to robustly capture the rapid density variations ofmore » the ambient gas and the melting/evaporation of the metal powder. For non-isothermal phase change, the velocity is gradually suppressed through the mushy region by a variable viscosity and Darcy source term model. The governing equations are discretized up to 4th-order accuracy with our reconstructed Discontinuous Galerkin spatial discretization scheme and up to 5th-order accuracy with L-stable fully implicit time discretization schemes (BDF2 and ESDIRK3-5). The resulting set of non-linear equations is solved using a robust Newton-Krylov method, with the Jacobian-free version of the GMRES solver for linear iterations. Due to the sti nes associated with the acoustic waves and thermal and viscous/material strength e ects, preconditioning the GMRES solver is essential. A robust and scalable approximate block factorization preconditioner was developed, which utilizes the velocity-pressure (vP) and velocity-temperature (vT) Schur complement systems. This multigrid block reduction preconditioning technique converges for high CFL/Fourier numbers and exhibits excellent parallel and algorithmic scalability on classic benchmark problems in uid dynamics (lid-driven cavity ow and natural convection heat transfer) as well as for laser-induced phase change problems in 2D and 3D.« less
Parallel computation of fluid-structural interactions using high resolution upwind schemes
NASA Astrophysics Data System (ADS)
Hu, Zongjun
An efficient and accurate solver is developed to simulate the non-linear fluid-structural interactions in turbomachinery flutter flows. A new low diffusion E-CUSP scheme, Zha CUSP scheme, is developed to improve the efficiency and accuracy of the inviscid flux computation. The 3D unsteady Navier-Stokes equations with the Baldwin-Lomax turbulence model are solved using the finite volume method with the dual-time stepping scheme. The linearized equations are solved with Gauss-Seidel line iterations. The parallel computation is implemented using MPI protocol. The solver is validated with 2D cases for its turbulence modeling, parallel computation and unsteady calculation. The Zha CUSP scheme is validated with 2D cases, including a supersonic flat plate boundary layer, a transonic converging-diverging nozzle and a transonic inlet diffuser. The Zha CUSP2 scheme is tested with 3D cases, including a circular-to-rectangular nozzle, a subsonic compressor cascade and a transonic channel. The Zha CUSP schemes are proved to be accurate, robust and efficient in these tests. The steady and unsteady separation flows in a 3D stationary cascade under high incidence and three inlet Mach numbers are calculated to study the steady state separation flow patterns and their unsteady oscillation characteristics. The leading edge vortex shedding is the mechanism behind the unsteady characteristics of the high incidence separated flows. The separation flow characteristics is affected by the inlet Mach number. The blade aeroelasticity of a linear cascade with forced oscillating blades is studied using parallel computation. A simplified two-passage cascade with periodic boundary condition is first calculated under a medium frequency and a low incidence. The full scale cascade with 9 blades and two end walls is then studied more extensively under three oscillation frequencies and two incidence angles. The end wall influence and the blade stability are studied and compared under different frequencies and incidence angles. The Zha CUSP schemes are the first time to be applied in moving grid systems and 2D and 3D calculations. The implicit Gauss-Seidel iteration with dual time stepping is the first time to be used for moving grid systems. The NASA flutter cascade is the first time to be calculated in full scale.
Parallelization of the preconditioned IDR solver for modern multicore computer systems
NASA Astrophysics Data System (ADS)
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Heat Transfer Effects on a Fully Premixed Methane Impinging Flame
2014-10-30
Houzeaux et al., 2009). The GM- RES solver is also employed to solve for the enthalpy and species mass fractions. The Gauss - Seidel iterative method is...the system is therefore split to solve the mo- mentum and continuity equations independently. This is achieved by applying an iterative strategy...the momentum equation twice and the continuity equation once. The momentum equation is solved using the GMRES or BICGSTAB method (diagonal and Gauss
Rotation and neoclassical ripple transport in ITER
Paul, Elizabeth Joy; Landreman, Matt; Poli, Francesca M.; ...
2017-07-13
Neoclassical transport in the presence of non-axisymmetric magnetic fields causes a toroidal torque known as neoclassical toroidal viscosity (NTV). The toroidal symmetry of ITER will be broken by the finite number of toroidal field coils and by test blanket modules (TBMs). The addition of ferritic inserts (FIs) will decrease the magnitude of the toroidal field ripple. 3D magnetic equilibria in the presence of toroidal field ripple and ferromagnetic structures are calculated for an ITER steady-state scenario using the Variational Moments Equilibrium Code (VMEC). Furthermore, neoclassical transport quantities in the presence of these error fields are calculated using the Stellarator Fokker-Planckmore » Iterative Neoclassical Conservative Solver (SFINCS).« less
Rotation and neoclassical ripple transport in ITER
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paul, Elizabeth Joy; Landreman, Matt; Poli, Francesca M.
Neoclassical transport in the presence of non-axisymmetric magnetic fields causes a toroidal torque known as neoclassical toroidal viscosity (NTV). The toroidal symmetry of ITER will be broken by the finite number of toroidal field coils and by test blanket modules (TBMs). The addition of ferritic inserts (FIs) will decrease the magnitude of the toroidal field ripple. 3D magnetic equilibria in the presence of toroidal field ripple and ferromagnetic structures are calculated for an ITER steady-state scenario using the Variational Moments Equilibrium Code (VMEC). Furthermore, neoclassical transport quantities in the presence of these error fields are calculated using the Stellarator Fokker-Planckmore » Iterative Neoclassical Conservative Solver (SFINCS).« less
An Approach to the Constrained Design of Natural Laminar Flow Airfoils
NASA Technical Reports Server (NTRS)
Green, Bradford E.
1997-01-01
A design method has been developed by which an airfoil with a substantial amount of natural laminar flow can be designed, while maintaining other aerodynamic and geometric constraints. After obtaining the initial airfoil's pressure distribution at the design lift coefficient using an Euler solver coupled with an integral turbulent boundary layer method, the calculations from a laminar boundary layer solver are used by a stability analysis code to obtain estimates of the transition location (using N-Factors) for the starting airfoil. A new design method then calculates a target pressure distribution that will increase the laminar flow toward the desired amount. An airfoil design method is then iteratively used to design an airfoil that possesses that target pressure distribution. The new airfoil's boundary layer stability characteristics are determined, and this iterative process continues until an airfoil is designed that meets the laminar flow requirement and as many of the other constraints as possible.
An approach to the constrained design of natural laminar flow airfoils
NASA Technical Reports Server (NTRS)
Green, Bradford Earl
1995-01-01
A design method has been developed by which an airfoil with a substantial amount of natural laminar flow can be designed, while maintaining other aerodynamic and geometric constraints. After obtaining the initial airfoil's pressure distribution at the design lift coefficient using an Euler solver coupled with an integml turbulent boundary layer method, the calculations from a laminar boundary layer solver are used by a stability analysis code to obtain estimates of the transition location (using N-Factors) for the starting airfoil. A new design method then calculates a target pressure distribution that will increase the larninar flow toward the desired amounl An airfoil design method is then iteratively used to design an airfoil that possesses that target pressure distribution. The new airfoil's boundary layer stability characteristics are determined, and this iterative process continues until an airfoil is designed that meets the laminar flow requirement and as many of the other constraints as possible.
NASA Technical Reports Server (NTRS)
Leonard, Michael W.
2013-01-01
Integration of the Control Allocation technique to recover from Pilot Induced Oscillations (CAPIO) System into the control system of a Short Takeoff and Landing Mobility Concept Vehicle simulation presents a challenge because the CAPIO formulation requires that constrained optimization problems be solved at the controller operating frequency. We present a solution that utilizes a modified version of the well-known L-BFGS-B solver. Despite the iterative nature of the solver, the method is seen to converge in real time with sufficient reliability to support three weeks of piloted runs at the NASA Ames Vertical Motion Simulator (VMS) facility. The results of the optimization are seen to be excellent in the vast majority of real-time frames. Deficiencies in the quality of the results in some frames are shown to be improvable with simple termination criteria adjustments, though more real-time optimization iterations would be required.
Wang, G.L.; Chew, W.C.; Cui, T.J.; Aydiner, A.A.; Wright, D.L.; Smith, D.V.
2004-01-01
Three-dimensional (3D) subsurface imaging by using inversion of data obtained from the very early time electromagnetic system (VETEM) was discussed. The study was carried out by using the distorted Born iterative method to match the internal nonlinear property of the 3D inversion problem. The forward solver was based on the total-current formulation bi-conjugate gradient-fast Fourier transform (BCCG-FFT). It was found that the selection of regularization parameter follow a heuristic rule as used in the Levenberg-Marquardt algorithm so that the iteration is stable.
Conservative and bounded volume-of-fluid advection on unstructured grids
NASA Astrophysics Data System (ADS)
Ivey, Christopher B.; Moin, Parviz
2017-12-01
This paper presents a novel Eulerian-Lagrangian piecewise-linear interface calculation (PLIC) volume-of-fluid (VOF) advection method, which is three-dimensional, unsplit, and discretely conservative and bounded. The approach is developed with reference to a collocated node-based finite-volume two-phase flow solver that utilizes the median-dual mesh constructed from non-convex polyhedra. The proposed advection algorithm satisfies conservation and boundedness of the liquid volume fraction irrespective of the underlying flux polyhedron geometry, which differs from contemporary unsplit VOF schemes that prescribe topologically complicated flux polyhedron geometries in efforts to satisfy conservation. Instead of prescribing complicated flux-polyhedron geometries, which are prone to topological failures, our VOF advection scheme, the non-intersecting flux polyhedron advection (NIFPA) method, builds the flux polyhedron iteratively such that its intersection with neighboring flux polyhedra, and any other unavailable volume, is empty and its total volume matches the calculated flux volume. During each iteration, a candidate nominal flux polyhedron is extruded using an iteration dependent scalar. The candidate is subsequently intersected with the volume guaranteed available to it at the time of the flux calculation to generate the candidate flux polyhedron. The difference in the volume of the candidate flux polyhedron and the actual flux volume is used to calculate extrusion during the next iteration. The choice in nominal flux polyhedron impacts the cost and accuracy of the scheme; however, it does not impact the methods underlying conservation and boundedness. As such, various robust nominal flux polyhedron are proposed and tested using canonical periodic kinematic test cases: Zalesak's disk and two- and three-dimensional deformation. The tests are conducted on the median duals of a quadrilateral and triangular primal mesh, in two-dimensions, and on the median duals of a hexahedral, wedge and tetrahedral primal mesh, in three-dimensions. Comparisons are made with the adaptation of a conventional unsplit VOF advection scheme to our collocated node-based flow solver. Depending on the choice in the nominal flux polyhedron, the NIFPA scheme presented accuracies ranging from zeroth to second order and calculation times that differed by orders of magnitude. For the nominal flux polyhedra which demonstrate second-order accuracy on all tests and meshes, the NIFPA method's cost was comparable to the traditional topologically complex second-order accurate VOF advection scheme.
NASA Astrophysics Data System (ADS)
Krauze, W.; Makowski, P.; Kujawińska, M.
2015-06-01
Standard tomographic algorithms applied to optical limited-angle tomography result in the reconstructions that have highly anisotropic resolution and thus special algorithms are developed. State of the art approaches utilize the Total Variation (TV) minimization technique. These methods give very good results but are applicable to piecewise constant structures only. In this paper, we propose a novel algorithm for 3D limited-angle tomography - Total Variation Iterative Constraint method (TVIC) which enhances the applicability of the TV regularization to non-piecewise constant samples, like biological cells. This approach consists of two parts. First, the TV minimization is used as a strong regularizer to create a sharp-edged image converted to a 3D binary mask which is then iteratively applied in the tomographic reconstruction as a constraint in the object domain. In the present work we test the method on a synthetic object designed to mimic basic structures of a living cell. For simplicity, the test reconstructions were performed within the straight-line propagation model (SIRT3D solver from the ASTRA Tomography Toolbox), but the strategy is general enough to supplement any algorithm for tomographic reconstruction that supports arbitrary geometries of plane-wave projection acquisition. This includes optical diffraction tomography solvers. The obtained reconstructions present resolution uniformity and general shape accuracy expected from the TV regularization based solvers, but keeping the smooth internal structures of the object at the same time. Comparison between three different patterns of object illumination arrangement show very small impact of the projection acquisition geometry on the image quality.
Assessment of Linear Finite-Difference Poisson-Boltzmann Solvers
Wang, Jun; Luo, Ray
2009-01-01
CPU time and memory usage are two vital issues that any numerical solvers for the Poisson-Boltzmann equation have to face in biomolecular applications. In this study we systematically analyzed the CPU time and memory usage of five commonly used finite-difference solvers with a large and diversified set of biomolecular structures. Our comparative analysis shows that modified incomplete Cholesky conjugate gradient and geometric multigrid are the most efficient in the diversified test set. For the two efficient solvers, our test shows that their CPU times increase approximately linearly with the numbers of grids. Their CPU times also increase almost linearly with the negative logarithm of the convergence criterion at very similar rate. Our comparison further shows that geometric multigrid performs better in the large set of tested biomolecules. However, modified incomplete Cholesky conjugate gradient is superior to geometric multigrid in molecular dynamics simulations of tested molecules. We also investigated other significant components in numerical solutions of the Poisson-Boltzmann equation. It turns out that the time-limiting step is the free boundary condition setup for the linear systems for the selected proteins if the electrostatic focusing is not used. Thus, development of future numerical solvers for the Poisson-Boltzmann equation should balance all aspects of the numerical procedures in realistic biomolecular applications. PMID:20063271
NASA Astrophysics Data System (ADS)
Balsara, Dinshaw S.; Nkonga, Boniface
2017-10-01
Just as the quality of a one-dimensional approximate Riemann solver is improved by the inclusion of internal sub-structure, the quality of a multidimensional Riemann solver is also similarly improved. Such multidimensional Riemann problems arise when multiple states come together at the vertex of a mesh. The interaction of the resulting one-dimensional Riemann problems gives rise to a strongly-interacting state. We wish to endow this strongly-interacting state with physically-motivated sub-structure. The fastest way of endowing such sub-structure consists of making a multidimensional extension of the HLLI Riemann solver for hyperbolic conservation laws. Presenting such a multidimensional analogue of the HLLI Riemann solver with linear sub-structure for use on structured meshes is the goal of this work. The multidimensional MuSIC Riemann solver documented here is universal in the sense that it can be applied to any hyperbolic conservation law. The multidimensional Riemann solver is made to be consistent with constraints that emerge naturally from the Galerkin projection of the self-similar states within the wave model. When the full eigenstructure in both directions is used in the present Riemann solver, it becomes a complete Riemann solver in a multidimensional sense. I.e., all the intermediate waves are represented in the multidimensional wave model. The work also presents, for the very first time, an important analysis of the dissipation characteristics of multidimensional Riemann solvers. The present Riemann solver results in the most efficient implementation of a multidimensional Riemann solver with sub-structure. Because it preserves stationary linearly degenerate waves, it might also help with well-balancing. Implementation-related details are presented in pointwise fashion for the one-dimensional HLLI Riemann solver as well as the multidimensional MuSIC Riemann solver.
Application of NASA General-Purpose Solver to Large-Scale Computations in Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Storaasli, Olaf O.
2004-01-01
Of several iterative and direct equation solvers evaluated previously for computations in aeroacoustics, the most promising was the NASA-developed General-Purpose Solver (winner of NASA's 1999 software of the year award). This paper presents detailed, single-processor statistics of the performance of this solver, which has been tailored and optimized for large-scale aeroacoustic computations. The statistics, compiled using an SGI ORIGIN 2000 computer with 12 Gb available memory (RAM) and eight available processors, are the central processing unit time, RAM requirements, and solution error. The equation solver is capable of solving 10 thousand complex unknowns in as little as 0.01 sec using 0.02 Gb RAM, and 8.4 million complex unknowns in slightly less than 3 hours using all 12 Gb. This latter solution is the largest aeroacoustics problem solved to date with this technique. The study was unable to detect any noticeable error in the solution, since noise levels predicted from these solution vectors are in excellent agreement with the noise levels computed from the exact solution. The equation solver provides a means for obtaining numerical solutions to aeroacoustics problems in three dimensions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spencer, Benjamin Whiting; Crane, Nathan K.; Heinstein, Martin W.
2011-03-01
Adagio is a Lagrangian, three-dimensional, implicit code for the analysis of solids and structures. It uses a multi-level iterative solver, which enables it to solve problems with large deformations, nonlinear material behavior, and contact. It also has a versatile library of continuum and structural elements, and an extensive library of material models. Adagio is written for parallel computing environments, and its solvers allow for scalable solutions of very large problems. Adagio uses the SIERRA Framework, which allows for coupling with other SIERRA mechanics codes. This document describes the functionality and input structure for Adagio.
NASA Astrophysics Data System (ADS)
Yeckel, Andrew; Lun, Lisa; Derby, Jeffrey J.
2009-12-01
A new, approximate block Newton (ABN) method is derived and tested for the coupled solution of nonlinear models, each of which is treated as a modular, black box. Such an approach is motivated by a desire to maintain software flexibility without sacrificing solution efficiency or robustness. Though block Newton methods of similar type have been proposed and studied, we present a unique derivation and use it to sort out some of the more confusing points in the literature. In particular, we show that our ABN method behaves like a Newton iteration preconditioned by an inexact Newton solver derived from subproblem Jacobians. The method is demonstrated on several conjugate heat transfer problems modeled after melt crystal growth processes. These problems are represented by partitioned spatial regions, each modeled by independent heat transfer codes and linked by temperature and flux matching conditions at the boundaries common to the partitions. Whereas a typical block Gauss-Seidel iteration fails about half the time for the model problem, quadratic convergence is achieved by the ABN method under all conditions studied here. Additional performance advantages over existing methods are demonstrated and discussed.
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...
2017-06-01
As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel
As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less
Solving regularly and singularly perturbed reaction-diffusion equations in three space dimensions
NASA Astrophysics Data System (ADS)
Moore, Peter K.
2007-06-01
In [P.K. Moore, Effects of basis selection and h-refinement on error estimator reliability and solution efficiency for higher-order methods in three space dimensions, Int. J. Numer. Anal. Mod. 3 (2006) 21-51] a fixed, high-order h-refinement finite element algorithm, Href, was introduced for solving reaction-diffusion equations in three space dimensions. In this paper Href is coupled with continuation creating an automatic method for solving regularly and singularly perturbed reaction-diffusion equations. The simple quasilinear Newton solver of Moore, (2006) is replaced by the nonlinear solver NITSOL [M. Pernice, H.F. Walker, NITSOL: a Newton iterative solver for nonlinear systems, SIAM J. Sci. Comput. 19 (1998) 302-318]. Good initial guesses for the nonlinear solver are obtained using continuation in the small parameter ɛ. Two strategies allow adaptive selection of ɛ. The first depends on the rate of convergence of the nonlinear solver and the second implements backtracking in ɛ. Finally a simple method is used to select the initial ɛ. Several examples illustrate the effectiveness of the algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lipnikov, Konstantin; Moulton, David; Svyatskiy, Daniil
2016-04-29
We develop a new approach for solving the nonlinear Richards’ equation arising in variably saturated flow modeling. The growing complexity of geometric models for simulation of subsurface flows leads to the necessity of using unstructured meshes and advanced discretization methods. Typically, a numerical solution is obtained by first discretizing PDEs and then solving the resulting system of nonlinear discrete equations with a Newton-Raphson-type method. Efficiency and robustness of the existing solvers rely on many factors, including an empiric quality control of intermediate iterates, complexity of the employed discretization method and a customized preconditioner. We propose and analyze a new preconditioningmore » strategy that is based on a stable discretization of the continuum Jacobian. We will show with numerical experiments for challenging problems in subsurface hydrology that this new preconditioner improves convergence of the existing Jacobian-free solvers 3-20 times. Furthermore, we show that the Picard method with this preconditioner becomes a more efficient nonlinear solver than a few widely used Jacobian-free solvers.« less
A Lagrangian meshfree method applied to linear and nonlinear elasticity.
Walker, Wade A
2017-01-01
The repeated replacement method (RRM) is a Lagrangian meshfree method which we have previously applied to the Euler equations for compressible fluid flow. In this paper we present new enhancements to RRM, and we apply the enhanced method to both linear and nonlinear elasticity. We compare the results of ten test problems to those of analytic solvers, to demonstrate that RRM can successfully simulate these elastic systems without many of the requirements of traditional numerical methods such as numerical derivatives, equation system solvers, or Riemann solvers. We also show the relationship between error and computational effort for RRM on these systems, and compare RRM to other methods to highlight its strengths and weaknesses. And to further explain the two elastic equations used in the paper, we demonstrate the mathematical procedure used to create Riemann and Sedov-Taylor solvers for them, and detail the numerical techniques needed to embody those solvers in code.
A Lagrangian meshfree method applied to linear and nonlinear elasticity
2017-01-01
The repeated replacement method (RRM) is a Lagrangian meshfree method which we have previously applied to the Euler equations for compressible fluid flow. In this paper we present new enhancements to RRM, and we apply the enhanced method to both linear and nonlinear elasticity. We compare the results of ten test problems to those of analytic solvers, to demonstrate that RRM can successfully simulate these elastic systems without many of the requirements of traditional numerical methods such as numerical derivatives, equation system solvers, or Riemann solvers. We also show the relationship between error and computational effort for RRM on these systems, and compare RRM to other methods to highlight its strengths and weaknesses. And to further explain the two elastic equations used in the paper, we demonstrate the mathematical procedure used to create Riemann and Sedov-Taylor solvers for them, and detail the numerical techniques needed to embody those solvers in code. PMID:29045443
Efficient Parallel Formulations of Hierarchical Methods and Their Applications
NASA Astrophysics Data System (ADS)
Grama, Ananth Y.
1996-01-01
Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance is expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering the fact that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning. We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
Modern Workflow Full Waveform Inversion Applied to North America and the Northern Atlantic
NASA Astrophysics Data System (ADS)
Krischer, Lion; Fichtner, Andreas; Igel, Heiner
2015-04-01
We present the current state of a new seismic tomography model obtained using full waveform inversion of the crustal and upper mantle structure beneath North America and the Northern Atlantic, including the westernmost part of Europe. Parts of the eastern portion of the initial model consists of previous models by Fichtner et al. (2013) and Rickers et al. (2013). The final results of this study will contribute to the 'Comprehensive Earth Model' being developed by the Computational Seismology group at ETH Zurich. Significant challenges include the size of the domain, the uneven event and station coverage, and the strong east-west alignment of seismic ray paths across the North Atlantic. We use as much data as feasible, resulting in several thousand recordings per event depending on the receivers deployed at the earthquakes' origin times. To manage such projects in a reproducible and collaborative manner, we, as tomographers, should abandon ad-hoc scripts and one-time programs, and adopt sustainable and reusable solutions. Therefore we developed the LArge-scale Seismic Inversion Framework (LASIF - http://lasif.net), an open-source toolbox for managing seismic data in the context of non-linear iterative inversions that greatly reduces the time to research. Information on the applied processing, modelling, iterative model updating, what happened during each iteration, and so on are systematically archived. This results in a provenance record of the final model which in the end significantly enhances the reproducibility of iterative inversions. Additionally, tools for automated data download across different data centers, window selection, misfit measurements, parallel data processing, and input file generation for various forward solvers are provided.
Mang, Andreas; Biros, George
2017-01-01
We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20 × speedup for a two dimensional, real world multi-subject medical image registration problem.
Status and Plans for the TRANSP Interpretive and Predictive Simulation Code
NASA Astrophysics Data System (ADS)
Kaye, Stanley; Andre, Robert; Marina, Gorelenkova; Yuan, Xingqui; Hawryluk, Richard; Jardin, Steven; Poli, Francesca
2015-11-01
TRANSP is an integrated interpretive and predictive transport analysis tool that incorporates state of the art heating/current drive sources and transport models. The treatments and transport solvers are becoming increasingly sophisticated and comprehensive. For instance, the ISOLVER component provides a free boundary equilibrium solution, while the PT_SOLVER transport solver is especially suited for stiff transport models such as TGLF. TRANSP also incorporates such source models as NUBEAM for neutral beam injection, GENRAY, TORAY, TORBEAM, TORIC and CQL3D for ICRH, LHCD, ECH and HHFW. The implementation of selected components makes efficient use of MPI for speed up of code calculations. TRANSP has a wide international user-base, and it is run on the FusionGrid to allow for timely support and quick turnaround by the PPPL Computational Plasma Physics Group. It is being used as a basis for both analysis and development of control algorithms and discharge operational scenarios, including simulation of ITER plasmas. This poster will describe present uses of the code worldwide, as well as plans for upgrading the physics modules and code framework. Progress on implementing TRANSP as a component in the ITER IMAS will also be described. This research was supported by the U.S. Department of Energy under contracts DE-AC02-09CH11466.
A Comparative Study of Randomized Constraint Solvers for Random-Symbolic Testing
NASA Technical Reports Server (NTRS)
Takaki, Mitsuo; Cavalcanti, Diego; Gheyi, Rohit; Iyoda, Juliano; dAmorim, Marcelo; Prudencio, Ricardo
2009-01-01
The complexity of constraints is a major obstacle for constraint-based software verification. Automatic constraint solvers are fundamentally incomplete: input constraints often build on some undecidable theory or some theory the solver does not support. This paper proposes and evaluates several randomized solvers to address this issue. We compare the effectiveness of a symbolic solver (CVC3), a random solver, three hybrid solvers (i.e., mix of random and symbolic), and two heuristic search solvers. We evaluate the solvers on two benchmarks: one consisting of manually generated constraints and another generated with a concolic execution of 8 subjects. In addition to fully decidable constraints, the benchmarks include constraints with non-linear integer arithmetic, integer modulo and division, bitwise arithmetic, and floating-point arithmetic. As expected symbolic solving (in particular, CVC3) subsumes the other solvers for the concolic execution of subjects that only generate decidable constraints. For the remaining subjects the solvers are complementary.
Impurities in a non-axisymmetric plasma. Transport and effect on bootstrap current
Mollén, A.; Landreman, M.; Smith, H. M.; ...
2015-11-20
Impurities cause radiation losses and plasma dilution, and in stellarator plasmas the neoclassical ambipolar radial electric field is often unfavorable for avoiding strong impurity peaking. In this work we use a new continuum drift-kinetic solver, the SFINCS code (the Stellarator Fokker-Planck Iterative Neoclassical Conservative Solver) [M. Landreman et al., Phys. Plasmas 21 (2014) 042503] which employs the full linearized Fokker-Planck-Landau operator, to calculate neoclassical impurity transport coefficients for a Wendelstein 7-X (W7-X) magnetic configuration. We compare SFINCS calculations with theoretical asymptotes in the high collisionality limit. We observe and explain a 1/nu-scaling of the inter-species radial transport coefficient at lowmore » collisionality, arising due to the field term in the inter-species collision operator, and which is not found with simplified collision models even when momentum correction is applied. However, this type of scaling disappears if a radial electric field is present. We use SFINCS to analyze how the impurity content affects the neoclassical impurity dynamics and the bootstrap current. We show that a change in plasma effective charge Z eff of order unity can affect the bootstrap current enough to cause a deviation in the divertor strike point locations.« less
A scalable, fully implicit algorithm for the reduced two-field low-β extended MHD model
Chacon, Luis; Stanier, Adam John
2016-12-01
Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
NASA Astrophysics Data System (ADS)
D'Ambra, Pasqua; Tartaglione, Gaetano
2015-04-01
Image segmentation addresses the problem to partition a given image into its constituent objects and then to identify the boundaries of the objects. This problem can be formulated in terms of a variational model aimed to find optimal approximations of a bounded function by piecewise-smooth functions, minimizing a given functional. The corresponding Euler-Lagrange equations are a set of two coupled elliptic partial differential equations with varying coefficients. Numerical solution of the above system often relies on alternating minimization techniques involving descent methods coupled with explicit or semi-implicit finite-difference discretization schemes, which are slowly convergent and poorly scalable with respect to image size. In this work we focus on generalized relaxation methods also coupled with multigrid linear solvers, when a finite-difference discretization is applied to the Euler-Lagrange equations of Ambrosio-Tortorelli model. We show that non-linear Gauss-Seidel, accelerated by inner linear iterations, is an effective method for large-scale image analysis as those arising from high-throughput screening platforms for stem cells targeted differentiation, where one of the main goal is segmentation of thousand of images to analyze cell colonies morphology.
Solution of Ambrosio-Tortorelli model for image segmentation by generalized relaxation method
NASA Astrophysics Data System (ADS)
D'Ambra, Pasqua; Tartaglione, Gaetano
2015-03-01
Image segmentation addresses the problem to partition a given image into its constituent objects and then to identify the boundaries of the objects. This problem can be formulated in terms of a variational model aimed to find optimal approximations of a bounded function by piecewise-smooth functions, minimizing a given functional. The corresponding Euler-Lagrange equations are a set of two coupled elliptic partial differential equations with varying coefficients. Numerical solution of the above system often relies on alternating minimization techniques involving descent methods coupled with explicit or semi-implicit finite-difference discretization schemes, which are slowly convergent and poorly scalable with respect to image size. In this work we focus on generalized relaxation methods also coupled with multigrid linear solvers, when a finite-difference discretization is applied to the Euler-Lagrange equations of Ambrosio-Tortorelli model. We show that non-linear Gauss-Seidel, accelerated by inner linear iterations, is an effective method for large-scale image analysis as those arising from high-throughput screening platforms for stem cells targeted differentiation, where one of the main goal is segmentation of thousand of images to analyze cell colonies morphology.
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is an actual key tool in the area of geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extend of the investigated domain might strongly vary in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and permit therefore very high-resolution simulations. We propose an efficient approach to solve memory bounded real-world applications on modern supercomputers architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problematics: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed rocks porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks and the coupled poromechanical solver permits to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
NASA Astrophysics Data System (ADS)
Joshi, Vaibhav; Jaiman, Rajeev K.
2018-05-01
We present a positivity preserving variational scheme for the phase-field modeling of incompressible two-phase flows with high density ratio. The variational finite element technique relies on the Allen-Cahn phase-field equation for capturing the phase interface on a fixed Eulerian mesh with mass conservative and energy-stable discretization. The mass conservation is achieved by enforcing a Lagrange multiplier which has both temporal and spatial dependence on the underlying solution of the phase-field equation. To make the scheme energy-stable in a variational sense, we discretize the spatial part of the Lagrange multiplier in the phase-field equation by the mid-point approximation. The proposed variational technique is designed to reduce the spurious and unphysical oscillations in the solution while maintaining the second-order accuracy of both spatial and temporal discretizations. We integrate the Allen-Cahn phase-field equation with the incompressible Navier-Stokes equations for modeling a broad range of two-phase flow and fluid-fluid interface problems. The coupling of the implicit discretizations corresponding to the phase-field and the incompressible flow equations is achieved via nonlinear partitioned iterative procedure. Comparison of results between the standard linear stabilized finite element method and the present variational formulation shows a remarkable reduction of oscillations in the solution while retaining the boundedness of the phase-indicator field. We perform a standalone test to verify the accuracy and stability of the Allen-Cahn two-phase solver. We examine the convergence and accuracy properties of the coupled phase-field solver through the standard benchmarks of the Laplace-Young law and a sloshing tank problem. Two- and three-dimensional dam break problems are simulated to assess the capability of the phase-field solver for complex air-water interfaces involving topological changes on unstructured meshes. Finally, we demonstrate the phase-field solver for a practical offshore engineering application of wave-structure interaction.
Wilson, John D.; Naff, Richard L.
2004-01-01
A geometric multigrid solver (GMG), based in the preconditioned conjugate gradient algorithm, has been developed for solving systems of equations resulting from applying the cell-centered finite difference algorithm to flow in porous media. This solver has been adapted to the U.S. Geological Survey ground-water flow model MODFLOW-2000. The documentation herein is a description of the solver and the adaptation to MODFLOW-2000.
NASA Technical Reports Server (NTRS)
Reddy, T. S. R.; Srivastava, R.; Mehmed, Oral
2002-01-01
An aeroelastic analysis system for flutter and forced response analysis of turbomachines based on a two-dimensional linearized unsteady Euler solver has been developed. The ASTROP2 code, an aeroelastic stability analysis program for turbomachinery, was used as a basis for this development. The ASTROP2 code uses strip theory to couple a two dimensional aerodynamic model with a three dimensional structural model. The code was modified to include forced response capability. The formulation was also modified to include aeroelastic analysis with mistuning. A linearized unsteady Euler solver, LINFLX2D is added to model the unsteady aerodynamics in ASTROP2. By calculating the unsteady aerodynamic loads using LINFLX2D, it is possible to include the effects of transonic flow on flutter and forced response in the analysis. The stability is inferred from an eigenvalue analysis. The revised code, ASTROP2-LE for ASTROP2 code using Linearized Euler aerodynamics, is validated by comparing the predictions with those obtained using linear unsteady aerodynamic solutions.
Hierarchically partitioned nonlinear equation solvers
NASA Technical Reports Server (NTRS)
Padovan, Joseph
1987-01-01
By partitioning solution space into a number of subspaces, a new multiply constrained partitioned Newton-Raphson nonlinear equation solver is developed. Specifically, for a given iteration, each of the various separate partitions are individually and simultaneously controlled. Due to the generality of the scheme, a hierarchy of partition levels can be employed. For finite-element-type applications, this includes the possibility of degree-of-freedom, nodal, elemental, geometric substructural, material and kinematically nonlinear group controls. It is noted that such partitioning can be continuously updated, depending on solution conditioning. In this context, convergence is ascertained at the individual partition level.
Dasgupta, Purnendu K
2008-12-05
Resolution of overlapped chromatographic peaks is generally accomplished by modeling the peaks as Gaussian or modified Gaussian functions. It is possible, even preferable, to use actual single analyte input responses for this purpose and a nonlinear least squares minimization routine such as that provided by Microsoft Excel Solver can then provide the resolution. In practice, the quality of the results obtained varies greatly due to small shifts in retention time. I show here that such deconvolution can be considerably improved if one or more of the response arrays are iteratively shifted in time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.
The recently introduced divergence-conforming B-spline discretizations allow the construction of smooth discrete velocity–pressure pairs for viscous incompressible flows that are at the same time inf–sup stable and pointwise divergence-free. When applied to the discretized Stokes problem, these spaces generate a symmetric and indefinite saddle-point linear system. The iterative method of choice to solve such system is the Generalized Minimum Residual Method. This method lacks robustness, and one remedy is to use preconditioners. For linear systems of saddle-point type, a large family of preconditioners can be obtained by using a block factorization of the system. In this paper, we show howmore » the nesting of “black-box” solvers and preconditioners can be put together in a block triangular strategy to build a scalable block preconditioner for the Stokes system discretized by divergence-conforming B-splines. Lastly, besides the known cavity flow problem, we used for benchmark flows defined on complex geometries: an eccentric annulus and hollow torus of an eccentric annular cross-section.« less
Computational investigation of large-scale vortex interaction with flexible bodies
NASA Astrophysics Data System (ADS)
Connell, Benjamin; Yue, Dick K. P.
2003-11-01
The interaction of large-scale vortices with flexible bodies is examined with particular interest paid to the energy and momentum budgets of the system. Finite difference direct numerical simulation of the Navier-Stokes equations on a moving curvilinear grid is coupled with a finite difference structural solver of both a linear membrane under tension and linear Euler-Bernoulli beam. The hydrodynamics and structural dynamics are solved simultaneously using an iterative procedure with the external structural forcing calculated from the hydrodynamics at the surface and the flow-field velocity boundary condition given by the structural motion. We focus on an investigation into the canonical problem of a vortex-dipole impinging on a flexible membrane. It is discovered that the structural properties of the membrane direct the interaction in terms of the flow evolution and the energy budget. Pressure gradients associated with resonant membrane response are shown to sustain the oscillatory motion of the vortex pair. Understanding how the key mechanisms in vortex-body interactions are guided by the structural properties of the body is a prerequisite to exploiting these mechanisms.
Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.; ...
2016-10-19
The recently introduced divergence-conforming B-spline discretizations allow the construction of smooth discrete velocity–pressure pairs for viscous incompressible flows that are at the same time inf–sup stable and pointwise divergence-free. When applied to the discretized Stokes problem, these spaces generate a symmetric and indefinite saddle-point linear system. The iterative method of choice to solve such system is the Generalized Minimum Residual Method. This method lacks robustness, and one remedy is to use preconditioners. For linear systems of saddle-point type, a large family of preconditioners can be obtained by using a block factorization of the system. In this paper, we show howmore » the nesting of “black-box” solvers and preconditioners can be put together in a block triangular strategy to build a scalable block preconditioner for the Stokes system discretized by divergence-conforming B-splines. Lastly, besides the known cavity flow problem, we used for benchmark flows defined on complex geometries: an eccentric annulus and hollow torus of an eccentric annular cross-section.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heroux, Michael Allen
2004-07-01
The Trilinos{trademark} Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. AztecOO{trademark} is a package within Trilinos that enables the use of the Aztec solver library [19] with Epetra{trademark} [13] objects. AztecOO provides access to Aztec preconditioners and solvers by implementing the Aztec 'matrix-free' interface using Epetra. While Aztec is written in C and procedure-oriented, AztecOO is written in C++ and is object-oriented. In addition to providing access to Aztec capabilities, AztecOO also provides some signficant new functionality. In particular it provides an extensible status testing capability that allows expression of sophisticatedmore » stopping criteria as is needed in production use of iterative solvers. AztecOO also provides mechanisms for using Ifpack [2], ML [20] and AztecOO itself as preconditioners.« less
Computerized Business Calculus Using Calculators, Examples from Mathematics to Finance.
ERIC Educational Resources Information Center
Vest, Floyd
1991-01-01
After discussing the role of supercalculators within the business calculus curriculum, several examples are presented which allow the reader to examine the capabilities and codes of calculators specific to different major manufacturers. The topics examined include annuities, Newton's method, fixed point iteration, graphing, solvers, and…
Divergence-Free SPH for Incompressible and Viscous Fluids.
Bender, Jan; Koschier, Dan
2017-03-01
In this paper we present a novel Smoothed Particle Hydrodynamics (SPH) method for the efficient and stable simulation of incompressible fluids. The most efficient SPH-based approaches enforce incompressibility either on position or velocity level. However, the continuity equation for incompressible flow demands to maintain a constant density and a divergence-free velocity field. We propose a combination of two novel implicit pressure solvers enforcing both a low volume compression as well as a divergence-free velocity field. While a compression-free fluid is essential for realistic physical behavior, a divergence-free velocity field drastically reduces the number of required solver iterations and increases the stability of the simulation significantly. Thanks to the improved stability, our method can handle larger time steps than previous approaches. This results in a substantial performance gain since the computationally expensive neighborhood search has to be performed less frequently. Moreover, we introduce a third optional implicit solver to simulate highly viscous fluids which seamlessly integrates into our solver framework. Our implicit viscosity solver produces realistic results while introducing almost no numerical damping. We demonstrate the efficiency, robustness and scalability of our method in a variety of complex simulations including scenarios with millions of turbulent particles or highly viscous materials.
Computed inverse resonance imaging for magnetic susceptibility map reconstruction.
Chen, Zikuan; Calhoun, Vince
2012-01-01
This article reports a computed inverse magnetic resonance imaging (CIMRI) model for reconstructing the magnetic susceptibility source from MRI data using a 2-step computational approach. The forward T2*-weighted MRI (T2*MRI) process is broken down into 2 steps: (1) from magnetic susceptibility source to field map establishment via magnetization in the main field and (2) from field map to MR image formation by intravoxel dephasing average. The proposed CIMRI model includes 2 inverse steps to reverse the T2*MRI procedure: field map calculation from MR-phase image and susceptibility source calculation from the field map. The inverse step from field map to susceptibility map is a 3-dimensional ill-posed deconvolution problem, which can be solved with 3 kinds of approaches: the Tikhonov-regularized matrix inverse, inverse filtering with a truncated filter, and total variation (TV) iteration. By numerical simulation, we validate the CIMRI model by comparing the reconstructed susceptibility maps for a predefined susceptibility source. Numerical simulations of CIMRI show that the split Bregman TV iteration solver can reconstruct the susceptibility map from an MR-phase image with high fidelity (spatial correlation ≈ 0.99). The split Bregman TV iteration solver includes noise reduction, edge preservation, and image energy conservation. For applications to brain susceptibility reconstruction, it is important to calibrate the TV iteration program by selecting suitable values of the regularization parameter. The proposed CIMRI model can reconstruct the magnetic susceptibility source of T2*MRI by 2 computational steps: calculating the field map from the phase image and reconstructing the susceptibility map from the field map. The crux of CIMRI lies in an ill-posed 3-dimensional deconvolution problem, which can be effectively solved by the split Bregman TV iteration algorithm.
Computed inverse MRI for magnetic susceptibility map reconstruction
Chen, Zikuan; Calhoun, Vince
2015-01-01
Objective This paper reports on a computed inverse magnetic resonance imaging (CIMRI) model for reconstructing the magnetic susceptibility source from MRI data using a two-step computational approach. Methods The forward T2*-weighted MRI (T2*MRI) process is decomposed into two steps: 1) from magnetic susceptibility source to fieldmap establishment via magnetization in a main field, and 2) from fieldmap to MR image formation by intravoxel dephasing average. The proposed CIMRI model includes two inverse steps to reverse the T2*MRI procedure: fieldmap calculation from MR phase image and susceptibility source calculation from the fieldmap. The inverse step from fieldmap to susceptibility map is a 3D ill-posed deconvolution problem, which can be solved by three kinds of approaches: Tikhonov-regularized matrix inverse, inverse filtering with a truncated filter, and total variation (TV) iteration. By numerical simulation, we validate the CIMRI model by comparing the reconstructed susceptibility maps for a predefined susceptibility source. Results Numerical simulations of CIMRI show that the split Bregman TV iteration solver can reconstruct the susceptibility map from a MR phase image with high fidelity (spatial correlation≈0.99). The split Bregman TV iteration solver includes noise reduction, edge preservation, and image energy conservation. For applications to brain susceptibility reconstruction, it is important to calibrate the TV iteration program by selecting suitable values of the regularization parameter. Conclusions The proposed CIMRI model can reconstruct the magnetic susceptibility source of T2*MRI by two computational steps: calculating the fieldmap from the phase image and reconstructing the susceptibility map from the fieldmap. The crux of CIMRI lies in an ill-posed 3D deconvolution problem, which can be effectively solved by the split Bregman TV iteration algorithm. PMID:22446372
An immersed boundary method for fluid-structure interaction with compressible multiphase flows
NASA Astrophysics Data System (ADS)
Wang, Li; Currao, Gaetano M. D.; Han, Feng; Neely, Andrew J.; Young, John; Tian, Fang-Bao
2017-10-01
This paper presents a two-dimensional immersed boundary method for fluid-structure interaction with compressible multiphase flows involving large structure deformations. This method involves three important parts: flow solver, structure solver and fluid-structure interaction coupling. In the flow solver, the compressible multiphase Navier-Stokes equations for ideal gases are solved by a finite difference method based on a staggered Cartesian mesh, where a fifth-order accuracy Weighted Essentially Non-Oscillation (WENO) scheme is used to handle spatial discretization of the convective term, a fourth-order central difference scheme is employed to discretize the viscous term, the third-order TVD Runge-Kutta scheme is used to discretize the temporal term, and the level-set method is adopted to capture the multi-material interface. In this work, the structure considered is a geometrically non-linear beam which is solved by using a finite element method based on the absolute nodal coordinate formulation (ANCF). The fluid dynamics and the structure motion are coupled in a partitioned iterative manner with a feedback penalty immersed boundary method where the flow dynamics is defined on a fixed Lagrangian grid and the structure dynamics is described on a global coordinate. We perform several validation cases (including fluid over a cylinder, structure dynamics, flow induced vibration of a flexible plate, deformation of a flexible panel induced by shock waves in a shock tube, an inclined flexible plate in a hypersonic flow, and shock-induced collapse of a cylindrical helium cavity in the air), and compare the results with experimental and other numerical data. The present results agree well with the published data and the current experiment. Finally, we further demonstrate the versatility of the present method by applying it to a flexible plate interacting with multiphase flows.
Detwiler, R.L.; Mehl, S.; Rajaram, H.; Cheung, W.W.
2002-01-01
Numerical solution of large-scale ground water flow and transport problems is often constrained by the convergence behavior of the iterative solvers used to solve the resulting systems of equations. We demonstrate the ability of an algebraic multigrid algorithm (AMG) to efficiently solve the large, sparse systems of equations that result from computational models of ground water flow and transport in large and complex domains. Unlike geometric multigrid methods, this algorithm is applicable to problems in complex flow geometries, such as those encountered in pore-scale modeling of two-phase flow and transport. We integrated AMG into MODFLOW 2000 to compare two- and three-dimensional flow simulations using AMG to simulations using PCG2, a preconditioned conjugate gradient solver that uses the modified incomplete Cholesky preconditioner and is included with MODFLOW 2000. CPU times required for convergence with AMG were up to 140 times faster than those for PCG2. The cost of this increased speed was up to a nine-fold increase in required random access memory (RAM) for the three-dimensional problems and up to a four-fold increase in required RAM for the two-dimensional problems. We also compared two-dimensional numerical simulations of steady-state transport using AMG and the generalized minimum residual method with an incomplete LU-decomposition preconditioner. For these transport simulations, AMG yielded increased speeds of up to 17 times with only a 20% increase in required RAM. The ability of AMG to solve flow and transport problems in large, complex flow systems and its ready availability make it an ideal solver for use in both field-scale and pore-scale modeling.
Hill, Mary C.
1990-01-01
This report documents PCG2 : a numerical code to be used with the U.S. Geological Survey modular three-dimensional, finite-difference, ground-water flow model . PCG2 uses the preconditioned conjugate-gradient method to solve the equations produced by the model for hydraulic head. Linear or nonlinear flow conditions may be simulated. PCG2 includes two reconditioning options : modified incomplete Cholesky preconditioning, which is efficient on scalar computers; and polynomial preconditioning, which requires less computer storage and, with modifications that depend on the computer used, is most efficient on vector computers . Convergence of the solver is determined using both head-change and residual criteria. Nonlinear problems are solved using Picard iterations. This documentation provides a description of the preconditioned conjugate gradient method and the two preconditioners, detailed instructions for linking PCG2 to the modular model, sample data inputs, a brief description of PCG2, and a FORTRAN listing.
Online learning in optical tomography: a stochastic approach
NASA Astrophysics Data System (ADS)
Chen, Ke; Li, Qin; Liu, Jian-Guo
2018-07-01
We study the inverse problem of radiative transfer equation (RTE) using stochastic gradient descent method (SGD) in this paper. Mathematically, optical tomography amounts to recovering the optical parameters in RTE using the incoming–outgoing pair of light intensity. We formulate it as a PDE-constraint optimization problem, where the mismatch of computed and measured outgoing data is minimized with same initial data and RTE constraint. The memory and computation cost it requires, however, is typically prohibitive, especially in high dimensional space. Smart iterative solvers that only use partial information in each step is called for thereafter. Stochastic gradient descent method is an online learning algorithm that randomly selects data for minimizing the mismatch. It requires minimum memory and computation, and advances fast, therefore perfectly serves the purpose. In this paper we formulate the problem, in both nonlinear and its linearized setting, apply SGD algorithm and analyze the convergence performance.
A fast, preconditioned conjugate gradient Toeplitz solver
NASA Technical Reports Server (NTRS)
Pan, Victor; Schrieber, Robert
1989-01-01
A simple factorization is given of an arbitrary hermitian, positive definite matrix in which the factors are well-conditioned, hermitian, and positive definite. In fact, given knowledge of the extreme eigenvalues of the original matrix A, an optimal improvement can be achieved, making the condition numbers of each of the two factors equal to the square root of the condition number of A. This technique is to applied to the solution of hermitian, positive definite Toeplitz systems. Large linear systems with hermitian, positive definite Toeplitz matrices arise in some signal processing applications. A stable fast algorithm is given for solving these systems that is based on the preconditioned conjugate gradient method. The algorithm exploits Toeplitz structure to reduce the cost of an iteration to O(n log n) by applying the fast Fourier Transform to compute matrix-vector products. Matrix factorization is used as a preconditioner.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Renke; Jin, Shuangshuang; Chen, Yousu
This paper presents a faster-than-real-time dynamic simulation software package that is designed for large-size power system dynamic simulation. It was developed on the GridPACKTM high-performance computing (HPC) framework. The key features of the developed software package include (1) faster-than-real-time dynamic simulation for a WECC system (17,000 buses) with different types of detailed generator, controller, and relay dynamic models, (2) a decoupled parallel dynamic simulation algorithm with optimized computation architecture to better leverage HPC resources and technologies, (3) options for HPC-based linear and iterative solvers, (4) hidden HPC details, such as data communication and distribution, to enable development centered on mathematicalmore » models and algorithms rather than on computational details for power system researchers, and (5) easy integration of new dynamic models and related algorithms into the software package.« less
Final report for “Extreme-scale Algorithms and Solver Resilience”
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2017-06-30
This is a joint project with principal investigators at Oak Ridge National Laboratory, Sandia National Laboratories, the University of California at Berkeley, and the University of Tennessee. Our part of the project involves developing performance models for highly scalable algorithms and the development of latency tolerant iterative methods. During this project, we extended our performance models for the Multigrid method for solving large systems of linear equations and conducted experiments with highly scalable variants of conjugate gradient methods that avoid blocking synchronization. In addition, we worked with the other members of the project on alternative techniques for resilience and reproducibility.more » We also presented an alternative approach for reproducible dot-products in parallel computations that performs almost as well as the conventional approach by separating the order of computation from the details of the decomposition of vectors across the processes.« less
Bowen-York trumpet data and black-hole simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hannam, Mark; Murchadha, Niall O; Husa, Sascha
2009-12-15
The most popular method to construct initial data for black-hole-binary simulations is the puncture method, in which compactified wormholes are given linear and angular momentum via the Bowen-York extrinsic curvature. When these data are evolved, they quickly approach a trumpet topology, suggesting that it would be preferable to use data that are in trumpet form from the outset. To achieve this, we extend the puncture method to allow the construction of Bowen-York trumpets, including an outline of an existence and uniqueness proof of the solutions. We construct boosted, spinning and binary Bowen-York puncture trumpets using a single-domain pseudospectral elliptic solver,more » and evolve the binary data and compare with standard wormhole-data results. We also show that for boosted trumpets the black-hole mass can be prescribed a priori, without recourse to the iterative procedure that is necessary for wormhole data.« less
Multigrid approaches to non-linear diffusion problems on unstructured meshes
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
The efficiency of three multigrid methods for solving highly non-linear diffusion problems on two-dimensional unstructured meshes is examined. The three multigrid methods differ mainly in the manner in which the nonlinearities of the governing equations are handled. These comprise a non-linear full approximation storage (FAS) multigrid method which is used to solve the non-linear equations directly, a linear multigrid method which is used to solve the linear system arising from a Newton linearization of the non-linear system, and a hybrid scheme which is based on a non-linear FAS multigrid scheme, but employs a linear solver on each level as a smoother. Results indicate that all methods are equally effective at converging the non-linear residual in a given number of grid sweeps, but that the linear solver is more efficient in cpu time due to the lower cost of linear versus non-linear grid sweeps.
TH-AB-BRA-09: Stability Analysis of a Novel Dose Calculation Algorithm for MRI Guided Radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zelyak, O; Fallone, B; Cross Cancer Institute, Edmonton, AB
2016-06-15
Purpose: To determine the iterative deterministic solution stability of the Linear Boltzmann Transport Equation (LBTE) in the presence of magnetic fields. Methods: The LBTE with magnetic fields under investigation is derived using a discrete ordinates approach. The stability analysis is performed using analytical and numerical methods. Analytically, the spectral Fourier analysis is used to obtain the convergence rate of the source iteration procedures based on finding the largest eigenvalue of the iterative operator. This eigenvalue is a function of relevant physical parameters, such as magnetic field strength and material properties, and provides essential information about the domain of applicability requiredmore » for clinically optimal parameter selection and maximum speed of convergence. The analytical results are reinforced by numerical simulations performed using the same discrete ordinates method in angle, and a discontinuous finite element spatial approach. Results: The spectral radius for the source iteration technique of the time independent transport equation with isotropic and anisotropic scattering centers inside infinite 3D medium is equal to the ratio of differential and total cross sections. The result is confirmed numerically by solving LBTE and is in full agreement with previously published results. The addition of magnetic field reveals that the convergence becomes dependent on the strength of magnetic field, the energy group discretization, and the order of anisotropic expansion. Conclusion: The source iteration technique for solving the LBTE with magnetic fields with the discrete ordinates method leads to divergent solutions in the limiting cases of small energy discretizations and high magnetic field strengths. Future investigations into non-stationary Krylov subspace techniques as an iterative solver will be performed as this has been shown to produce greater stability than source iteration. Furthermore, a stability analysis of a discontinuous finite element space-angle approach (which has been shown to provide the greatest stability) will also be investigated. Dr. B Gino Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization)« less
Highly Productive Application Development with ViennaCL for Accelerators
NASA Astrophysics Data System (ADS)
Rupp, K.; Weinbub, J.; Rudolf, F.
2012-12-01
The use of graphics processing units (GPUs) for the acceleration of general purpose computations has become very attractive over the last years, and accelerators based on many integrated CPU cores are about to hit the market. However, there are discussions about the benefit of GPU computing when comparing the reduction of execution times with the increased development effort [1]. To counter these concerns, our open-source linear algebra library ViennaCL [2,3] uses modern programming techniques such as generic programming in order to provide a convenient access layer for accelerator and GPU computing. Other GPU-accelerated libraries are primarily tuned for performance, but less tailored to productivity and portability: MAGMA [4] provides dense linear algebra operations via a LAPACK-comparable interface, but no dedicated matrix and vector types. Cusp [5] is closest in functionality to ViennaCL for sparse matrices, but is based on CUDA and thus restricted to devices from NVIDIA. However, no convenience layer for dense linear algebra is provided with Cusp. ViennaCL is written in C++ and uses OpenCL to access the resources of accelerators, GPUs and multi-core CPUs in a unified way. On the one hand, the library provides iterative solvers from the family of Krylov methods, including various preconditioners, for the solution of linear systems typically obtained from the discretization of partial differential equations. On the other hand, dense linear algebra operations are supported, including algorithms such as QR factorization and singular value decomposition. The user application interface of ViennaCL is compatible to uBLAS [6], which is part of the peer-reviewed Boost C++ libraries [7]. This allows to port existing applications based on uBLAS with a minimum of effort to ViennaCL. Conversely, the interface compatibility allows to use the iterative solvers from ViennaCL with uBLAS types directly, thus enabling code reuse beyond CPU-GPU boundaries. Out-of-the-box support for types from the Eigen library [8] and MTL 4 [9] are provided as well, enabling a seamless transition from single-core CPU to GPU and multi-core CPU computations. Case studies from the numerical solution of PDEs are given and isolated performance benchmarks are discussed. Also, pitfalls in scientific computing with GPUs and accelerators are addressed, allowing for a first evaluation of whether these novel devices can be mapped well to certain applications. References: [1] R. Bordawekar et al., Technical Report, IBM, 2010 [2] ViennaCL library. Online: http://viennacl.sourceforge.net/ [3] K. Rupp et al., GPUScA, 2010 [4] MAGMA library. Online: http://icl.cs.utk.edu/magma/ [5] Cusp library. Online: http://code.google.com/p/cusp-library/ [6] uBLAS library. Online: http://www.boost.org/libs/numeric/ublas/ [7] Boost C++ Libraries. Online: http://www.boost.org/ [8] Eigen library. Online: http://eigen.tuxfamily.org/ [9] MTL 4 Library. Online: http://www.mtl4.org/
Linear solver performance in elastoplastic problem solution on GPU cluster
NASA Astrophysics Data System (ADS)
Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.
2017-12-01
Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large scale systems need to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens and hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The method convergence highly depends on the operator spectrum of a problem stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
NASA Technical Reports Server (NTRS)
Chang, S. C.
1984-01-01
Generally, fast direct solvers are not directly applicable to a nonseparable elliptic partial differential equation. This limitation, however, is circumvented by a semi-direct procedure, i.e., an iterative procedure using fast direct solvers. An efficient semi-direct procedure which is easy to implement and applicable to a variety of boundary conditions is presented. The current procedure also possesses other highly desirable properties, i.e.: (1) the convergence rate does not decrease with an increase of grid cell aspect ratio, and (2) the convergence rate is estimated using the coefficients of the partial differential equation being solved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh
We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to performmore » $N-x$ contingency analysis, i.e., determine the state of the system when up to $x$ links from $N$ fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing $N-x$ contingency analysis on a $778K$ bus grid, updating a solution with $x=20$ elements in $$1.6 \\times 10^{-2}$$ seconds on an Intel Xeon processor.« less
Higher-order ice-sheet modelling accelerated by multigrid on graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian; Egholm, David
2013-04-01
Higher-order ice flow modelling is a very computer intensive process owing primarily to the nonlinear influence of the horizontal stress coupling. When applied for simulating long-term glacial landscape evolution, the ice-sheet models must consider very long time series, while both high temporal and spatial resolution is needed to resolve small effects. The use of higher-order and full stokes models have therefore seen very limited usage in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this extent we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides the inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Xujun; Li, Jiyuan; Jiang, Xikai
An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less
Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...
2017-06-29
An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
NASA Astrophysics Data System (ADS)
Moghaderi, Hamid; Dehghan, Mehdi; Donatelli, Marco; Mazza, Mariarosa
2017-12-01
Fractional diffusion equations (FDEs) are a mathematical tool used for describing some special diffusion phenomena arising in many different applications like porous media and computational finance. In this paper, we focus on a two-dimensional space-FDE problem discretized by means of a second order finite difference scheme obtained as combination of the Crank-Nicolson scheme and the so-called weighted and shifted Grünwald formula. By fully exploiting the Toeplitz-like structure of the resulting linear system, we provide a detailed spectral analysis of the coefficient matrix at each time step, both in the case of constant and variable diffusion coefficients. Such a spectral analysis has a very crucial role, since it can be used for designing fast and robust iterative solvers. In particular, we employ the obtained spectral information to define a Galerkin multigrid method based on the classical linear interpolation as grid transfer operator and damped-Jacobi as smoother, and to prove the linear convergence rate of the corresponding two-grid method. The theoretical analysis suggests that the proposed grid transfer operator is strong enough for working also with the V-cycle method and the geometric multigrid. On this basis, we introduce two computationally favourable variants of the proposed multigrid method and we use them as preconditioners for Krylov methods. Several numerical results confirm that the resulting preconditioning strategies still keep a linear convergence rate.
LEOPARD: A grid-based dispersion relation solver for arbitrary gyrotropic distributions
NASA Astrophysics Data System (ADS)
Astfalk, Patrick; Jenko, Frank
2017-01-01
Particle velocity distributions measured in collisionless space plasmas often show strong deviations from idealized model distributions. Despite this observational evidence, linear wave analysis in space plasma environments such as the solar wind or Earth's magnetosphere is still mainly carried out using dispersion relation solvers based on Maxwellians or other parametric models. To enable a more realistic analysis, we present the new grid-based kinetic dispersion relation solver LEOPARD (Linear Electromagnetic Oscillations in Plasmas with Arbitrary Rotationally-symmetric Distributions) which no longer requires prescribed model distributions but allows for arbitrary gyrotropic distribution functions. In this work, we discuss the underlying numerical scheme of the code and we show a few exemplary benchmarks. Furthermore, we demonstrate a first application of LEOPARD to ion distribution data obtained from hybrid simulations. In particular, we show that in the saturation stage of the parallel fire hose instability, the deformation of the initial bi-Maxwellian distribution invalidates the use of standard dispersion relation solvers. A linear solver based on bi-Maxwellians predicts further growth even after saturation, while LEOPARD correctly indicates vanishing growth rates. We also discuss how this complies with former studies on the validity of quasilinear theory for the resonant fire hose. In the end, we briefly comment on the role of LEOPARD in directly analyzing spacecraft data, and we refer to an upcoming paper which demonstrates a first application of that kind.
Mixed-Integer Conic Linear Programming: Challenges and Perspectives
2013-10-01
The novel DCCs for MISOCO may be used in branch- and-cut algorithms when solving MISOCO problems. The experimental software CICLO was developed to...perform limited, but rigorous computational experiments. The CICLO solver utilizes continuous SOCO solvers, MOSEK, CPLES or SeDuMi, builds on the open...submitted Fall 2013. Software: 1. CICLO : Integer conic linear optimization package. Authors: J.C. Góez, T.K. Ralphs, Y. Fu, and T. Terlaky
On differential operators generating iterative systems of linear ODEs of maximal symmetry algebra
NASA Astrophysics Data System (ADS)
Ndogmo, J. C.
2017-06-01
Although every iterative scalar linear ordinary differential equation is of maximal symmetry algebra, the situation is different and far more complex for systems of linear ordinary differential equations, and an iterative system of linear equations need not be of maximal symmetry algebra. We illustrate these facts by examples and derive families of vector differential operators whose iterations are all linear systems of equations of maximal symmetry algebra. Some consequences of these results are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.
Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.
NASA Astrophysics Data System (ADS)
Liu, Y.; Li, Y.
2016-12-01
We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cai, Xiao-Chuan; Keyes, David; Yang, Chao
2014-09-29
The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equations. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementationmore » since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schunert, Sebastian; Wang, Congjian; Wang, Yaqi
Rattlesnake and MAMMOTH are the designated TREAT analysis tools currently being developed at the Idaho National Laboratory. Concurrent with development of the multi-physics, multi-scale capabilities, sensitivity analysis and uncertainty quantification (SA/UQ) capabilities are required for predicitive modeling of the TREAT reactor. For steady-state SA/UQ, that is essential for setting initial conditions for the transients, generalized perturbation theory (GPT) will be used. This work describes the implementation of a PETSc based solver for the generalized adjoint equations that constitute a inhomogeneous, rank deficient problem. The standard approach is to use an outer iteration strategy with repeated removal of the fundamental modemore » contamination. The described GPT algorithm directly solves the GPT equations without the need of an outer iteration procedure by using Krylov subspaces that are orthogonal to the operator’s nullspace. Three test problems are solved and provide sufficient verification for the Rattlesnake’s GPT capability. We conclude with a preliminary example evaluating the impact of the Boron distribution in the TREAT reactor using perturbation theory.« less
NASA Astrophysics Data System (ADS)
Ansari, S. M.; Farquharson, C. G.; MacLachlan, S. P.
2017-07-01
In this paper, a new finite-element solution to the potential formulation of the geophysical electromagnetic (EM) problem that explicitly implements the Coulomb gauge, and that accurately computes the potentials and hence inductive and galvanic components, is proposed. The modelling scheme is based on using unstructured tetrahedral meshes for domain subdivision, which enables both realistic Earth models of complex geometries to be considered and efficient spatially variable refinement of the mesh to be done. For the finite-element discretization edge and nodal elements are used for approximating the vector and scalar potentials respectively. The issue of non-unique, incorrect potentials from the numerical solution of the usual incomplete-gauged potential system is demonstrated for a benchmark model from the literature that uses an electric-type EM source, through investigating the interface continuity conditions for both the normal and tangential components of the potential vectors, and by showing inconsistent results obtained from iterative and direct linear equation solvers. By explicitly introducing the Coulomb gauge condition as an extra equation, and by augmenting the Helmholtz equation with the gradient of a Lagrange multiplier, an explicitly gauged system for the potential formulation is formed. The solution to the discretized form of this system is validated for the above-mentioned example and for another classic example that uses a magnetic EM source. In order to stabilize the iterative solution of the gauged system, a block diagonal pre-conditioning scheme that is based upon the Schur complement of the potential system is used. For all examples, both the iterative and direct solvers produce the same responses for the potentials, demonstrating the uniqueness of the numerical solution for the potentials and fixing the problems with the interface conditions between cells observed for the incomplete-gauged system. These solutions of the gauged system also produce the physically anticipated behaviours for the inductive and galvanic components of the electric field. For a realistic geophysical scenario, the gauged scheme is also used to synthesize the magnetic field response of a model of the Ovoid ore deposit at Voisey's Bay, Labrador, Canada. The results are in good agreement with the helicopter-borne EM data from the real survey, and the inductive and galvanic parts of the current density show expected behaviours.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The newmore » parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.« less
NASA Astrophysics Data System (ADS)
Al-Chalabi, Rifat M. Khalil
1997-09-01
Development of an improvement to the computational efficiency of the existing nested iterative solution strategy of the Nodal Exapansion Method (NEM) nodal based neutron diffusion code NESTLE is presented. The improvement in the solution strategy is the result of developing a multilevel acceleration scheme that does not suffer from the numerical stalling associated with a number of iterative solution methods. The acceleration scheme is based on the multigrid method, which is specifically adapted for incorporation into the NEM nonlinear iterative strategy. This scheme optimizes the computational interplay between the spatial discretization and the NEM nonlinear iterative solution process through the use of the multigrid method. The combination of the NEM nodal method, calculation of the homogenized, neutron nodal balance coefficients (i.e. restriction operator), efficient underlying smoothing algorithm (power method of NESTLE), and the finer mesh reconstruction algorithm (i.e. prolongation operator), all operating on a sequence of coarser spatial nodes, constitutes the multilevel acceleration scheme employed in this research. Two implementations of the multigrid method into the NESTLE code were examined; the Imbedded NEM Strategy and the Imbedded CMFD Strategy. The main difference in implementation between the two methods is that in the Imbedded NEM Strategy, the NEM solution is required at every MG level. Numerical tests have shown that the Imbedded NEM Strategy suffers from divergence at coarse- grid levels, hence all the results for the different benchmarks presented here were obtained using the Imbedded CMFD Strategy. The novelties in the developed MG method are as follows: the formulation of the restriction and prolongation operators, and the selection of the relaxation method. The restriction operator utilizes a variation of the reactor physics, consistent homogenization technique. The prolongation operator is based upon a variant of the pin power reconstruction methodology. The relaxation method, which is the power method, utilizes a constant coefficient matrix within the NEM non-linear iterative strategy. The choice of the MG nesting within the nested iterative strategy enables the incorporation of other non-linear effects with no additional coding effort. In addition, if an eigenvalue problem is being solved, it remains an eigenvalue problem at all grid levels, simplifying coding implementation. The merit of the developed MG method was tested by incorporating it into the NESTLE iterative solver, and employing it to solve four different benchmark problems. In addition to the base cases, three different sensitivity studies are performed, examining the effects of number of MG levels, homogenized coupling coefficients correction (i.e. restriction operator), and fine-mesh reconstruction algorithm (i.e. prolongation operator). The multilevel acceleration scheme developed in this research provides the foundation for developing adaptive multilevel acceleration methods for steady-state and transient NEM nodal neutron diffusion equations. (Abstract shortened by UMI.)
NASA Astrophysics Data System (ADS)
Dhruv, Akash; Blower, Christopher; Wickenheiser, Adam M.
2015-03-01
The ability of UAVs to operate in complex and hostile environments makes them useful in military and civil operations concerning surveillance and reconnaissance. However, limitations in size of UAVs and communication delays prohibit their operation close to the ground and in cluttered environments, which increase risks associated with turbulence and wind gusts that cause trajectory deviations and potential loss of the vehicle. In the last decade, scientists and engineers have turned towards bio-inspiration to solve these issues by developing innovative flow control methods that offer better stability, controllability, and maneuverability. This paper presents an aerodynamic load solver for bio-inspired wings that consist of an array of feather-like flaps installed across the upper and lower surfaces in both the chord- and span-wise directions, mimicking the feathers of an avian wing. Each flap has the ability to rotate into both the wing body and the inbound airflow, generating complex flap configurations unobtainable by traditional wings that offer improved aerodynamic stability against gusting flows and turbulence. The solver discussed is an unsteady three-dimensional iterative doublet panel method with vortex particle wakes. This panel method models the wake-body interactions between multiple flaps effectively without the need to define specific wake geometries, thereby eliminating the need to manually model the wake for each configuration. To incorporate viscous flow characteristics, an iterative boundary layer theory is employed, modeling laminar, transitional and turbulent regions over the wing's surfaces, in addition to flow separation and reattachment locations. This technique enables the boundary layer to influence the wake strength and geometry both within the wing and aft of the trailing edge. The results obtained from this solver are validated using experimental data from a low-speed suction wind tunnel operating at Reynolds Number 300,000. This method enables fast and accurate assessment of aerodynamic loads for initial design of complex wing configurations compared to other methods available.
Electromagnetic scattering of large structures in layered earths using integral equations
NASA Astrophysics Data System (ADS)
Xiong, Zonghou; Tripp, Alan C.
1995-07-01
An electromagnetic scattering algorithm for large conductivity structures in stratified media has been developed and is based on the method of system iteration and spatial symmetry reduction using volume electric integral equations. The method of system iteration divides a structure into many substructures and solves the resulting matrix equation using a block iterative method. The block submatrices usually need to be stored on disk in order to save computer core memory. However, this requires a large disk for large structures. If the body is discretized into equal-size cells it is possible to use the spatial symmetry relations of the Green's functions to regenerate the scattering impedance matrix in each iteration, thus avoiding expensive disk storage. Numerical tests show that the system iteration converges much faster than the conventional point-wise Gauss-Seidel iterative method. The numbers of cells do not significantly affect the rate of convergency. Thus the algorithm effectively reduces the solution of the scattering problem to an order of O(N2), instead of O(N3) as with direct solvers.
Visualization of Unsteady Computational Fluid Dynamics
NASA Technical Reports Server (NTRS)
Haimes, Robert
1997-01-01
The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a super-computer class machine. The Massively Parallel Processors (MPP's) such as the 160 node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snap-shot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state-variables. (2) Disk speed vs. Computational speeds. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending, on the hardware and solver an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements in the last decade or two, it is easy to see that depending on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and Parallel Machine I/O problems. Disk access time is much worse within current parallel machines and cluster of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data, but are running the solver and the solver outputs the solution. These traditional network interfaces must be used for the file system. (4) Numerics of particle traces. Most visualization tools can work upon a single snap shot of the data but some visualization tools for transient problems require dealing with time.
Fransson, Thomas; Saue, Trond; Norman, Patrick
2016-05-10
The influences of group 12 (Zn, Cd, Hg) metal-substitution on the valence spectra and phosphorescence parameters of porphyrins (P) have been investigated in a relativistic setting. In order to obtain valence spectra, this study reports the first application of the damped linear response function, or complex polarization propagator, in the four-component density functional theory framework [as formulated in Villaume et al. J. Chem. Phys. 2010 , 133 , 064105 ]. It is shown that the steep increase in the density of states as due to the inclusion of spin-orbit coupling yields only minor changes in overall computational costs involved with the solution of the set of linear response equations. Comparing single-frequency to multifrequency spectral calculations, it is noted that the number of iterations in the iterative linear equation solver per frequency grid-point decreases monotonously from 30 to 0.74 as the number of frequency points goes from one to 19. The main heavy-atom effect on the UV/vis-absorption spectra is indirect and attributed to the change of point group symmetry due to metal-substitution, and it is noted that substitutions using heavier atoms yield small red-shifts of the intense Soret-band. Concerning phosphorescence parameters, the adoption of a four-component relativistic setting enables the calculation of such properties at a linear order of response theory, and any higher-order response functions do not need to be considered-a real, conventional, form of linear response theory has been used for the calculation of these parameters. For the substituted porphyrins, electronic coupling between the lowest triplet states is strong and results in theoretical estimates of lifetimes that are sensitive to the wave function and electron density parametrization. With this in mind, we report our best estimates of the phosphorescence lifetimes to be 460, 13.8, 11.2, and 0.00155 s for H2P, ZnP, CdP, and HgP, respectively, with the corresponding transition energies being equal to 1.46, 1.50, 1.38, and 0.89 eV.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine
NASA Technical Reports Server (NTRS)
Turner, Mark G.; Topp, David A.
1998-01-01
A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force calculation) for a grid which has 227 points axially.
Signal-Preserving Erratic Noise Attenuation via Iterative Robust Sparsity-Promoting Filter
Zhao, Qiang; Du, Qizhen; Gong, Xufei; ...
2018-04-06
Sparse domain thresholding filters operating in a sparse domain are highly effective in removing Gaussian random noise under Gaussian distribution assumption. Erratic noise, which designates non-Gaussian noise that consists of large isolated events with known or unknown distribution, also needs to be explicitly taken into account. However, conventional sparse domain thresholding filters based on the least-squares (LS) criterion are severely sensitive to data with high-amplitude and non-Gaussian noise, i.e., the erratic noise, which makes the suppression of this type of noise extremely challenging. Here, in this paper, we present a robust sparsity-promoting denoising model, in which the LS criterion ismore » replaced by the Huber criterion to weaken the effects of erratic noise. The random and erratic noise is distinguished by using a data-adaptive parameter in the presented method, where random noise is described by mean square, while the erratic noise is downweighted through a damped weight. Different from conventional sparse domain thresholding filters, definition of the misfit between noisy data and recovered signal via the Huber criterion results in a nonlinear optimization problem. With the help of theoretical pseudoseismic data, an iterative robust sparsity-promoting filter is proposed to transform the nonlinear optimization problem into a linear LS problem through an iterative procedure. The main advantage of this transformation is that the nonlinear denoising filter can be solved by conventional LS solvers. Lastly, tests with several data sets demonstrate that the proposed denoising filter can successfully attenuate the erratic noise without damaging useful signal when compared with conventional denoising approaches based on the LS criterion.« less
Signal-Preserving Erratic Noise Attenuation via Iterative Robust Sparsity-Promoting Filter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Qiang; Du, Qizhen; Gong, Xufei
Sparse domain thresholding filters operating in a sparse domain are highly effective in removing Gaussian random noise under Gaussian distribution assumption. Erratic noise, which designates non-Gaussian noise that consists of large isolated events with known or unknown distribution, also needs to be explicitly taken into account. However, conventional sparse domain thresholding filters based on the least-squares (LS) criterion are severely sensitive to data with high-amplitude and non-Gaussian noise, i.e., the erratic noise, which makes the suppression of this type of noise extremely challenging. Here, in this paper, we present a robust sparsity-promoting denoising model, in which the LS criterion ismore » replaced by the Huber criterion to weaken the effects of erratic noise. The random and erratic noise is distinguished by using a data-adaptive parameter in the presented method, where random noise is described by mean square, while the erratic noise is downweighted through a damped weight. Different from conventional sparse domain thresholding filters, definition of the misfit between noisy data and recovered signal via the Huber criterion results in a nonlinear optimization problem. With the help of theoretical pseudoseismic data, an iterative robust sparsity-promoting filter is proposed to transform the nonlinear optimization problem into a linear LS problem through an iterative procedure. The main advantage of this transformation is that the nonlinear denoising filter can be solved by conventional LS solvers. Lastly, tests with several data sets demonstrate that the proposed denoising filter can successfully attenuate the erratic noise without damaging useful signal when compared with conventional denoising approaches based on the LS criterion.« less
Development of FullWave : Hot Plasma RF Simulation Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Kim, Jin-Soo; Spencer, J. Andrew; Zhao, Liangji; Galkin, Sergei
2017-10-01
Full wave simulation tool, modeling RF fields in hot inhomogeneous magnetized plasma, is being developed. The wave equations with linearized hot plasma dielectric response are solved in configuration space on adaptive cloud of computational points. The nonlocal hot plasma dielectric response is formulated in configuration space without limiting approximations by calculating the plasma conductivity kernel based on the solution of the linearized Vlasov equation in inhomogeneous magnetic field. This approach allows for better resolution of plasma resonances, antenna structures and complex boundaries. The formulation of FullWave and preliminary results will be presented: construction of the finite differences for approximation of derivatives on adaptive cloud of computational points; model and results of nonlocal conductivity kernel calculation in tokamak geometry; results of 2-D full wave simulations in the cold plasma model in tokamak geometry using the formulated approach; results of self-consistent calculations of hot plasma dielectric response and RF fields in 1-D mirror magnetic field; preliminary results of self-consistent simulations of 2-D RF fields in tokamak using the calculated hot plasma conductivity kernel; development of iterative solver for wave equations. Work is supported by the U.S. DOE SBIR program.
NASA Technical Reports Server (NTRS)
Day, Brad A.; Meade, Andrew J., Jr.
1993-01-01
A semi-discrete Galerkin (SDG) method is under development to model attached, turbulent, and compressible boundary layers for transonic airfoil analysis problems. For the boundary-layer formulation the method models the spatial variable normal to the surface with linear finite elements and the time-like variable with finite differences. A Dorodnitsyn transformed system of equations is used to bound the infinite spatial domain thereby providing high resolution near the wall and permitting the use of a uniform finite element grid which automatically follows boundary-layer growth. The second-order accurate Crank-Nicholson scheme is applied along with a linearization method to take advantage of the parabolic nature of the boundary-layer equations and generate a non-iterative marching routine. The SDG code can be applied to any smoothly-connected airfoil shape without modification and can be coupled to any inviscid flow solver. In this analysis, a direct viscous-inviscid interaction is accomplished between the Euler and boundary-layer codes through the application of a transpiration velocity boundary condition. Results are presented for compressible turbulent flow past RAE 2822 and NACA 0012 airfoils at various freestream Mach numbers, Reynolds numbers, and angles of attack.
NASA Technical Reports Server (NTRS)
Padovan, J.; Lackney, J.
1986-01-01
The current paper develops a constrained hierarchical least square nonlinear equation solver. The procedure can handle the response behavior of systems which possess indefinite tangent stiffness characteristics. Due to the generality of the scheme, this can be achieved at various hierarchical application levels. For instance, in the case of finite element simulations, various combinations of either degree of freedom, nodal, elemental, substructural, and global level iterations are possible. Overall, this enables a solution methodology which is highly stable and storage efficient. To demonstrate the capability of the constrained hierarchical least square methodology, benchmarking examples are presented which treat structure exhibiting highly nonlinear pre- and postbuckling behavior wherein several indefinite stiffness transitions occur.
TLNS3D/CDISC Multipoint Design of the TCA Concept
NASA Technical Reports Server (NTRS)
Campbell, Richard L.; Mann, Michael J.
1999-01-01
This paper presents the work done to date by the authors on developing an efficient approach to multipoint design and applying it to the design of the HSR TCA (High Speed Research Technology Concept Aircraft) configuration. While the title indicates that this exploratory study has been performed using the TLNS3DMB flow solver and the CDISC (Constrained Direct Iterative Surface Curvature) design method, the CDISC method could have been used with any flow solver, and the multipoint design approach does not require the use of CDISC. The goal of the study was to develop a multipoint design method that could achieve a design in about the same time as 10 analysis runs.
NASA Astrophysics Data System (ADS)
Krank, Benjamin; Fehn, Niklas; Wall, Wolfgang A.; Kronbichler, Martin
2017-11-01
We present an efficient discontinuous Galerkin scheme for simulation of the incompressible Navier-Stokes equations including laminar and turbulent flow. We consider a semi-explicit high-order velocity-correction method for time integration as well as nodal equal-order discretizations for velocity and pressure. The non-linear convective term is treated explicitly while a linear system is solved for the pressure Poisson equation and the viscous term. The key feature of our solver is a consistent penalty term reducing the local divergence error in order to overcome recently reported instabilities in spatially under-resolved high-Reynolds-number flows as well as small time steps. This penalty method is similar to the grad-div stabilization widely used in continuous finite elements. We further review and compare our method to several other techniques recently proposed in literature to stabilize the method for such flow configurations. The solver is specifically designed for large-scale computations through matrix-free linear solvers including efficient preconditioning strategies and tensor-product elements, which have allowed us to scale this code up to 34.4 billion degrees of freedom and 147,456 CPU cores. We validate our code and demonstrate optimal convergence rates with laminar flows present in a vortex problem and flow past a cylinder and show applicability of our solver to direct numerical simulation as well as implicit large-eddy simulation of turbulent channel flow at Reτ = 180 as well as 590.
IGA-ADS: Isogeometric analysis FEM using ADS solver
NASA Astrophysics Data System (ADS)
Łoś, Marcin M.; Woźniak, Maciej; Paszyński, Maciej; Lenharth, Andrew; Hassaan, Muhamm Amber; Pingali, Keshav
2017-08-01
In this paper we present a fast explicit solver for solution of non-stationary problems using L2 projections with isogeometric finite element method. The solver has been implemented within GALOIS framework. It enables parallel multi-core simulations of different time-dependent problems, in 1D, 2D, or 3D. We have prepared the solver framework in a way that enables direct implementation of the selected PDE and corresponding boundary conditions. In this paper we describe the installation, implementation of exemplary three PDEs, and execution of the simulations on multi-core Linux cluster nodes. We consider three case studies, including heat transfer, linear elasticity, as well as non-linear flow in heterogeneous media. The presented package generates output suitable for interfacing with Gnuplot and ParaView visualization software. The exemplary simulations show near perfect scalability on Gilbert shared-memory node with four Intel® Xeon® CPU E7-4860 processors, each possessing 10 physical cores (for a total of 40 cores).
A comparison of SuperLU solvers on the intel MIC architecture
NASA Astrophysics Data System (ADS)
Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.
2016-10-01
In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited up to 45 % performance improvement from the offload programming depending on the sparse matrix type and the size of transferred and processed data.
Robust large-scale parallel nonlinear solvers for simulations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their usemore » in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.« less
User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.
NASA Technical Reports Server (NTRS)
Reddy, C. J.
2000-01-01
PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of the existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and use real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is reconverted to complex numbers. Though, this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to make the user acquainted with the installation and operation of the code. Driver routines are given to aid the users to integrate PCSMS routines in their own codes.
Application of PDSLin to the magnetic reconnection problem
NASA Astrophysics Data System (ADS)
Yuan, Xuefei; Li, Xiaoye S.; Yamazaki, Ichitaro; Jardin, Stephen C.; Koniges, Alice E.; Keyes, David E.
2013-01-01
Magnetic reconnection is a fundamental process in a magnetized plasma at both low and high magnetic Lundquist numbers (the ratio of the resistive diffusion time to the Alfvén wave transit time), which occurs in a wide variety of laboratory and space plasmas, e.g. magnetic fusion experiments, the solar corona and the Earth's magnetotail. An implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large so that the Whistler waves, which cause the fast reconnection, dominate the physics (Yuan et al 2012 J. Comput. Phys. 231 5822-53). For small system sizes, a direct solver such as SuperLU can be employed to obtain an accurate solution as long as the condition number is bounded by the reciprocal of the floating-point machine precision. However, SuperLU scales effectively only to hundreds of processors or less. For larger system sizes, it has been shown that physics-based (Chacón and Knoll 2003 J. Comput. Phys. 188 573-92) or other preconditioners can be applied to provide adequate solver performance. In recent years, we have been developing a new algebraic hybrid linear solver, PDSLin (Parallel Domain decomposition Schur complement-based Linear solver) (Yamazaki and Li 2010 Proc. VECPAR pp 421-34 and Yamazaki et al 2011 Technical Report). In this work, we compare numerical results from a direct solver and the proposed hybrid solver for the magnetic reconnection problem and demonstrate that the new hybrid solver is scalable to thousands of processors while maintaining the same robustness as a direct solver.
Xu, Peng; Tian, Yin; Lei, Xu; Hu, Xiao; Yao, Dezhong
2008-12-01
How to localize the neural electric activities within brain effectively and precisely from the scalp electroencephalogram (EEG) recordings is a critical issue for current study in clinical neurology and cognitive neuroscience. In this paper, based on the charge source model and the iterative re-weighted strategy, proposed is a new maximum neighbor weight based iterative sparse source imaging method, termed as CMOSS (Charge source model based Maximum neighbOr weight Sparse Solution). Different from the weight used in focal underdetermined system solver (FOCUSS) where the weight for each point in the discrete solution space is independently updated in iterations, the new designed weight for each point in each iteration is determined by the source solution of the last iteration at both the point and its neighbors. Using such a new weight, the next iteration may have a bigger chance to rectify the local source location bias existed in the previous iteration solution. The simulation studies with comparison to FOCUSS and LORETA for various source configurations were conducted on a realistic 3-shell head model, and the results confirmed the validation of CMOSS for sparse EEG source localization. Finally, CMOSS was applied to localize sources elicited in a visual stimuli experiment, and the result was consistent with those source areas involved in visual processing reported in previous studies.
NASA Astrophysics Data System (ADS)
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
Laplace-domain waveform modeling and inversion for the 3D acoustic-elastic coupled media
NASA Astrophysics Data System (ADS)
Shin, Jungkyun; Shin, Changsoo; Calandra, Henri
2016-06-01
Laplace-domain waveform inversion reconstructs long-wavelength subsurface models by using the zero-frequency component of damped seismic signals. Despite the computational advantages of Laplace-domain waveform inversion over conventional frequency-domain waveform inversion, an acoustic assumption and an iterative matrix solver have been used to invert 3D marine datasets to mitigate the intensive computing cost. In this study, we develop a Laplace-domain waveform modeling and inversion algorithm for 3D acoustic-elastic coupled media by using a parallel sparse direct solver library (MUltifrontal Massively Parallel Solver, MUMPS). We precisely simulate a real marine environment by coupling the 3D acoustic and elastic wave equations with the proper boundary condition at the fluid-solid interface. In addition, we can extract the elastic properties of the Earth below the sea bottom from the recorded acoustic pressure datasets. As a matrix solver, the parallel sparse direct solver is used to factorize the non-symmetric impedance matrix in a distributed memory architecture and rapidly solve the wave field for a number of shots by using the lower and upper matrix factors. Using both synthetic datasets and real datasets obtained by a 3D wide azimuth survey, the long-wavelength component of the P-wave and S-wave velocity models is reconstructed and the proposed modeling and inversion algorithm are verified. A cluster of 80 CPU cores is used for this study.
Textbook Multigrid Efficiency for Leading Edge Stagnation
NASA Technical Reports Server (NTRS)
Diskin, Boris; Thomas, James L.; Mineck, Raymond E.
2004-01-01
A multigrid solver is defined as having textbook multigrid efficiency (TME) if the solutions to the governing system of equations are attained in a computational work which is a small (less than 10) multiple of the operation count in evaluating the discrete residuals. TME in solving the incompressible inviscid fluid equations is demonstrated for leading-edge stagnation flows. The contributions of this paper include (1) a special formulation of the boundary conditions near stagnation allowing convergence of the Newton iterations on coarse grids, (2) the boundary relaxation technique to facilitate relaxation and residual restriction near the boundaries, (3) a modified relaxation scheme to prevent initial error amplification, and (4) new general analysis techniques for multigrid solvers. Convergence of algebraic errors below the level of discretization errors is attained by a full multigrid (FMG) solver with one full approximation scheme (FAS) cycle per grid. Asymptotic convergence rates of the FAS cycles for the full system of flow equations are very fast, approaching those for scalar elliptic equations.
Textbook Multigrid Efficiency for Leading Edge Stagnation
NASA Technical Reports Server (NTRS)
Diskin, Boris; Thomas, James L.; Mineck, Raymond E.
2004-01-01
A multigrid solver is defined as having textbook multigrid efficiency (TME) if the solutions to the governing system of equations are attained in a computational work which is a small (less than 10) multiple of the operation count in evaluating the discrete residuals. TME in solving the incompressible inviscid fluid equations is demonstrated for leading- edge stagnation flows. The contributions of this paper include (1) a special formulation of the boundary conditions near stagnation allowing convergence of the Newton iterations on coarse grids, (2) the boundary relaxation technique to facilitate relaxation and residual restriction near the boundaries, (3) a modified relaxation scheme to prevent initial error amplification, and (4) new general analysis techniques for multigrid solvers. Convergence of algebraic errors below the level of discretization errors is attained by a full multigrid (FMG) solver with one full approximation scheme (F.4S) cycle per grid. Asymptotic convergence rates of the F.4S cycles for the full system of flow equations are very fast, approaching those for scalar elliptic equations.
Neighbour lists for smoothed particle hydrodynamics on GPUs
NASA Astrophysics Data System (ADS)
Winkler, Daniel; Rezavand, Massoud; Rauch, Wolfgang
2018-04-01
The efficient iteration of neighbouring particles is a performance critical aspect of any high performance smoothed particle hydrodynamics (SPH) solver. SPH solvers that implement a constant smoothing length generally divide the simulation domain into a uniform grid to reduce the computational complexity of the neighbour search. Based on this method, particle neighbours are either stored per grid cell or for each individual particle, denoted as Verlet list. While the latter approach has significantly higher memory requirements, it has the potential for a significant computational speedup. A theoretical comparison is performed to estimate the potential improvements of the method based on unknown hardware dependent factors. Subsequently, the computational performance of both approaches is empirically evaluated on graphics processing units. It is shown that the speedup differs significantly for different hardware, dimensionality and floating point precision. The Verlet list algorithm is implemented as an alternative to the cell linked list approach in the open-source SPH solver DualSPHysics and provided as a standalone software package.
Solving lattice QCD systems of equations using mixed precision solvers on GPUs
NASA Astrophysics Data System (ADS)
Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C.
2010-09-01
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
Well-conditioned fractional collocation methods using fractional Birkhoff interpolation basis
NASA Astrophysics Data System (ADS)
Jiao, Yujian; Wang, Li-Lian; Huang, Can
2016-01-01
The purpose of this paper is twofold. Firstly, we provide explicit and compact formulas for computing both Caputo and (modified) Riemann-Liouville (RL) fractional pseudospectral differentiation matrices (F-PSDMs) of any order at general Jacobi-Gauss-Lobatto (JGL) points. We show that in the Caputo case, it suffices to compute F-PSDM of order μ ∈ (0 , 1) to compute that of any order k + μ with integer k ≥ 0, while in the modified RL case, it is only necessary to evaluate a fractional integral matrix of order μ ∈ (0 , 1). Secondly, we introduce suitable fractional JGL Birkhoff interpolation problems leading to new interpolation polynomial basis functions with remarkable properties: (i) the matrix generated from the new basis yields the exact inverse of F-PSDM at "interior" JGL points; (ii) the matrix of the highest fractional derivative in a collocation scheme under the new basis is diagonal; and (iii) the resulted linear system is well-conditioned in the Caputo case, while in the modified RL case, the eigenvalues of the coefficient matrix are highly concentrated. In both cases, the linear systems of the collocation schemes using the new basis can be solved by an iterative solver within a few iterations. Notably, the inverse can be computed in a very stable manner, so this offers optimal preconditioners for usual fractional collocation methods for fractional differential equations (FDEs). It is also noteworthy that the choice of certain special JGL points with parameters related to the order of the equations can ease the implementation. We highlight that the use of the Bateman's fractional integral formulas and fast transforms between Jacobi polynomials with different parameters, is essential for our algorithm development.
Numerical Technology for Large-Scale Computational Electromagnetics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharpe, R; Champagne, N; White, D
The key bottleneck of implicit computational electromagnetics tools for large complex geometries is the solution of the resulting linear system of equations. The goal of this effort was to research and develop critical numerical technology that alleviates this bottleneck for large-scale computational electromagnetics (CEM). The mathematical operators and numerical formulations used in this arena of CEM yield linear equations that are complex valued, unstructured, and indefinite. Also, simultaneously applying multiple mathematical modeling formulations to different portions of a complex problem (hybrid formulations) results in a mixed structure linear system, further increasing the computational difficulty. Typically, these hybrid linear systems aremore » solved using a direct solution method, which was acceptable for Cray-class machines but does not scale adequately for ASCI-class machines. Additionally, LLNL's previously existing linear solvers were not well suited for the linear systems that are created by hybrid implicit CEM codes. Hence, a new approach was required to make effective use of ASCI-class computing platforms and to enable the next generation design capabilities. Multiple approaches were investigated, including the latest sparse-direct methods developed by our ASCI collaborators. In addition, approaches that combine domain decomposition (or matrix partitioning) with general-purpose iterative methods and special purpose pre-conditioners were investigated. Special-purpose pre-conditioners that take advantage of the structure of the matrix were adapted and developed based on intimate knowledge of the matrix properties. Finally, new operator formulations were developed that radically improve the conditioning of the resulting linear systems thus greatly reducing solution time. The goal was to enable the solution of CEM problems that are 10 to 100 times larger than our previous capability.« less
TDIGG - TWO-DIMENSIONAL, INTERACTIVE GRID GENERATION CODE
NASA Technical Reports Server (NTRS)
Vu, B. T.
1994-01-01
TDIGG is a fast and versatile program for generating two-dimensional computational grids for use with finite-difference flow-solvers. Both algebraic and elliptic grid generation systems are included. The method for grid generation by algebraic transformation is based on an interpolation algorithm and the elliptic grid generation is established by solving the partial differential equation (PDE). Non-uniform grid distributions are carried out using a hyperbolic tangent stretching function. For algebraic grid systems, interpolations in one direction (univariate) and two directions (bivariate) are considered. These interpolations are associated with linear or cubic Lagrangian/Hermite/Bezier polynomial functions. The algebraic grids can subsequently be smoothed using an elliptic solver. For elliptic grid systems, the PDE can be in the form of Laplace (zero forcing function) or Poisson. The forcing functions in the Poisson equation come from the boundary or the entire domain of the initial algebraic grids. A graphics interface procedure using the Silicon Graphics (GL) Library is included to allow users to visualize the grid variations at each iteration. This will allow users to interactively modify the grid to match their applications. TDIGG is written in FORTRAN 77 for Silicon Graphics IRIS series computers running IRIX. This package requires either MIT's X Window System, Version 11 Revision 4 or SGI (Motif) Window System. A sample executable is provided on the distribution medium. It requires 148K of RAM for execution. The standard distribution medium is a .25 inch streaming magnetic IRIX tape cartridge in UNIX tar format. This program was developed in 1992.
NASA Technical Reports Server (NTRS)
Dunham, R. S.
1976-01-01
FORTRAN coded out-of-core equation solvers that solve using direct methods symmetric banded systems of simultaneous algebraic equations. Banded, frontal and column (skyline) solvers were studied as well as solvers that can partition the working area and thus could fit into any available core. Comparison timings are presented for several typical two dimensional and three dimensional continuum type grids of elements with and without midside nodes. Extensive conclusions are also given.
FOLDER: A numerical tool to simulate the development of structures in layered media
NASA Astrophysics Data System (ADS)
Adamuszek, Marta; Dabrowski, Marcin; Schmid, Daniel W.
2015-04-01
FOLDER is a numerical toolbox for modelling deformation in layered media during layer parallel shortening or extension in two dimensions. FOLDER builds on MILAMIN [1], a finite element method based mechanical solver, with a range of utilities included from the MUTILS package [2]. Numerical mesh is generated using the Triangle software [3]. The toolbox includes features that allow for: 1) designing complex structures such as multi-layer stacks, 2) accurately simulating large-strain deformation of linear and non-linear viscous materials, 3) post-processing of various physical fields such as velocity (total and perturbing), rate of deformation, finite strain, stress, deviatoric stress, pressure, apparent viscosity. FOLDER is designed to ensure maximum flexibility to configure model geometry, define material parameters, specify range of numerical parameters in simulations and choose the plotting options. FOLDER is an open source MATLAB application and comes with a user friendly graphical interface. The toolbox additionally comprises an educational application that illustrates various analytical solutions of growth rates calculated for the cases of folding and necking of a single layer with interfaces perturbed with a single sinusoidal waveform. We further derive two novel analytical expressions for the growth rate in the cases of folding and necking of a linear viscous layer embedded in a linear viscous medium of a finite thickness. We use FOLDER to test the accuracy of single-layer folding simulations using various 1) spatial and temporal resolutions, 2) time integration schemes, and 3) iterative algorithms for non-linear materials. The accuracy of the numerical results is quantified by: 1) comparing them to analytical solution, if available, or 2) running convergence tests. As a result, we provide a map of the most optimal choice of grid size, time step, and number of iterations to keep the results of the numerical simulations below a given error for a given time integration scheme. We also demonstrate that Euler and Leapfrog time integration schemes are not recommended for any practical use. Finally, the capabilities of the toolbox are illustrated based on two examples: 1) shortening of a synthetic multi-layer sequence and 2) extension of a folded quartz vein embedded in phyllite from Sprague Upper Reservoir (example discussed by Sherwin and Chapple [4]). The latter example demonstrates that FOLDER can be successfully used for reverse modelling and mechanical restoration. [1] Dabrowski, M., Krotkiewski, M., and Schmid, D. W., 2008, MILAMIN: MATLAB-based finite element method solver for large problems. Geochemistry Geophysics Geosystems, vol. 9. [2] Krotkiewski, M. and Dabrowski M., 2010 Parallel symmetric sparse matrix-vector product on scalar multi-core cpus. Parallel Computing, 36(4):181-198 [3] Shewchuk, J. R., 1996, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator, In: Applied Computational Geometry: Towards Geometric Engineering'' (Ming C. Lin and Dinesh Manocha, editors), Vol. 1148 of Lecture Notes in Computer Science, pp. 203-222, Springer-Verlag, Berlin [4] Sherwin, J.A., Chapple, W.M., 1968. Wavelengths of single layer folds - a Comparison between theory and Observation. American Journal of Science 266 (3), p. 167-179
A new approach for solving the three-dimensional steady Euler equations. I - General theory
NASA Technical Reports Server (NTRS)
Chang, S.-C.; Adamczyk, J. J.
1986-01-01
The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
A new approach for solving the three-dimensional steady Euler equations. I - General theory
NASA Astrophysics Data System (ADS)
Chang, S.-C.; Adamczyk, J. J.
1986-08-01
The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
NASA Astrophysics Data System (ADS)
Ikelle, Luc T.; Osen, Are; Amundsen, Lasse; Shen, Yunqing
2004-12-01
The classical linear solutions to the problem of multiple attenuation, like predictive deconvolution, τ-p filtering, or F-K filtering, are generally fast, stable, and robust compared to non-linear solutions, which are generally either iterative or in the form of a series with an infinite number of terms. These qualities have made the linear solutions more attractive to seismic data-processing practitioners. However, most linear solutions, including predictive deconvolution or F-K filtering, contain severe assumptions about the model of the subsurface and the class of free-surface multiples they can attenuate. These assumptions limit their usefulness. In a recent paper, we described an exception to this assertion for OBS data. We showed in that paper that a linear and non-iterative solution to the problem of attenuating free-surface multiples which is as accurate as iterative non-linear solutions can be constructed for OBS data. We here present a similar linear and non-iterative solution for attenuating free-surface multiples in towed-streamer data. For most practical purposes, this linear solution is as accurate as the non-linear ones.
A wideband FMBEM for 2D acoustic design sensitivity analysis based on direct differentiation method
NASA Astrophysics Data System (ADS)
Chen, Leilei; Zheng, Changjun; Chen, Haibo
2013-09-01
This paper presents a wideband fast multipole boundary element method (FMBEM) for two dimensional acoustic design sensitivity analysis based on the direct differentiation method. The wideband fast multipole method (FMM) formed by combining the original FMM and the diagonal form FMM is used to accelerate the matrix-vector products in the boundary element analysis. The Burton-Miller formulation is used to overcome the fictitious frequency problem when using a single Helmholtz boundary integral equation for exterior boundary-value problems. The strongly singular and hypersingular integrals in the sensitivity equations can be evaluated explicitly and directly by using the piecewise constant discretization. The iterative solver GMRES is applied to accelerate the solution of the linear system of equations. A set of optimal parameters for the wideband FMBEM design sensitivity analysis are obtained by observing the performances of the wideband FMM algorithm in terms of computing time and memory usage. Numerical examples are presented to demonstrate the efficiency and validity of the proposed algorithm.
WAKES: Wavelet Adaptive Kinetic Evolution Solvers
NASA Astrophysics Data System (ADS)
Mardirian, Marine; Afeyan, Bedros; Larson, David
2016-10-01
We are developing a general capability to adaptively solve phase space evolution equations mixing particle and continuum techniques in an adaptive manner. The multi-scale approach is achieved using wavelet decompositions which allow phase space density estimation to occur with scale dependent increased accuracy and variable time stepping. Possible improvements on the SFK method of Larson are discussed, including the use of multiresolution analysis based Richardson-Lucy Iteration, adaptive step size control in explicit vs implicit approaches. Examples will be shown with KEEN waves and KEEPN (Kinetic Electrostatic Electron Positron Nonlinear) waves, which are the pair plasma generalization of the former, and have a much richer span of dynamical behavior. WAKES techniques are well suited for the study of driven and released nonlinear, non-stationary, self-organized structures in phase space which have no fluid, limit nor a linear limit, and yet remain undamped and coherent well past the drive period. The work reported here is based on the Vlasov-Poisson model of plasma dynamics. Work supported by a Grant from the AFOSR.
Exploiting data representation for fault tolerance
Hoemmen, Mark Frederick; Elliott, J.; Sandia National Lab.; ...
2015-01-06
Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product'smore » result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.« less
NASA Astrophysics Data System (ADS)
Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo
2012-08-01
We study the behavior of an algorithm derived from the cavity method for the prize-collecting steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
A Fast MoM Solver (GIFFT) for Large Arrays of Microstrip and Cavity-Backed Antennas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fasenfest, B J; Capolino, F; Wilton, D
2005-02-02
A straightforward numerical analysis of large arrays of arbitrary contour (and possibly missing elements) requires large memory storage and long computation times. Several techniques are currently under development to reduce this cost. One such technique is the GIFFT (Green's function interpolation and FFT) method discussed here that belongs to the class of fast solvers for large structures. This method uses a modification of the standard AIM approach [1] that takes into account the reusability properties of matrices that arise from identical array elements. If the array consists of planar conducting bodies, the array elements are meshed using standard subdomain basismore » functions, such as the RWG basis. The Green's function is then projected onto a sparse regular grid of separable interpolating polynomials. This grid can then be used in a 2D or 3D FFT to accelerate the matrix-vector product used in an iterative solver [2]. The method has been proven to greatly reduce solve time by speeding up the matrix-vector product computation. The GIFFT approach also reduces fill time and memory requirements, since only the near element interactions need to be calculated exactly. The present work extends GIFFT to layered material Green's functions and multiregion interactions via slots in ground planes. In addition, a preconditioner is implemented to greatly reduce the number of iterations required for a solution. The general scheme of the GIFFT method is reported in [2]; this contribution is limited to presenting new results for array antennas made of slot-excited patches and cavity-backed patch antennas.« less
Tezaur, Irina K.; Tuminaro, Raymond S.; Perego, Mauro; ...
2015-01-01
We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditionermore » results in faster linear solve times but the ILU preconditioner exhibits better scalability. In addition, a weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. We show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.« less
T-MATS Toolbox for the Modeling and Analysis of Thermodynamic Systems
NASA Technical Reports Server (NTRS)
Chapman, Jeffryes W.
2014-01-01
The Toolbox for the Modeling and Analysis of Thermodynamic Systems (T-MATS) is a MATLABSimulink (The MathWorks Inc.) plug-in for creating and simulating thermodynamic systems and controls. The package contains generic parameterized components that can be combined with a variable input iterative solver and optimization algorithm to create complex system models, such as gas turbines.
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Nested Conjugate Gradient Algorithm with Nested Preconditioning for Non-linear Image Restoration.
Skariah, Deepak G; Arigovindan, Muthuvel
2017-06-19
We develop a novel optimization algorithm, which we call Nested Non-Linear Conjugate Gradient algorithm (NNCG), for image restoration based on quadratic data fitting and smooth non-quadratic regularization. The algorithm is constructed as a nesting of two conjugate gradient (CG) iterations. The outer iteration is constructed as a preconditioned non-linear CG algorithm; the preconditioning is performed by the inner CG iteration that is linear. The inner CG iteration, which performs preconditioning for outer CG iteration, itself is accelerated by an another FFT based non-iterative preconditioner. We prove that the method converges to a stationary point for both convex and non-convex regularization functionals. We demonstrate experimentally that proposed method outperforms the well-known majorization-minimization method used for convex regularization, and a non-convex inertial-proximal method for non-convex regularization functional.
An installed nacelle design code using a multiblock Euler solver. Volume 2: User guide
NASA Technical Reports Server (NTRS)
Chen, H. C.
1992-01-01
This is a user manual for the general multiblock Euler design (GMBEDS) code. The code is for the design of a nacelle installed on a geometrically complex configuration such as a complete airplane with wing/body/nacelle/pylon. It consists of two major building blocks: a design module developed by LaRC using directive iterative surface curvature (DISC); and a general multiblock Euler (GMBE) flow solver. The flow field surrounding a complex configuration is divided into a number of topologically simple blocks to facilitate surface-fitted grid generation and improve flow solution efficiency. This user guide provides input data formats along with examples of input files and a Unix script for program execution in the UNICOS environment.
Fast Multilevel Solvers for a Class of Discrete Fourth Order Parabolic Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Bin; Chen, Luoping; Hu, Xiaozhe
2016-03-05
In this paper, we study fast iterative solvers for the solution of fourth order parabolic equations discretized by mixed finite element methods. We propose to use consistent mass matrix in the discretization and use lumped mass matrix to construct efficient preconditioners. We provide eigenvalue analysis for the preconditioned system and estimate the convergence rate of the preconditioned GMRes method. Furthermore, we show that these preconditioners only need to be solved inexactly by optimal multigrid algorithms. Our numerical examples indicate that the proposed preconditioners are very efficient and robust with respect to both discretization parameters and diffusion coefficients. We also investigatemore » the performance of multigrid algorithms with either collective smoothers or distributive smoothers when solving the preconditioner systems.« less
NASA Astrophysics Data System (ADS)
Lavery, N.; Taylor, C.
1999-07-01
Multigrid and iterative methods are used to reduce the solution time of the matrix equations which arise from the finite element (FE) discretisation of the time-independent equations of motion of the incompressible fluid in turbulent motion. Incompressible flow is solved by using the method of reduce interpolation for the pressure to satisfy the Brezzi-Babuska condition. The k-l model is used to complete the turbulence closure problem. The non-symmetric iterative matrix methods examined are the methods of least squares conjugate gradient (LSCG), biconjugate gradient (BCG), conjugate gradient squared (CGS), and the biconjugate gradient squared stabilised (BCGSTAB). The multigrid algorithm applied is based on the FAS algorithm of Brandt, and uses two and three levels of grids with a V-cycling schedule. These methods are all compared to the non-symmetric frontal solver. Copyright
NASA Astrophysics Data System (ADS)
Barnaś, Dawid; Bieniasz, Lesław K.
2017-07-01
We have recently developed a vectorized Thomas solver for quasi-block tridiagonal linear algebraic equation systems using Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) in operations on dense blocks [D. Barnaś and L. K. Bieniasz, Int. J. Comput. Meth., accepted]. The acceleration caused by vectorization was observed for large block sizes, but was less satisfactory for small blocks. In this communication we report on another version of the solver, optimized for small blocks of size up to four rows and/or columns.
IRMHD: an implicit radiative and magnetohydrodynamical solver for self-gravitating systems
NASA Astrophysics Data System (ADS)
Hujeirat, A.
1998-07-01
The 2D implicit hydrodynamical solver developed by Hujeirat & Rannacher is now modified to include the effects of radiation, magnetic fields and self-gravity in different geometries. The underlying numerical concept is based on the operator splitting approach, and the resulting 2D matrices are inverted using different efficient preconditionings such as ADI (alternating direction implicit), the approximate factorization method and Line-Gauss-Seidel or similar iteration procedures. Second-order finite volume with third-order upwinding and second-order time discretization is used. To speed up convergence and enhance efficiency we have incorporated an adaptive time-step control and monotonic multilevel grid distributions as well as vectorizing the code. Test calculations had shown that it requires only 38 per cent more computational effort than its explicit counterpart, whereas its range of application to astrophysical problems is much larger. For example, strongly time-dependent, quasi-stationary and steady-state solutions for the set of Euler and Navier-Stokes equations can now be sought on a non-linearly distributed and strongly stretched mesh. As most of the numerical techniques used to build up this algorithm have been described by Hujeirat & Rannacher in an earlier paper, we focus in this paper on the inclusion of self-gravity, radiation and magnetic fields. Strategies for satisfying the condition ∇.B=0 in the implicit evolution of MHD flows are given. A new discretization strategy for the vector potential which allows alternating use of the direct method is prescribed. We investigate the efficiencies of several 2D solvers for a Poisson-like equation and compare their convergence rates. We provide a splitting approach for the radiative flux within the FLD (flux-limited diffusion) approximation to enhance consistency and accuracy between regions of different optical depths. The results of some test problems are presented to demonstrate the accuracy and robustness of the code.
Multilevel acceleration of scattering-source iterations with application to electron transport
Drumm, Clif; Fan, Wesley
2017-08-18
Acceleration/preconditioning strategies available in the SCEPTRE radiation transport code are described. A flexible transport synthetic acceleration (TSA) algorithm that uses a low-order discrete-ordinates (S N) or spherical-harmonics (P N) solve to accelerate convergence of a high-order S N source-iteration (SI) solve is described. Convergence of the low-order solves can be further accelerated by applying off-the-shelf incomplete-factorization or algebraic-multigrid methods. Also available is an algorithm that uses a generalized minimum residual (GMRES) iterative method rather than SI for convergence, using a parallel sweep-based solver to build up a Krylov subspace. TSA has been applied as a preconditioner to accelerate the convergencemore » of the GMRES iterations. The methods are applied to several problems involving electron transport and problems with artificial cross sections with large scattering ratios. These methods were compared and evaluated by considering material discontinuities and scattering anisotropy. Observed accelerations obtained are highly problem dependent, but speedup factors around 10 have been observed in typical applications.« less
Wu, Jiayang; Cao, Pan; Hu, Xiaofeng; Jiang, Xinhong; Pan, Ting; Yang, Yuxing; Qiu, Ciyuan; Tremblay, Christine; Su, Yikai
2014-10-20
We propose and experimentally demonstrate an all-optical temporal differential-equation solver that can be used to solve ordinary differential equations (ODEs) characterizing general linear time-invariant (LTI) systems. The photonic device implemented by an add-drop microring resonator (MRR) with two tunable interferometric couplers is monolithically integrated on a silicon-on-insulator (SOI) wafer with a compact footprint of ~60 μm × 120 μm. By thermally tuning the phase shifts along the bus arms of the two interferometric couplers, the proposed device is capable of solving first-order ODEs with two variable coefficients. The operation principle is theoretically analyzed, and system testing of solving ODE with tunable coefficients is carried out for 10-Gb/s optical Gaussian-like pulses. The experimental results verify the effectiveness of the fabricated device as a tunable photonic ODE solver.
TOUGH3: A new efficient version of the TOUGH suite of multiphase flow and transport simulators
NASA Astrophysics Data System (ADS)
Jung, Yoojin; Pau, George Shu Heng; Finsterle, Stefan; Pollyea, Ryan M.
2017-11-01
The TOUGH suite of nonisothermal multiphase flow and transport simulators has been updated by various developers over many years to address a vast range of challenging subsurface problems. The increasing complexity of the simulated processes as well as the growing size of model domains that need to be handled call for an improvement in the simulator's computational robustness and efficiency. Moreover, modifications have been frequently introduced independently, resulting in multiple versions of TOUGH that (1) led to inconsistencies in feature implementation and usage, (2) made code maintenance and development inefficient, and (3) caused confusion to users and developers. TOUGH3-a new base version of TOUGH-addresses these issues. It consolidates both the serial (TOUGH2 V2.1) and parallel (TOUGH2-MP V2.0) implementations, enabling simulations to be performed on desktop computers and supercomputers using a single code. New PETSc parallel linear solvers are added to the existing serial solvers of TOUGH2 and the Aztec solver used in TOUGH2-MP. The PETSc solvers generally perform better than the Aztec solvers in parallel and the internal TOUGH3 linear solver in serial. TOUGH3 also incorporates many new features, addresses bugs, and improves the flexibility of data handling. Due to the improved capabilities and usability, TOUGH3 is more robust and efficient for solving tough and computationally demanding problems in diverse scientific and practical applications related to subsurface flow modeling.
Parallel Semi-Implicit Spectral Element Atmospheric Model
NASA Astrophysics Data System (ADS)
Fournier, A.; Thomas, S.; Loft, R.
2001-05-01
The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
Validation of a coupled core-transport, pedestal-structure, current-profile and equilibrium model
NASA Astrophysics Data System (ADS)
Meneghini, O.
2015-11-01
The first workflow capable of predicting the self-consistent solution to the coupled core-transport, pedestal structure, and equilibrium problems from first-principles and its experimental tests are presented. Validation with DIII-D discharges in high confinement regimes shows that the workflow is capable of robustly predicting the kinetic profiles from on axis to the separatrix and matching the experimental measurements to within their uncertainty, with no prior knowledge of the pedestal height nor of any measurement of the temperature or pressure. Self-consistent coupling has proven to be essential to match the experimental results, and capture the non-linear physics that governs the core and pedestal solutions. In particular, clear stabilization of the pedestal peeling ballooning instabilities by the global Shafranov shift and destabilization by additional edge bootstrap current, and subsequent effect on the core plasma profiles, have been clearly observed and documented. In our model, self-consistency is achieved by iterating between the TGYRO core transport solver (with NEO and TGLF for neoclassical and turbulent flux), and the pedestal structure predicted by the EPED model. A self-consistent equilibrium is calculated by EFIT, while the ONETWO transport package evolves the current profile and calculates the particle and energy sources. The capabilities of such workflow are shown to be critical for the design of future experiments such as ITER and FNSF, which operate in a regime where the equilibrium, the pedestal, and the core transport problems are strongly coupled, and for which none of these quantities can be assumed to be known. Self-consistent core-pedestal predictions for ITER, as well as initial optimizations, will be presented. Supported by the US Department of Energy under DE-FC02-04ER54698, DE-SC0012652.
Simultaneous elastic parameter inversion in 2-D/3-D TTI medium combined later arrival times
NASA Astrophysics Data System (ADS)
Bai, Chao-ying; Wang, Tao; Yang, Shang-bei; Li, Xing-wang; Huang, Guo-jiao
2016-04-01
Traditional traveltime inversion for anisotropic medium is, in general, based on a "weak" assumption in the anisotropic property, which simplifies both the forward part (ray tracing is performed once only) and the inversion part (a linear inversion solver is possible). But for some real applications, a general (both "weak" and "strong") anisotropic medium should be considered. In such cases, one has to develop a ray tracing algorithm to handle with the general (including "strong") anisotropic medium and also to design a non-linear inversion solver for later tomography. Meanwhile, it is constructive to investigate how much the tomographic resolution can be improved by introducing the later arrivals. For this motivation, we incorporated our newly developed ray tracing algorithm (multistage irregular shortest-path method) for general anisotropic media with a non-linear inversion solver (a damped minimum norm, constrained least squares problem with a conjugate gradient approach) to formulate a non-linear inversion solver for anisotropic medium. This anisotropic traveltime inversion procedure is able to combine the later (reflected) arrival times. Both 2-D/3-D synthetic inversion experiments and comparison tests show that (1) the proposed anisotropic traveltime inversion scheme is able to recover the high contrast anomalies and (2) it is possible to improve the tomographic resolution by introducing the later (reflected) arrivals, but not as expected in the isotropic medium, because the different velocity (qP, qSV and qSH) sensitivities (or derivatives) respective to the different elastic parameters are not the same but are also dependent on the inclination angle.
Eikonal-Based Inversion of GPR Data from the Vaucluse Karst Aquifer
NASA Astrophysics Data System (ADS)
Yedlin, M. J.; van Vorst, D.; Guglielmi, Y.; Cappa, F.; Gaffet, S.
2009-12-01
In this paper, we present an easy-to-implement eikonal-based travel time inversion algorithm and apply it to borehole GPR measurement data obtained from a karst aquifer located in the Vaucluse in Provence. The boreholes are situated with a fault zone deep inside the aquifer, in the Laboratoire Souterrain à Bas Bruit (LSBB). The measurements were made using 250 MHz MALA RAMAC borehole GPR antennas. The inversion formulation is unique in its application of a fast-sweeping eikonal solver (Zhao [1]) to the minimization of an objective functional that is composed of a travel time misfit and a model-based regularization [2]. The solver is robust in the presence of large velocity contrasts, efficient, easy to implement, and does not require the use of a sorting algorithm. The computation of sensitivities, which are required for the inversion process, is achieved by tracing rays backward from receiver to source following the gradient of the travel time field [2]. A user wishing to implement this algorithm can opt to avoid the ray tracing step and simply perturb the model to obtain the required sensitivities. Despite the obvious computational inefficiency of such an approach, it is acceptable for 2D problems. The relationship between travel time and the velocity profile is non-linear, requiring an iterative approach to be used. At each iteration, a set of matrix equations is solved to determine the model update. As the inversion continues, the weighting of the regularization parameter is adjusted until an appropriate data misfit is obtained. The inversion results, shown in the attached image, are consistent with previously obtained geological structure. Future work will look at improving inversion resolution and incorporating other measurement methodologies, with the goal of providing useful data for groundwater analysis. References: [1] H. Zhao, “A fast sweeping method for Eikonal equations,” Mathematics of Computation, vol. 74, no. 250, pp. 603-627, 2004. [2] D. Aldridge and D. Oldenburg, “Two-dimensional tomographic inversion with finite-difference traveltimes,” Journal of Seismic Exploration, vol. 2, pp. 257-274, 1993. Recovered Permittivity Profiles
Kipp, K.L.
1987-01-01
The Heat- and Soil-Transport Program (HST3D) simulates groundwater flow and associated heat and solute transport in three dimensions. The three governing equations are coupled through the interstitial pore velocity, the dependence of the fluid density on pressure, temperature, the solute-mass fraction , and the dependence of the fluid viscosity on temperature and solute-mass fraction. The solute transport equation is for only a single, solute species with possible linear equilibrium sorption and linear decay. Finite difference techniques are used to discretize the governing equations using a point-distributed grid. The flow-, heat- and solute-transport equations are solved , in turn, after a particle Gauss-reduction scheme is used to modify them. The modified equations are more tightly coupled and have better stability for the numerical solutions. The basic source-sink term represents wells. A complex well flow model may be used to simulate specified flow rate and pressure conditions at the land surface or within the aquifer, with or without pressure and flow rate constraints. Boundary condition types offered include specified value, specified flux, leakage, heat conduction, and approximate free surface, and two types of aquifer influence functions. All boundary conditions can be functions of time. Two techniques are available for solution of the finite difference matrix equations. One technique is a direct-elimination solver, using equations reordered by alternating diagonal planes. The other technique is an iterative solver, using two-line successive over-relaxation. A restart option is available for storing intermediate results and restarting the simulation at an intermediate time with modified boundary conditions. This feature also can be used as protection against computer system failure. Data input and output may be in metric (SI) units or inch-pound units. Output may include tables of dependent variables and parameters, zoned-contour maps, and plots of the dependent variables versus time. (Lantz-PTT)
Methods for design and evaluation of integrated hardware-software systems for concurrent computation
NASA Technical Reports Server (NTRS)
Pratt, T. W.
1985-01-01
Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PSICES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.
Simulation of High Power Lasers (Preprint)
2010-06-01
integration, which requires communication of zonal boundary information after each inner- iteration of the Gauss - Seidel or Jacobi matrix solver. Each...experiment consisting of a supersonic (M~2.2) converging -diverging nozzle section with secondary mass injection in the nozzle expansion downstream of...consists of a section of a supersonic (M~2.2) converging -diverging slit nozzle with one large and two small orifices that inject reactants into the
NASA Astrophysics Data System (ADS)
Zhang, Fan; Szilágyi, Béla
2013-10-01
At the beginning of binary black hole simulations, there is a pulse of spurious radiation (or junk radiation) resulting from the initial data not matching astrophysical quasi-equilibrium inspiral exactly. One traditionally waits for the junk radiation to exit the computational domain before taking physical readings, at the expense of throwing away a segment of the evolution, and with the hope that junk radiation exits cleanly. We argue that this hope does not necessarily pan out, as junk radiation could excite long-lived constraint violation. Another complication with the initial data is that they contain orbital eccentricity that needs to be removed, usually by evolving the early part of the inspiral multiple times with gradually improved input parameters. We show that this procedure is also adversely impacted by junk radiation. In this paper, we do not attempt to eliminate junk radiation directly, but instead tackle the much simpler problem of ameliorating its long-lasting effects. We report on the success of a method that achieves this goal by combining the removal of junk radiation and eccentricity into a single procedure. Namely, we periodically stop a low resolution simulation; take the numerically evolved metric data and overlay it with eccentricity adjustments; run it through an initial data solver (i.e. the solver receives as free data the numerical output of the previous iteration); restart the simulation; repeat until eccentricity becomes sufficiently low; and then launch the high resolution “production run” simulation. This approach has the following benefits: (1) We do not have to contend with the influence of junk radiation on eccentricity measurements for later iterations of the eccentricity reduction procedure. (2) We reenforce constraints every time the initial data solver is invoked, removing the constraint violation excited by junk radiation previously. (3) The wasted simulation segment associated with the junk radiation’s evolution is absorbed into the eccentricity reduction iterations. Furthermore, (1) and (2) together allow us to carry out our joint-elimination procedure at low resolution, even when the subsequent “production run” is intended as a high resolution simulation.
OVERSMART Reporting Tool for Flow Computations Over Large Grid Systems
NASA Technical Reports Server (NTRS)
Kao, David L.; Chan, William M.
2012-01-01
Structured grid solvers such as NASA's OVERFLOW compressible Navier-Stokes flow solver can generate large data files that contain convergence histories for flow equation residuals, turbulence model equation residuals, component forces and moments, and component relative motion dynamics variables. Most of today's large-scale problems can extend to hundreds of grids, and over 100 million grid points. However, due to the lack of efficient tools, only a small fraction of information contained in these files is analyzed. OVERSMART (OVERFLOW Solution Monitoring And Reporting Tool) provides a comprehensive report of solution convergence of flow computations over large, complex grid systems. It produces a one-page executive summary of the behavior of flow equation residuals, turbulence model equation residuals, and component forces and moments. Under the automatic option, a matrix of commonly viewed plots such as residual histograms, composite residuals, sub-iteration bar graphs, and component forces and moments is automatically generated. Specific plots required by the user can also be prescribed via a command file or a graphical user interface. Output is directed to the user s computer screen and/or to an html file for archival purposes. The current implementation has been targeted for the OVERFLOW flow solver, which is used to obtain a flow solution on structured overset grids. The OVERSMART framework allows easy extension to other flow solvers.
Constraining past seawater δ18O and temperature records developed from foraminiferal geochemistry
NASA Astrophysics Data System (ADS)
Quinn, T. M.; Thirumalai, K.; Marino, G.
2016-12-01
Paired measurements of magnesium-to-calcium ratios (Mg/Ca) and the stable oxygen isotopic composition (δ18O) in foraminifera have significantly advanced our knowledge of the climate system by providing information on past temperature and seawater δ18O (δ18Osw, a proxy for salinity and ice volume). However, multiple sources of uncertainty exist in transferring these downcore geochemical data into quantitative paleoclimate reconstructions. Here, we develop a computational toolkit entitled Paleo-Seawater Uncertainty Solver (PSU Solver) that performs bootstrap Monte Carlo simulations to constrain these various sources of uncertainty. PSU Solver calculates temperature and δ18Osw, and their respective confidence intervals using an iterative approach with user-defined errors, calibrations, and sea-level curves. Our probabilistic approach yields reduced uncertainty constraints compared to theoretical considerations and commonly used propagation exercises. We demonstrate the applicability of PSU Solver for published records covering three timescales: the late Holocene, the last deglaciation, and the last glacial period. We show that the influence of salinity on Mg/Ca can considerably alter the structure and amplitude of change in the resulting reconstruction and can impact the interpretation of paleoceanographic time series. We also highlight the sensitivity of the records to various inputs of sea-level curves, transfer functions, and uncertainty constraints. PSU Solver offers an expeditious yet rigorous approach to test the robustness of past climate variability inferred from paired Mg/Ca-δ18O measurements.
A Tensor-Train accelerated solver for integral equations in complex geometries
NASA Astrophysics Data System (ADS)
Corona, Eduardo; Rahimian, Abtin; Zorin, Denis
2017-04-01
We present a framework using the Quantized Tensor Train (QTT) decomposition to accurately and efficiently solve volume and boundary integral equations in three dimensions. We describe how the QTT decomposition can be used as a hierarchical compression and inversion scheme for matrices arising from the discretization of integral equations. For a broad range of problems, computational and storage costs of the inversion scheme are extremely modest O (log N) and once the inverse is computed, it can be applied in O (Nlog N) . We analyze the QTT ranks for hierarchically low rank matrices and discuss its relationship to commonly used hierarchical compression techniques such as FMM and HSS. We prove that the QTT ranks are bounded for translation-invariant systems and argue that this behavior extends to non-translation invariant volume and boundary integrals. For volume integrals, the QTT decomposition provides an efficient direct solver requiring significantly less memory compared to other fast direct solvers. We present results demonstrating the remarkable performance of the QTT-based solver when applied to both translation and non-translation invariant volume integrals in 3D. For boundary integral equations, we demonstrate that using a QTT decomposition to construct preconditioners for a Krylov subspace method leads to an efficient and robust solver with a small memory footprint. We test the QTT preconditioners in the iterative solution of an exterior elliptic boundary value problem (Laplace) formulated as a boundary integral equation in complex, multiply connected geometries.
Determining the Optimal Values of Exponential Smoothing Constants--Does Solver Really Work?
ERIC Educational Resources Information Center
Ravinder, Handanhal V.
2013-01-01
A key issue in exponential smoothing is the choice of the values of the smoothing constants used. One approach that is becoming increasingly popular in introductory management science and operations management textbooks is the use of Solver, an Excel-based non-linear optimizer, to identify values of the smoothing constants that minimize a measure…
A Unified Approach to Optimization
2014-10-02
employee scheduling, ad placement, latin squares, disjunctions of linear systems, temporal modeling with interval variables, and traveling salesman problems ...integrating technologies. A key to integrated modeling is to formulate a problem with high-levelmetaconstraints, which are inspired by the “global... problem substructure to the solver. This contrasts with the atomistic modeling style of mixed integer programming (MIP) and satisfiability (SAT) solvers
Bindu, G; Semenov, S
2013-01-01
This paper describes an efficient two-dimensional fused image reconstruction approach for Microwave Tomography (MWT). Finite Difference Time Domain (FDTD) models were created for a viable MWT experimental system having the transceivers modelled using thin wire approximation with resistive voltage sources. Born Iterative and Distorted Born Iterative methods have been employed for image reconstruction with the extremity imaging being done using a differential imaging technique. The forward solver in the imaging algorithm employs the FDTD method of solving the time domain Maxwell's equations with the regularisation parameter computed using a stochastic approach. The algorithm is tested with 10% noise inclusion and successful image reconstruction has been shown implying its robustness.
High-Order Hyperbolic Residual-Distribution Schemes on Arbitrary Triangular Grids
NASA Technical Reports Server (NTRS)
Mazaheri, Alireza; Nishikawa, Hiroaki
2015-01-01
In this paper, we construct high-order hyperbolic residual-distribution schemes for general advection-diffusion problems on arbitrary triangular grids. We demonstrate that the second-order accuracy of the hyperbolic schemes can be greatly improved by requiring the scheme to preserve exact quadratic solutions. We also show that the improved second-order scheme can be easily extended to third-order by further requiring the exactness for cubic solutions. We construct these schemes based on the LDA and the SUPG methodology formulated in the framework of the residual-distribution method. For both second- and third-order-schemes, we construct a fully implicit solver by the exact residual Jacobian of the second-order scheme, and demonstrate rapid convergence of 10-15 iterations to reduce the residuals by 10 orders of magnitude. We demonstrate also that these schemes can be constructed based on a separate treatment of the advective and diffusive terms, which paves the way for the construction of hyperbolic residual-distribution schemes for the compressible Navier-Stokes equations. Numerical results show that these schemes produce exceptionally accurate and smooth solution gradients on highly skewed and anisotropic triangular grids, including curved boundary problems, using linear elements. We also present Fourier analysis performed on the constructed linear system and show that an under-relaxation parameter is needed for stabilization of Gauss-Seidel relaxation.
A depth-first search algorithm to compute elementary flux modes by linear programming.
Quek, Lake-Ee; Nielsen, Lars K
2014-07-30
The decomposition of complex metabolic networks into elementary flux modes (EFMs) provides a useful framework for exploring reaction interactions systematically. Generating a complete set of EFMs for large-scale models, however, is near impossible. Even for moderately-sized models (<400 reactions), existing approaches based on the Double Description method must iterate through a large number of combinatorial candidates, thus imposing an immense processor and memory demand. Based on an alternative elementarity test, we developed a depth-first search algorithm using linear programming (LP) to enumerate EFMs in an exhaustive fashion. Constraints can be introduced to directly generate a subset of EFMs satisfying the set of constraints. The depth-first search algorithm has a constant memory overhead. Using flux constraints, a large LP problem can be massively divided and parallelized into independent sub-jobs for deployment into computing clusters. Since the sub-jobs do not overlap, the approach scales to utilize all available computing nodes with minimal coordination overhead or memory limitations. The speed of the algorithm was comparable to efmtool, a mainstream Double Description method, when enumerating all EFMs; the attrition power gained from performing flux feasibility tests offsets the increased computational demand of running an LP solver. Unlike the Double Description method, the algorithm enables accelerated enumeration of all EFMs satisfying a set of constraints.
NASA Astrophysics Data System (ADS)
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue; Govind, Niranjan; Yang, Chao
2017-12-01
We present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into an eigenvalue problem that involves the product of two matrices M and K. We show that, because MK is self-adjoint with respect to the inner product induced by the matrix K, this product eigenvalue problem can be solved efficiently by a modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. The solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. We show that the other component of the eigenvector can be easily recovered in an inexpensive postprocessing procedure. As a result, the algorithms we present here become more efficient than existing methods that try to approximate both components of the eigenvectors simultaneously. In particular, our numerical experiments demonstrate that the new algorithms presented here consistently outperform the existing state-of-the-art Davidson type solvers by a factor of two in both solution time and storage.
On iterative algorithms for quantitative photoacoustic tomography in the radiative transport regime
NASA Astrophysics Data System (ADS)
Wang, Chao; Zhou, Tie
2017-11-01
In this paper, we present a numerical reconstruction method for quantitative photoacoustic tomography (QPAT), based on the radiative transfer equation (RTE), which models light propagation more accurately than diffusion approximation (DA). We investigate the reconstruction of absorption coefficient and scattering coefficient of biological tissues. An improved fixed-point iterative method to retrieve the absorption coefficient, given the scattering coefficient, is proposed for its cheap computational cost; the convergence of this method is also proved. The Barzilai-Borwein (BB) method is applied to retrieve two coefficients simultaneously. Since the reconstruction of optical coefficients involves the solutions of original and adjoint RTEs in the framework of optimization, an efficient solver with high accuracy is developed from Gao and Zhao (2009 Transp. Theory Stat. Phys. 38 149-92). Simulation experiments illustrate that the improved fixed-point iterative method and the BB method are competitive methods for QPAT in the relevant cases.
Architecting the Finite Element Method Pipeline for the GPU.
Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T
2014-02-01
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multi-grid (AMG) method preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of GPU. Comparison of our on-GPU assembly versus a traditional serial implementation on the CPU achieves up to an 87 × speedup. Focusing on the linear system solver alone, we achieve a speedup of up to 51 × versus use of a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.
Simulation of violent free surface flow by AMR method
NASA Astrophysics Data System (ADS)
Hu, Changhong; Liu, Cheng
2018-05-01
A novel CFD approach based on adaptive mesh refinement (AMR) technique is being developed for numerical simulation of violent free surface flows. CIP method is applied to the flow solver and tangent of hyperbola for interface capturing with slope weighting (THINC/SW) scheme is implemented as the free surface capturing scheme. The PETSc library is adopted to solve the linear system. The linear solver is redesigned and modified to satisfy the requirement of the AMR mesh topology. In this paper, our CFD method is outlined and newly obtained results on numerical simulation of violent free surface flows are presented.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*
Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.
2012-01-01
This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200
MIBPB: a software package for electrostatic analysis.
Chen, Duan; Chen, Zhan; Chen, Changjun; Geng, Weihua; Wei, Guo-Wei
2011-03-01
The Poisson-Boltzmann equation (PBE) is an established model for the electrostatic analysis of biomolecules. The development of advanced computational techniques for the solution of the PBE has been an important topic in the past two decades. This article presents a matched interface and boundary (MIB)-based PBE software package, the MIBPB solver, for electrostatic analysis. The MIBPB has a unique feature that it is the first interface technique-based PBE solver that rigorously enforces the solution and flux continuity conditions at the dielectric interface between the biomolecule and the solvent. For protein molecular surfaces, which may possess troublesome geometrical singularities, the MIB scheme makes the MIBPB by far the only existing PBE solver that is able to deliver the second-order convergence, that is, the accuracy increases four times when the mesh size is halved. The MIBPB method is also equipped with a Dirichlet-to-Neumann mapping technique that builds a Green's function approach to analytically resolve the singular charge distribution in biomolecules in order to obtain reliable solutions at meshes as coarse as 1 Å--whereas it usually takes other traditional PB solvers 0.25 Å to reach similar level of reliability. This work further accelerates the rate of convergence of linear equation systems resulting from the MIBPB by using the Krylov subspace (KS) techniques. Condition numbers of the MIBPB matrices are significantly reduced by using appropriate KS solver and preconditioner combinations. Both linear and nonlinear PBE solvers in the MIBPB package are tested by protein-solvent solvation energy calculations and analysis of salt effects on protein-protein binding energies, respectively. Copyright © 2010 Wiley Periodicals, Inc.
MIBPB: A software package for electrostatic analysis
Chen, Duan; Chen, Zhan; Chen, Changjun; Geng, Weihua; Wei, Guo-Wei
2010-01-01
The Poisson-Boltzmann equation (PBE) is an established model for the electrostatic analysis of biomolecules. The development of advanced computational techniques for the solution of the PBE has been an important topic in the past two decades. This paper presents a matched interface and boundary (MIB) based PBE software package, the MIBPB solver, for electrostatic analysis. The MIBPB has a unique feature that it is the first interface technique based PBE solver that rigorously enforces the solution and flux continuity conditions at the dielectric interface between the biomolecule and the solvent. For protein molecular surfaces which may possess troublesome geometrical singularities, the MIB scheme makes the MIBPB by far the only existing PBE solver that is able to deliver the second order convergence, i.e., the accuracy increases four times when the mesh size is halved. The MIBPB method is also equipped with a Dirichlet-to-Neumann mapping (DNM) technique, that builds a Green's function approach to analytically resolve the singular charge distribution in biomolecules in order to obtain reliable solutions at meshes as coarse as 1Å — while it usually takes other traditional PB solvers 0.25Å to reach similar level of reliability. The present work further accelerates the rate of convergence of linear equation systems resulting from the MIBPB by utilizing the Krylov subspace (KS) techniques. Condition numbers of the MIBPB matrices are significantly reduced by using appropriate Krylov subspace solver and preconditioner combinations. Both linear and nonlinear PBE solvers in the MIBPB package are tested by protein-solvent solvation energy calculations and analysis of salt effects on protein-protein binding energies, respectively. PMID:20845420
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weiland, T.; Bartsch, M.; Becker, U.
1997-02-01
MAFIA Version 4.0 is an almost completely new version of the general purpose electromagnetic simulator known since 13 years. The major improvements concern the new graphical user interface based on state of the art technology as well as a series of new solvers for new physics problems. MAFIA now covers heat distribution, electro-quasistatics, S-parameters in frequency domain, particle beam tracking in linear accelerators, acoustics and even elastodynamics. The solvers that were available in earlier versions have also been improved and/or extended, as for example the complex eigenmode solver, the 2D--3D coupled PIC solvers. Time domain solvers have new waveguide boundarymore » conditions with an extremely low reflection even near cutoff frequency, concentrated elements are available as well as a variety of signal processing options. Probably the most valuable addition are recursive sub-grid capabilities that enable modeling of very small details in large structures. {copyright} {ital 1997 American Institute of Physics.}« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weiland, T.; Bartsch, M.; Becker, U.
1997-02-01
MAFIA Version 4.0 is an almost completely new version of the general purpose electromagnetic simulator known since 13 years. The major improvements concern the new graphical user interface based on state of the art technology as well as a series of new solvers for new physics problems. MAFIA now covers heat distribution, electro-quasistatics, S-parameters in frequency domain, particle beam tracking in linear accelerators, acoustics and even elastodynamics. The solvers that were available in earlier versions have also been improved and/or extended, as for example the complex eigenmode solver, the 2D-3D coupled PIC solvers. Time domain solvers have new waveguide boundarymore » conditions with an extremely low reflection even near cutoff frequency, concentrated elements are available as well as a variety of signal processing options. Probably the most valuable addition are recursive sub-grid capabilities that enable modeling of very small details in large structures.« less
Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...
2017-03-05
Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Development of new vibration energy flow analysis software and its applications to vehicle systems
NASA Astrophysics Data System (ADS)
Kim, D.-J.; Hong, S.-Y.; Park, Y.-H.
2005-09-01
The Energy flow analysis (EFA) offers very promising results in predicting the noise and vibration responses of system structures in medium-to-high frequency ranges. We have developed the Energy flow finite element method (EFFEM) based software, EFADSC++ R4, for the vibration analysis. The software can analyze the system structures composed of beam, plate, spring-damper, rigid body elements and many other components developed, and has many useful functions in analysis. For convenient use of the software, the main functions of the whole software are modularized into translator, model-converter, and solver. The translator module makes it possible to use finite element (FE) model for the vibration analysis. The model-converter module changes FE model into energy flow finite element (EFFE) model, and generates joint elements to cover the vibrational attenuation in the complex structures composed of various elements and can solve the joint element equations by using the wave tra! nsmission approach very quickly. The solver module supports the various direct and iterative solvers for multi-DOF structures. The predictions of vibration for real vehicles by using the developed software were performed successfully.
SC-GRAPPA: Self-constraint noniterative GRAPPA reconstruction with closed-form solution.
Ding, Yu; Xue, Hui; Ahmad, Rizwan; Ting, Samuel T; Simonetti, Orlando P
2012-12-01
Parallel MRI (pMRI) reconstruction techniques are commonly used to reduce scan time by undersampling the k-space data. GRAPPA, a k-space based pMRI technique, is widely used clinically because of its robustness. In GRAPPA, the missing k-space data are estimated by solving a set of linear equations; however, this set of equations does not take advantage of the correlations within the missing k-space data. All k-space data in a neighborhood acquired from a phased-array coil are correlated. The correlation can be estimated easily as a self-constraint condition, and formulated as an extra set of linear equations to improve the performance of GRAPPA. The authors propose a modified k-space based pMRI technique called self-constraint GRAPPA (SC-GRAPPA) which combines the linear equations of GRAPPA with these extra equations to solve for the missing k-space data. Since SC-GRAPPA utilizes a least-squares solution of the linear equations, it has a closed-form solution that does not require an iterative solver. The SC-GRAPPA equation was derived by incorporating GRAPPA as a prior estimate. SC-GRAPPA was tested in a uniform phantom and two normal volunteers. MR real-time cardiac cine images with acceleration rate 5 and 6 were reconstructed using GRAPPA and SC-GRAPPA. SC-GRAPPA showed a significantly lower artifact level, and a greater than 10% overall signal-to-noise ratio (SNR) gain over GRAPPA, with more significant SNR gain observed in low-SNR regions of the images. SC-GRAPPA offers improved pMRI reconstruction, and is expected to benefit clinical imaging applications in the future.
Applying EXCEL Solver to a watershed management goal-programming problem
J. E. de Steiguer
2000-01-01
This article demonstrates the application of EXCEL® spreadsheet linear programming (LP) solver to a watershed management multiple use goal programming (GP) problem. The data used to demonstrate the application are from a published study for a watershed in northern Colorado. GP has been used by natural resource managers for many years. However, the GP solution by means...
NASA Technical Reports Server (NTRS)
Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana
2016-01-01
In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructuredgrid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
NASA Astrophysics Data System (ADS)
Sides, Scott; Jamroz, Ben; Crockett, Robert; Pletzer, Alexander
2012-02-01
Self-consistent field theory (SCFT) for dense polymer melts has been highly successful in describing complex morphologies in block copolymers. Field-theoretic simulations such as these are able to access large length and time scales that are difficult or impossible for particle-based simulations such as molecular dynamics. The modified diffusion equations that arise as a consequence of the coarse-graining procedure in the SCF theory can be efficiently solved with a pseudo-spectral (PS) method that uses fast-Fourier transforms on uniform Cartesian grids. However, PS methods can be difficult to apply in many block copolymer SCFT simulations (eg. confinement, interface adsorption) in which small spatial regions might require finer resolution than most of the simulation grid. Progress on using new solver algorithms to address these problems will be presented. The Tech-X Chompst project aims at marrying the best of adaptive mesh refinement with linear matrix solver algorithms. The Tech-X code PolySwift++ is an SCFT simulation platform that leverages ongoing development in coupling Chombo, a package for solving PDEs via block-structured AMR calculations and embedded boundaries, with PETSc, a toolkit that includes a large assortment of sparse linear solvers.
On numerical instabilities of Godunov-type schemes for strong shocks
NASA Astrophysics Data System (ADS)
Xie, Wenjia; Li, Wei; Li, Hua; Tian, Zhengyu; Pan, Sha
2017-12-01
It is well known that low diffusion Riemann solvers with minimal smearing on contact and shear waves are vulnerable to shock instability problems, including the carbuncle phenomenon. In the present study, we concentrate on exploring where the instability grows out and how the dissipation inherent in Riemann solvers affects the unstable behaviors. With the help of numerical experiments and a linearized analysis method, it has been found that the shock instability is strongly related to the unstable modes of intermediate states inside the shock structure. The consistency of mass flux across the normal shock is needed for a Riemann solver to capture strong shocks stably. The famous carbuncle phenomenon is interpreted as the consequence of the inconsistency of mass flux across the normal shock for a low diffusion Riemann solver. Based on the results of numerical experiments and the linearized analysis, a robust Godunov-type scheme with a simple cure for the shock instability is suggested. With only the dissipation corresponding to shear waves introduced in the vicinity of strong shocks, the instability problem is circumvented. Numerical results of several carefully chosen strong shock wave problems are investigated to demonstrate the robustness of the proposed scheme.
Pseudo-time methods for constrained optimization problems governed by PDE
NASA Technical Reports Server (NTRS)
Taasan, Shlomo
1995-01-01
In this paper we present a novel method for solving optimization problems governed by partial differential equations. Existing methods are gradient information in marching toward the minimum, where the constrained PDE is solved once (sometimes only approximately) per each optimization step. Such methods can be viewed as a marching techniques on the intersection of the state and costate hypersurfaces while improving the residuals of the design equations per each iteration. In contrast, the method presented here march on the design hypersurface and at each iteration improve the residuals of the state and costate equations. The new method is usually much less expensive per iteration step since, in most problems of practical interest, the design equation involves much less unknowns that that of either the state or costate equations. Convergence is shown using energy estimates for the evolution equations governing the iterative process. Numerical tests show that the new method allows the solution of the optimization problem in a cost of solving the analysis problems just a few times, independent of the number of design parameters. The method can be applied using single grid iterations as well as with multigrid solvers.
Morphing Aircraft Structures: Research in AFRL/RB
2008-09-01
various iterative steps in the process, etc. The solver also internally controls the step size for integration, as this is independent of the step...Coupling of Substructures for Dynamic Analyses,” AIAA Journal , Vol. 6, No. 7, 1968, pp. 1313-1319. 2“Using the State-Dependent Modal Force (MFORCE),” AFL...an actuation system consisting of multiple internal actuators, centrally computer controlled to implement any commanded morphing configuration; and
Gumerov, Nail A; Duraiswami, Ramani
2009-01-01
The development of a fast multipole method (FMM) accelerated iterative solution of the boundary element method (BEM) for the Helmholtz equations in three dimensions is described. The FMM for the Helmholtz equation is significantly different for problems with low and high kD (where k is the wavenumber and D the domain size), and for large problems the method must be switched between levels of the hierarchy. The BEM requires several approximate computations (numerical quadrature, approximations of the boundary shapes using elements), and these errors must be balanced against approximations introduced by the FMM and the convergence criterion for iterative solution. These different errors must all be chosen in a way that, on the one hand, excess work is not done and, on the other, that the error achieved by the overall computation is acceptable. Details of translation operators for low and high kD, choice of representations, and BEM quadrature schemes, all consistent with these approximations, are described. A novel preconditioner using a low accuracy FMM accelerated solver as a right preconditioner is also described. Results of the developed solvers for large boundary value problems with 0.0001 less, similarkD less, similar500 are presented and shown to perform close to theoretical expectations.
Comparison results on preconditioned SOR-type iterative method for Z-matrices linear systems
NASA Astrophysics Data System (ADS)
Wang, Xue-Zhong; Huang, Ting-Zhu; Fu, Ying-Ding
2007-09-01
In this paper, we present some comparison theorems on preconditioned iterative method for solving Z-matrices linear systems, Comparison results show that the rate of convergence of the Gauss-Seidel-type method is faster than the rate of convergence of the SOR-type iterative method.
Diagnosis of Enzyme Inhibition Using Excel Solver: A Combined Dry and Wet Laboratory Exercise
ERIC Educational Resources Information Center
Dias, Albino A.; Pinto, Paula A.; Fraga, Irene; Bezerra, Rui M. F.
2014-01-01
In enzyme kinetic studies, linear transformations of the Michaelis-Menten equation, such as the Lineweaver-Burk double-reciprocal transformation, present some constraints. The linear transformation distorts the experimental error and the relationship between "x" and "y" axes; consequently, linear regression of transformed data…
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
AQUASOL: An efficient solver for the dipolar Poisson–Boltzmann–Langevin equation
Koehl, Patrice; Delarue, Marc
2010-01-01
The Poisson–Boltzmann (PB) formalism is among the most popular approaches to modeling the solvation of molecules. It assumes a continuum model for water, leading to a dielectric permittivity that only depends on position in space. In contrast, the dipolar Poisson–Boltzmann–Langevin (DPBL) formalism represents the solvent as a collection of orientable dipoles with nonuniform concentration; this leads to a nonlinear permittivity function that depends both on the position and on the local electric field at that position. The differences in the assumptions underlying these two models lead to significant differences in the equations they generate. The PB equation is a second order, elliptic, nonlinear partial differential equation (PDE). Its response coefficients correspond to the dielectric permittivity and are therefore constant within each subdomain of the system considered (i.e., inside and outside of the molecules considered). While the DPBL equation is also a second order, elliptic, nonlinear PDE, its response coefficients are nonlinear functions of the electrostatic potential. Many solvers have been developed for the PB equation; to our knowledge, none of these can be directly applied to the DPBL equation. The methods they use may adapt to the difference; their implementations however are PBE specific. We adapted the PBE solver originally developed by Holst and Saied [J. Comput. Chem. 16, 337 (1995)] to the problem of solving the DPBL equation. This solver uses a truncated Newton method with a multigrid preconditioner. Numerical evidences suggest that it converges for the DPBL equation and that the convergence is superlinear. It is found however to be slow and greedy in memory requirement for problems commonly encountered in computational biology and computational chemistry. To circumvent these problems, we propose two variants, a quasi-Newton solver based on a simplified, inexact Jacobian and an iterative self-consistent solver that is based directly on the PBE solver. While both methods are not guaranteed to converge, numerical evidences suggest that they do and that their convergence is also superlinear. Both variants are significantly faster than the solver based on the exact Jacobian, with a much smaller memory footprint. All three methods have been implemented in a new code named AQUASOL, which is freely available. PMID:20151727
AQUASOL: An efficient solver for the dipolar Poisson-Boltzmann-Langevin equation.
Koehl, Patrice; Delarue, Marc
2010-02-14
The Poisson-Boltzmann (PB) formalism is among the most popular approaches to modeling the solvation of molecules. It assumes a continuum model for water, leading to a dielectric permittivity that only depends on position in space. In contrast, the dipolar Poisson-Boltzmann-Langevin (DPBL) formalism represents the solvent as a collection of orientable dipoles with nonuniform concentration; this leads to a nonlinear permittivity function that depends both on the position and on the local electric field at that position. The differences in the assumptions underlying these two models lead to significant differences in the equations they generate. The PB equation is a second order, elliptic, nonlinear partial differential equation (PDE). Its response coefficients correspond to the dielectric permittivity and are therefore constant within each subdomain of the system considered (i.e., inside and outside of the molecules considered). While the DPBL equation is also a second order, elliptic, nonlinear PDE, its response coefficients are nonlinear functions of the electrostatic potential. Many solvers have been developed for the PB equation; to our knowledge, none of these can be directly applied to the DPBL equation. The methods they use may adapt to the difference; their implementations however are PBE specific. We adapted the PBE solver originally developed by Holst and Saied [J. Comput. Chem. 16, 337 (1995)] to the problem of solving the DPBL equation. This solver uses a truncated Newton method with a multigrid preconditioner. Numerical evidences suggest that it converges for the DPBL equation and that the convergence is superlinear. It is found however to be slow and greedy in memory requirement for problems commonly encountered in computational biology and computational chemistry. To circumvent these problems, we propose two variants, a quasi-Newton solver based on a simplified, inexact Jacobian and an iterative self-consistent solver that is based directly on the PBE solver. While both methods are not guaranteed to converge, numerical evidences suggest that they do and that their convergence is also superlinear. Both variants are significantly faster than the solver based on the exact Jacobian, with a much smaller memory footprint. All three methods have been implemented in a new code named AQUASOL, which is freely available.
Benchmarking Defmod, an open source FEM code for modeling episodic fault rupture
NASA Astrophysics Data System (ADS)
Meng, Chunfang
2017-03-01
We present Defmod, an open source (linear) finite element code that enables us to efficiently model the crustal deformation due to (quasi-)static and dynamic loadings, poroelastic flow, viscoelastic flow and frictional fault slip. Ali (2015) provides the original code introducing an implicit solver for (quasi-)static problem, and an explicit solver for dynamic problem. The fault constraint is implemented via Lagrange Multiplier. Meng (2015) combines these two solvers into a hybrid solver that uses failure criteria and friction laws to adaptively switch between the (quasi-)static state and dynamic state. The code is capable of modeling episodic fault rupture driven by quasi-static loadings, e.g. due to reservoir fluid withdraw or injection. Here, we focus on benchmarking the Defmod results against some establish results.
A new extrapolation cascadic multigrid method for three dimensional elliptic boundary value problems
NASA Astrophysics Data System (ADS)
Pan, Kejia; He, Dongdong; Hu, Hongling; Ren, Zhengyong
2017-09-01
In this paper, we develop a new extrapolation cascadic multigrid method, which makes it possible to solve three dimensional elliptic boundary value problems with over 100 million unknowns on a desktop computer in half a minute. First, by combining Richardson extrapolation and quadratic finite element (FE) interpolation for the numerical solutions on two-level of grids (current and previous grids), we provide a quite good initial guess for the iterative solution on the next finer grid, which is a third-order approximation to the FE solution. And the resulting large linear system from the FE discretization is then solved by the Jacobi-preconditioned conjugate gradient (JCG) method with the obtained initial guess. Additionally, instead of performing a fixed number of iterations as used in existing cascadic multigrid methods, a relative residual tolerance is introduced in the JCG solver, which enables us to obtain conveniently the numerical solution with the desired accuracy. Moreover, a simple method based on the midpoint extrapolation formula is proposed to achieve higher-order accuracy on the finest grid cheaply and directly. Test results from four examples including two smooth problems with both constant and variable coefficients, an H3-regular problem as well as an anisotropic problem are reported to show that the proposed method has much better efficiency compared to the classical V-cycle and W-cycle multigrid methods. Finally, we present the reason why our method is highly efficient for solving these elliptic problems.
Nonconvex model predictive control for commercial refrigeration
NASA Astrophysics Data System (ADS)
Gybel Hovgaard, Tobias; Boyd, Stephen; Larsen, Lars F. S.; Bagterp Jørgensen, John
2013-08-01
We consider the control of a commercial multi-zone refrigeration system, consisting of several cooling units that share a common compressor, and is used to cool multiple areas or rooms. In each time period we choose cooling capacity to each unit and a common evaporation temperature. The goal is to minimise the total energy cost, using real-time electricity prices, while obeying temperature constraints on the zones. We propose a variation on model predictive control to achieve this goal. When the right variables are used, the dynamics of the system are linear, and the constraints are convex. The cost function, however, is nonconvex due to the temperature dependence of thermodynamic efficiency. To handle this nonconvexity we propose a sequential convex optimisation method, which typically converges in fewer than 5 or so iterations. We employ a fast convex quadratic programming solver to carry out the iterations, which is more than fast enough to run in real time. We demonstrate our method on a realistic model, with a full year simulation and 15-minute time periods, using historical electricity prices and weather data, as well as random variations in thermal load. These simulations show substantial cost savings, on the order of 30%, compared to a standard thermostat-based control system. Perhaps more important, we see that the method exhibits sophisticated response to real-time variations in electricity prices. This demand response is critical to help balance real-time uncertainties in generation capacity associated with large penetration of intermittent renewable energy sources in a future smart grid.
Adaptive and iterative methods for simulations of nanopores with the PNP-Stokes equations
NASA Astrophysics Data System (ADS)
Mitscha-Baude, Gregor; Buttinger-Kreuzhuber, Andreas; Tulzer, Gerhard; Heitzinger, Clemens
2017-06-01
We present a 3D finite element solver for the nonlinear Poisson-Nernst-Planck (PNP) equations for electrodiffusion, coupled to the Stokes system of fluid dynamics. The model serves as a building block for the simulation of macromolecule dynamics inside nanopore sensors. The source code is released online at http://github.com/mitschabaude/nanopores. We add to existing numerical approaches by deploying goal-oriented adaptive mesh refinement. To reduce the computation overhead of mesh adaptivity, our error estimator uses the much cheaper Poisson-Boltzmann equation as a simplified model, which is justified on heuristic grounds but shown to work well in practice. To address the nonlinearity in the full PNP-Stokes system, three different linearization schemes are proposed and investigated, with two segregated iterative approaches both outperforming a naive application of Newton's method. Numerical experiments are reported on a real-world nanopore sensor geometry. We also investigate two different models for the interaction of target molecules with the nanopore sensor through the PNP-Stokes equations. In one model, the molecule is of finite size and is explicitly built into the geometry; while in the other, the molecule is located at a single point and only modeled implicitly - after solution of the system - which is computationally favorable. We compare the resulting force profiles of the electric and velocity fields acting on the molecule, and conclude that the point-size model fails to capture important physical effects such as the dependence of charge selectivity of the sensor on the molecule radius.
A tightly-coupled domain-decomposition approach for highly nonlinear stochastic multiphysics systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taverniers, Søren; Tartakovsky, Daniel M., E-mail: dmt@ucsd.edu
2017-02-01
Multiphysics simulations often involve nonlinear components that are driven by internally generated or externally imposed random fluctuations. When used with a domain-decomposition (DD) algorithm, such components have to be coupled in a way that both accurately propagates the noise between the subdomains and lends itself to a stable and cost-effective temporal integration. We develop a conservative DD approach in which tight coupling is obtained by using a Jacobian-free Newton–Krylov (JfNK) method with a generalized minimum residual iterative linear solver. This strategy is tested on a coupled nonlinear diffusion system forced by a truncated Gaussian noise at the boundary. Enforcement ofmore » path-wise continuity of the state variable and its flux, as opposed to continuity in the mean, at interfaces between subdomains enables the DD algorithm to correctly propagate boundary fluctuations throughout the computational domain. Reliance on a single Newton iteration (explicit coupling), rather than on the fully converged JfNK (implicit) coupling, may increase the solution error by an order of magnitude. Increase in communication frequency between the DD components reduces the explicit coupling's error, but makes it less efficient than the implicit coupling at comparable error levels for all noise strengths considered. Finally, the DD algorithm with the implicit JfNK coupling resolves temporally-correlated fluctuations of the boundary noise when the correlation time of the latter exceeds some multiple of an appropriately defined characteristic diffusion time.« less
Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems.
Anzt, H; Quintana-Ortí, E S
2014-06-28
While most recent breakthroughs in scientific research rely on complex simulations carried out in large-scale supercomputers, the power draft and energy spent for this purpose is increasingly becoming a limiting factor to this trend. In this paper, we provide an overview of the current status in energy-efficient scientific computing by reviewing different technologies used to monitor power draft as well as power- and energy-saving mechanisms available in commodity hardware. For the particular domain of sparse linear algebra, we analyse the energy efficiency of a broad collection of hardware architectures and investigate how algorithmic and implementation modifications can improve the energy performance of sparse linear system solvers, without negatively impacting their performance. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.
Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core within themore » Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.« less
Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units
NASA Astrophysics Data System (ADS)
Kemal, Jonathan Yashar
For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute notes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Zeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than running than a single-CPU simulation.
Linear instabilities near the DIII-D edge simulated in fluid models
NASA Astrophysics Data System (ADS)
Bass, Eric; Holland, Christopher
2017-10-01
The linear instability spectrum is reported near the DIII-D edge (within the separatrix) for L-mode and H-mode shots using the new eigenvalue solver FluTES (Fluid Toroidal Eigenvalue Solver). FluTES circumvents difficulties with convergence to clean linear eigenmodes (required for diagnosis of nonlinear simulations in codes such as BOUT++) often encountered with fluid initial-value solvers. FluTES is well-verified in analytic cases and against a BOUT++/ELITE benchmark toroidal case. We report results for both a 3-field, one-fluid model (the well-known ``elm-pb'' model) and a 5-field, two-fluid model. For the peeling-ballooning-dominated H-mode, the two solutions are qualitatively the same. In the driftwave-dominated L-mode edge, only the two-fluid solution gives robust instabilities which occur primarily at n > 50 . FluTES is optimized for this regime (near-flutelike limit, toroidally spectral). Cross-separatrix, coupled fluid and drift instabilities may play a role in explaining the gyrokinetic L-mode edge transport shortfall. Extension of FluTES into the open-field-line region is underway. Prepared by UCSD under Contract Number DE-FG02-06ER54871.
Convergence Results on Iteration Algorithms to Linear Systems
Wang, Zhuande; Yang, Chuansheng; Yuan, Yubo
2014-01-01
In order to solve the large scale linear systems, backward and Jacobi iteration algorithms are employed. The convergence is the most important issue. In this paper, a unified backward iterative matrix is proposed. It shows that some well-known iterative algorithms can be deduced with it. The most important result is that the convergence results have been proved. Firstly, the spectral radius of the Jacobi iterative matrix is positive and the one of backward iterative matrix is strongly positive (lager than a positive constant). Secondly, the mentioned two iterations have the same convergence results (convergence or divergence simultaneously). Finally, some numerical experiments show that the proposed algorithms are correct and have the merit of backward methods. PMID:24991640
ERIC Educational Resources Information Center
Donoghue, John R.
2015-01-01
At the heart of van der Linden's approach to automated test assembly (ATA) is a linear programming/integer programming (LP/IP) problem. A variety of IP solvers are available, ranging in cost from free to hundreds of thousands of dollars. In this paper, I compare several approaches to solving the underlying IP problem. These approaches range from…
Distributed Memory Parallel Computing with SEAWAT
NASA Astrophysics Data System (ADS)
Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.
2017-12-01
Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.
Born iterative reconstruction using perturbed-phase field estimates.
Astheimer, Jeffrey P; Waag, Robert C
2008-10-01
A method of image reconstruction from scattering measurements for use in ultrasonic imaging is presented. The method employs distorted-wave Born iteration but does not require using a forward-problem solver or solving large systems of equations. These calculations are avoided by limiting intermediate estimates of medium variations to smooth functions in which the propagated fields can be approximated by phase perturbations derived from variations in a geometric path along rays. The reconstruction itself is formed by a modification of the filtered-backpropagation formula that includes correction terms to account for propagation through an estimated background. Numerical studies that validate the method for parameter ranges of interest in medical applications are presented. The efficiency of this method offers the possibility of real-time imaging from scattering measurements.
Bindu, G.; Semenov, S.
2013-01-01
This paper describes an efficient two-dimensional fused image reconstruction approach for Microwave Tomography (MWT). Finite Difference Time Domain (FDTD) models were created for a viable MWT experimental system having the transceivers modelled using thin wire approximation with resistive voltage sources. Born Iterative and Distorted Born Iterative methods have been employed for image reconstruction with the extremity imaging being done using a differential imaging technique. The forward solver in the imaging algorithm employs the FDTD method of solving the time domain Maxwell’s equations with the regularisation parameter computed using a stochastic approach. The algorithm is tested with 10% noise inclusion and successful image reconstruction has been shown implying its robustness. PMID:24058889
NASA Astrophysics Data System (ADS)
Han, Song; Zhang, Wei; Zhang, Jie
2017-09-01
A fast sweeping method (FSM) determines the first arrival traveltimes of seismic waves by sweeping the velocity model in different directions meanwhile applying a local solver. It is an efficient way to numerically solve Hamilton-Jacobi equations for traveltime calculations. In this study, we develop an improved FSM to calculate the first arrival traveltimes of quasi-P (qP) waves in 2-D tilted transversely isotropic (TTI) media. A local solver utilizes the coupled slowness surface of qP and quasi-SV (qSV) waves to form a quartic equation, and solve it numerically to obtain possible traveltimes of qP-wave. The proposed quartic solver utilizes Fermat's principle to limit the range of the possible solution, then uses the bisection procedure to efficiently determine the real roots. With causality enforced during sweepings, our FSM converges fast in a few iterations, and the exact number depending on the complexity of the velocity model. To improve the accuracy, we employ high-order finite difference schemes and derive the second-order formulae. There is no weak anisotropy assumption, and no approximation is made to the complex slowness surface of qP-wave. In comparison to the traveltimes calculated by a horizontal slowness shooting method, the validity and accuracy of our FSM is demonstrated.
A coarse-grid projection method for accelerating incompressible flow computations
NASA Astrophysics Data System (ADS)
San, Omer; Staples, Anne E.
2013-01-01
We present a coarse-grid projection (CGP) method for accelerating incompressible flow computations, which is applicable to methods involving Poisson equations as incompressibility constraints. The CGP methodology is a modular approach that facilitates data transfer with simple interpolations and uses black-box solvers for the Poisson and advection-diffusion equations in the flow solver. After solving the Poisson equation on a coarsened grid, an interpolation scheme is used to obtain the fine data for subsequent time stepping on the full grid. A particular version of the method is applied here to the vorticity-stream function, primitive variable, and vorticity-velocity formulations of incompressible Navier-Stokes equations. We compute several benchmark flow problems on two-dimensional Cartesian and non-Cartesian grids, as well as a three-dimensional flow problem. The method is found to accelerate these computations while retaining a level of accuracy close to that of the fine resolution field, which is significantly better than the accuracy obtained for a similar computation performed solely using a coarse grid. A linear acceleration rate is obtained for all the cases we consider due to the linear-cost elliptic Poisson solver used, with reduction factors in computational time between 2 and 42. The computational savings are larger when a suboptimal Poisson solver is used. We also find that the computational savings increase with increasing distortion ratio on non-Cartesian grids, making the CGP method a useful tool for accelerating generalized curvilinear incompressible flow solvers.
Iterative methods for mixed finite element equations
NASA Technical Reports Server (NTRS)
Nakazawa, S.; Nagtegaal, J. C.; Zienkiewicz, O. C.
1985-01-01
Iterative strategies for the solution of indefinite system of equations arising from the mixed finite element method are investigated in this paper with application to linear and nonlinear problems in solid and structural mechanics. The augmented Hu-Washizu form is derived, which is then utilized to construct a family of iterative algorithms using the displacement method as the preconditioner. Two types of iterative algorithms are implemented. Those are: constant metric iterations which does not involve the update of preconditioner; variable metric iterations, in which the inverse of the preconditioning matrix is updated. A series of numerical experiments is conducted to evaluate the numerical performance with application to linear and nonlinear model problems.
NASA Astrophysics Data System (ADS)
O'Malley, D.; Le, E. B.; Vesselinov, V. V.
2015-12-01
We present a fast, scalable, and highly-implementable stochastic inverse method for characterization of aquifer heterogeneity. The method utilizes recent advances in randomized matrix algebra and exploits the structure of the Quasi-Linear Geostatistical Approach (QLGA), without requiring a structured grid like Fast-Fourier Transform (FFT) methods. The QLGA framework is a more stable version of Gauss-Newton iterates for a large number of unknown model parameters, but provides unbiased estimates. The methods are matrix-free and do not require derivatives or adjoints, and are thus ideal for complex models and black-box implementation. We also incorporate randomized least-square solvers and data-reduction methods, which speed up computation and simulate missing data points. The new inverse methodology is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. Inversion results based on series of synthetic problems with steady-state and transient calibration data are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de
2016-11-15
We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need formore » matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10{sup 2} innermost eigenpairs of a topological insulator matrix with dimension 10{sup 9} derived from quantum physics applications.« less
Hammering Yucca Flat, Part One: P-Wave Velocity
NASA Astrophysics Data System (ADS)
Tang, D. G.; Abbott, R. E.; Preston, L. A.; Hampshire, J. B., II
2015-12-01
Explosion-source phenomenology is best studied when competing signals (such as instrument, site, and propagation effects), are well understood. The second phase of the Source Physics Experiments (SPE), is moving from granite geology to alluvium geology at Yucca Flat, Nevada National Security Site. To improve subsurface characterization of Yucca Flat (and therefore better understand propagation and site effects), an active-source seismic survey was conducted using a novel 13,000-kg impulsive hammer source. The source points, spaced 200 m apart, covered a N-S transect spanning 18 km. Three component, 2-Hz geophones were used to record useable signals out to 10 km. We inverted for P-wave velocity by computing travel times using a finite-difference 3D eikonal solver, and then compared that to the picked travel times using a linearized iterative inversion scheme. Preliminary results from traditional reflection processing methods are also presented. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Validation of the United States Marine Corps Qualified Candidate Population Model
2003-03-01
time. Fields are created in the database to support this forecasting. User forms and a macro are programmed in Microsoft VBA to develop the...at 0.001. To accomplish 50,000 iterations of a minimization problem, this study wrote a macro in the VBA programming language that guides the solver...success in the commissioning process. **To improve the diagnostics of this propensity model, other factors were considered as well. Applying SQL
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains
Bunting, Gregory; Prakash, Arun; Walsh, Timothy; ...
2018-01-26
Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bunting, Gregory; Prakash, Arun; Walsh, Timothy
Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
NASA Astrophysics Data System (ADS)
Navas-Montilla, A.; Murillo, J.
2016-07-01
In this work, an arbitrary order HLL-type numerical scheme is constructed using the flux-ADER methodology. The proposed scheme is based on an augmented Derivative Riemann solver that was used for the first time in Navas-Montilla and Murillo (2015) [1]. Such solver, hereafter referred to as Flux-Source (FS) solver, was conceived as a high order extension of the augmented Roe solver and led to the generation of a novel numerical scheme called AR-ADER scheme. Here, we provide a general definition of the FS solver independently of the Riemann solver used in it. Moreover, a simplified version of the solver, referred to as Linearized-Flux-Source (LFS) solver, is presented. This novel version of the FS solver allows to compute the solution without requiring reconstruction of derivatives of the fluxes, nevertheless some drawbacks are evidenced. In contrast to other previously defined Derivative Riemann solvers, the proposed FS and LFS solvers take into account the presence of the source term in the resolution of the Derivative Riemann Problem (DRP), which is of particular interest when dealing with geometric source terms. When applied to the shallow water equations, the proposed HLLS-ADER and AR-ADER schemes can be constructed to fulfill the exactly well-balanced property, showing that an arbitrary quadrature of the integral of the source inside the cell does not ensure energy balanced solutions. As a result of this work, energy balanced flux-ADER schemes that provide the exact solution for steady cases and that converge to the exact solution with arbitrary order for transient cases are constructed.
Large Scale, High Resolution, Mantle Dynamics Modeling
NASA Astrophysics Data System (ADS)
Geenen, T.; Berg, A. V.; Spakman, W.
2007-12-01
To model the geodynamic evolution of plate convergence, subduction and collision and to allow for a connection to various types of observational data, geophysical, geodetical and geological, we developed a 4D (space-time) numerical mantle convection code. The model is based on a spherical 3D Eulerian fem model, with quadratic elements, on top of which we constructed a 3D Lagrangian particle in cell(PIC) method. We use the PIC method to transport material properties and to incorporate a viscoelastic rheology. Since capturing small scale processes associated with localization phenomena require a high resolution, we spend a considerable effort on implementing solvers suitable to solve for models with over 100 million degrees of freedom. We implemented Additive Schwartz type ILU based methods in combination with a Krylov solver, GMRES. However we found that for problems with over 500 thousend degrees of freedom the convergence of the solver degraded severely. This observation is known from the literature [Saad, 2003] and results from the local character of the ILU preconditioner resulting in a poor approximation of the inverse of A for large A. The size of A for which ILU is no longer usable depends on the condition of A and on the amount of fill in allowed for the ILU preconditioner. We found that for our problems with over 5×105 degrees of freedom convergence became to slow to solve the system within an acceptable amount of walltime, one minute, even when allowing for considerable amount of fill in. We also implemented MUMPS and found good scaling results for problems up to 107 degrees of freedom for up to 32 CPU¡¯s. For problems with over 100 million degrees of freedom we implemented Algebraic Multigrid type methods (AMG) from the ML library [Sala, 2006]. Since multigrid methods are most effective for single parameter problems, we rebuild our model to use the SIMPLE method in the Stokes solver [Patankar, 1980]. We present scaling results from these solvers for 3D spherical models. We also applied the above mentioned method to a high resolution (~ 1 km) 2D mantle convection model with temperature, pressure and phase dependent rheology including several phase transitions. We focus on a model of a subducting lithospheric slab which is subject to strong folding at the bottom of the mantle's D" region which includes the postperovskite phase boundary. For a detailed description of this model we refer to poster [Mantel convection models of the D" region, U17] [Saad, 2003] Saad, Y. (2003). Iterative methods for sparse linear systems. [Sala, 2006] Sala. M (2006) An Object-Oriented Framework for the Development of Scalable Parallel Multilevel Preconditioners. ACM Transactions on Mathematical Software, 32 (3), 2006 [Patankar, 1980] Patankar, S. V.(1980) Numerical Heat Transfer and Fluid Flow, Hemisphere, Washington.
A Build-Up Interior Method for Linear Programming: Affine Scaling Form
1990-02-01
initiating a major iteration imply convergence in a finite number of iterations. Each iteration t of the Dikin algorithm starts with an interior dual...this variant with the affine scaling method of Dikin [5] (in dual form). We have also looked into the analogous variant for the related Karmarkar’s...4] G. B. Dantzig, Linear Programming and Extensions (Princeton University Press, Princeton, NJ, 1963). [5] I. I. Dikin , "Iterative solution of
Isometric Non-Rigid Shape-from-Motion with Riemannian Geometry Solved in Linear Time.
Parashar, Shaifali; Pizarro, Daniel; Bartoli, Adrien
2017-10-06
We study Isometric Non-Rigid Shape-from-Motion (Iso-NRSfM): given multiple intrinsically calibrated monocular images, we want to reconstruct the time-varying 3D shape of a thin-shell object undergoing isometric deformations. We show that Iso-NRSfM is solvable from local warps, the inter-image geometric transformations. We propose a new theoretical framework based on the Riemmanian manifold to represent the unknown 3D surfaces as embeddings of the camera's retinal plane. This allows us to use the manifold's metric tensor and Christoffel Symbol (CS) fields. These are expressed in terms of the first and second order derivatives of the inverse-depth of the 3D surfaces, which are the unknowns for Iso-NRSfM. We prove that the metric tensor and the CS are related across images by simple rules depending only on the warps. This forms a set of important theoretical results. We show that current solvers cannot solve for the first and second order derivatives of the inverse-depth simultaneously. We thus propose an iterative solution in two steps. 1) We solve for the first order derivatives assuming that the second order derivatives are known. We initialise the second order derivatives to zero, which is an infinitesimal planarity assumption. We derive a system of two cubics in two variables for each image pair. The sum-of-squares of these polynomials is independent of the number of images and can be solved globally, forming a well-posed problem for N ≥ 3 images. 2) We solve for the second order derivatives by initialising the first order derivatives from the previous step. We solve a linear system of 4N-4 equations in three variables. We iterate until the first order derivatives converge. The solution for the first order derivatives gives the surfaces' normal fields which we integrate to recover the 3D surfaces. The proposed method outperforms existing work in terms of accuracy and computation cost on synthetic and real datasets.
NASA Astrophysics Data System (ADS)
Parris, B. A.; Egbert, G. D.; Key, K.; Livelybrooks, D.
2016-12-01
Magnetotellurics (MT) is an electromagnetic technique used to model the inner Earth's electrical conductivity structure. MT data can be analyzed using iterative, linearized inversion techniques to generate models imaging, in particular, conductive partial melts and aqueous fluids that play critical roles in subduction zone processes and volcanism. For example, the Magnetotelluric Observations of Cascadia using a Huge Array (MOCHA) experiment provides amphibious data useful for imaging subducted fluids from trench to mantle wedge corner. When using MOD3DEM(Egbert et al. 2012), a finite difference inversion package, we have encountered problems inverting, particularly, sea floor stations due to the strong, nearby conductivity gradients. As a work-around, we have found that denser, finer model grids near the land-sea interface produce better inversions, as characterized by reduced data residuals. This is partly to be due to our ability to more accurately capture topography and bathymetry. We are experimenting with improved interpolation schemes that more accurately track EM fields across cell boundaries, with an eye to enhancing the accuracy of the simulated responses and, thus, inversion results. We are adapting how MOD3DEM interpolates EM fields in two ways. The first seeks to improve weighting functions for interpolants to better address current continuity across grid boundaries. Electric fields are interpolated using a tri-linear spline technique, where the eight nearest electrical field estimates are each given weights determined by the technique, a kind of weighted average. We are modifying these weights to include cross-boundary conductivity ratios to better model current continuity. We are also adapting some of the techniques discussed in Shantsev et al (2014) to enhance the accuracy of the interpolated fields calculated by our forward solver, as well as to better approximate the sensitivities passed to the software's Jacobian that are used to generate a new forward model during each iteration of the inversion.
Investigation of iterative image reconstruction in low-dose breast CT
NASA Astrophysics Data System (ADS)
Bian, Junguo; Yang, Kai; Boone, John M.; Han, Xiao; Sidky, Emil Y.; Pan, Xiaochuan
2014-06-01
There is interest in developing computed tomography (CT) dedicated to breast-cancer imaging. Because breast tissues are radiation-sensitive, the total radiation exposure in a breast-CT scan is kept low, often comparable to a typical two-view mammography exam, thus resulting in a challenging low-dose-data-reconstruction problem. In recent years, evidence has been found that suggests that iterative reconstruction may yield images of improved quality from low-dose data. In this work, based upon the constrained image total-variation minimization program and its numerical solver, i.e., the adaptive steepest descent-projection onto the convex set (ASD-POCS), we investigate and evaluate iterative image reconstructions from low-dose breast-CT data of patients, with a focus on identifying and determining key reconstruction parameters, devising surrogate utility metrics for characterizing reconstruction quality, and tailoring the program and ASD-POCS to the specific reconstruction task under consideration. The ASD-POCS reconstructions appear to outperform the corresponding clinical FDK reconstructions, in terms of subjective visualization and surrogate utility metrics.
NASA Technical Reports Server (NTRS)
Yang, Cheng I.; Guo, Yan-Hu; Liu, C.- H.
1996-01-01
The analysis and design of a submarine propulsor requires the ability to predict the characteristics of both laminar and turbulent flows to a higher degree of accuracy. This report presents results of certain benchmark computations based on an upwind, high-resolution, finite-differencing Navier-Stokes solver. The purpose of the computations is to evaluate the ability, the accuracy and the performance of the solver in the simulation of detailed features of viscous flows. Features of interest include flow separation and reattachment, surface pressure and skin friction distributions. Those features are particularly relevant to the propulsor analysis. Test cases with a wide range of Reynolds numbers are selected; therefore, the effects of the convective and the diffusive terms of the solver can be evaluated separately. Test cases include flows over bluff bodies, such as circular cylinders and spheres, at various low Reynolds numbers, flows over a flat plate with and without turbulence effects, and turbulent flows over axisymmetric bodies with and without propulsor effects. Finally, to enhance the iterative solution procedure, a full approximation scheme V-cycle multigrid method is implemented. Preliminary results indicate that the method significantly reduces the computational effort.
An implicit numerical scheme for the simulation of internal viscous flows on unstructured grids
NASA Technical Reports Server (NTRS)
Jorgenson, Philip C. E.; Pletcher, Richard H.
1994-01-01
The Navier-Stokes equations are solved numerically for two-dimensional steady viscous laminar flows. The grids are generated based on the method of Delaunay triangulation. A finite-volume approach is used to discretize the conservation law form of the compressible flow equations written in terms of primitive variables. A preconditioning matrix is added to the equations so that low Mach number flows can be solved economically. The equations are time marched using either an implicit Gauss-Seidel iterative procedure or a solver based on a conjugate gradient like method. A four color scheme is employed to vectorize the block Gauss-Seidel relaxation procedure. This increases the memory requirements minimally and decreases the computer time spent solving the resulting system of equations substantially. A factor of 7.6 speed up in the matrix solver is typical for the viscous equations. Numerical results are obtained for inviscid flow over a bump in a channel at subsonic and transonic conditions for validation with structured solvers. Viscous results are computed for developing flow in a channel, a symmetric sudden expansion, periodic tandem cylinders in a cross-flow, and a four-port valve. Comparisons are made with available results obtained by other investigators.
A stable partitioned FSI algorithm for incompressible flow and deforming beams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, L., E-mail: lil19@rpi.edu; Henshaw, W.D., E-mail: henshw@rpi.edu; Banks, J.W., E-mail: banksj3@rpi.edu
2016-05-01
An added-mass partitioned (AMP) algorithm is described for solving fluid–structure interaction (FSI) problems coupling incompressible flows with thin elastic structures undergoing finite deformations. The new AMP scheme is fully second-order accurate and stable, without sub-time-step iterations, even for very light structures when added-mass effects are strong. The fluid, governed by the incompressible Navier–Stokes equations, is solved in velocity-pressure form using a fractional-step method; large deformations are treated with a mixed Eulerian-Lagrangian approach on deforming composite grids. The motion of the thin structure is governed by a generalized Euler–Bernoulli beam model, and these equations are solved in a Lagrangian frame usingmore » two approaches, one based on finite differences and the other on finite elements. The key AMP interface condition is a generalized Robin (mixed) condition on the fluid pressure. This condition, which is derived at a continuous level, has no adjustable parameters and is applied at the discrete level to couple the partitioned domain solvers. Special treatment of the AMP condition is required to couple the finite-element beam solver with the finite-difference-based fluid solver, and two coupling approaches are described. A normal-mode stability analysis is performed for a linearized model problem involving a beam separating two fluid domains, and it is shown that the AMP scheme is stable independent of the ratio of the mass of the fluid to that of the structure. A traditional partitioned (TP) scheme using a Dirichlet–Neumann coupling for the same model problem is shown to be unconditionally unstable if the added mass of the fluid is too large. A series of benchmark problems of increasing complexity are considered to illustrate the behavior of the AMP algorithm, and to compare the behavior with that of the TP scheme. The results of all these benchmark problems verify the stability and accuracy of the AMP scheme. Results for one benchmark problem modeling blood flow in a deforming artery are also compared with corresponding results available in the literature.« less
NASA Astrophysics Data System (ADS)
Li, Zhifu; Hu, Yueming; Li, Di
2016-08-01
For a class of linear discrete-time uncertain systems, a feedback feed-forward iterative learning control (ILC) scheme is proposed, which is comprised of an iterative learning controller and two current iteration feedback controllers. The iterative learning controller is used to improve the performance along the iteration direction and the feedback controllers are used to improve the performance along the time direction. First of all, the uncertain feedback feed-forward ILC system is presented by an uncertain two-dimensional Roesser model system. Then, two robust control schemes are proposed. One can ensure that the feedback feed-forward ILC system is bounded-input bounded-output stable along time direction, and the other can ensure that the feedback feed-forward ILC system is asymptotically stable along time direction. Both schemes can guarantee the system is robust monotonically convergent along the iteration direction. Third, the robust convergent sufficient conditions are given, which contains a linear matrix inequality (LMI). Moreover, the LMI can be used to determine the gain matrix of the feedback feed-forward iterative learning controller. Finally, the simulation results are presented to demonstrate the effectiveness of the proposed schemes.
A massively parallel adaptive scheme for melt migration in geodynamics computations
NASA Astrophysics Data System (ADS)
Dannberg, Juliane; Heister, Timo; Grove, Ryan
2016-04-01
Melt generation and migration are important processes for the evolution of the Earth's interior and impact the global convection of the mantle. While they have been the subject of numerous investigations, the typical time and length-scales of melt transport are vastly different from global mantle convection, which determines where melt is generated. This makes it difficult to study mantle convection and melt migration in a unified framework. In addition, modelling magma dynamics poses the challenge of highly non-linear and spatially variable material properties, in particular the viscosity. We describe our extension of the community mantle convection code ASPECT that adds equations describing the behaviour of silicate melt percolating through and interacting with a viscously deforming host rock. We use the original compressible formulation of the McKenzie equations, augmented by an equation for the conservation of energy. This approach includes both melt migration and melt generation with the accompanying latent heat effects, and it incorporates the individual compressibilities of the solid and the fluid phase. For this, we derive an accurate and stable Finite Element scheme that can be combined with adaptive mesh refinement. This is particularly advantageous for this type of problem, as the resolution can be increased in mesh cells where melt is present and viscosity gradients are high, whereas a lower resolution is sufficient in regions without melt. Together with a high-performance, massively parallel implementation, this allows for high resolution, 3d, compressible, global mantle convection simulations coupled with melt migration. Furthermore, scalable iterative linear solvers are required to solve the large linear systems arising from the discretized system. Finally, we present benchmarks and scaling tests of our solver up to tens of thousands of cores, show the effectiveness of adaptive mesh refinement when applied to melt migration and compare the compressible and incompressible formulation. We then apply our software to large-scale 3d simulations of melting and melt transport in mantle plumes interacting with the lithosphere. Our model of magma dynamics provides a framework for modelling processes on different scales and investigating links between processes occurring in the deep mantle and melt generation and migration. The presented implementation is available online under an Open Source license together with an extensive documentation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue
Within this paper, we present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into an eigenvalue problem that involves the product of two matrices M and K. We show that, because MK is self-adjoint with respect to the inner product induced by the matrix K, this product eigenvalue problem can be solved efficiently by amore » modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. Additionally, the solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. We show that the other component of the eigenvector can be easily recovered in an inexpensive postprocessing procedure. As a result, the algorithms we present here become more efficient than existing methods that try to approximate both components of the eigenvectors simultaneously. In particular, our numerical experiments demonstrate that the new algorithms presented here consistently outperform the existing state-of-the-art Davidson type solvers by a factor of two in both solution time and storage.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue
In this article, we present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into an eigenvalue problem that involves the product of two matrices M and K. We show that, because MK is self-adjoint with respect to the inner product induced by the matrix K, this product eigenvalue problem can be solved efficiently by amore » modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. The solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. We show that the other component of the eigenvector can be easily recovered in an inexpensive postprocessing procedure. As a result, the algorithms we present here become more efficient than existing methods that try to approximate both components of the eigenvectors simultaneously. In particular, our numerical experiments demonstrate that the new algorithms presented here consistently outperform the existing state-of-the-art Davidson type solvers by a factor of two in both solution time and storage.« less
An Efficient Multiscale Finite-Element Method for Frequency-Domain Seismic Wave Propagation
Gao, Kai; Fu, Shubin; Chung, Eric T.
2018-02-13
The frequency-domain seismic-wave equation, that is, the Helmholtz equation, has many important applications in seismological studies, yet is very challenging to solve, particularly for large geological models. Iterative solvers, domain decomposition, or parallel strategies can partially alleviate the computational burden, but these approaches may still encounter nontrivial difficulties in complex geological models where a sufficiently fine mesh is required to represent the fine-scale heterogeneities. We develop a novel numerical method to solve the frequency-domain acoustic wave equation on the basis of the multiscale finite-element theory. We discretize a heterogeneous model with a coarse mesh and employ carefully constructed high-order multiscalemore » basis functions to form the basis space for the coarse mesh. Solved from medium- and frequency-dependent local problems, these multiscale basis functions can effectively capture themedium’s fine-scale heterogeneity and the source’s frequency information, leading to a discrete system matrix with a much smaller dimension compared with those from conventional methods.We then obtain an accurate solution to the acoustic Helmholtz equation by solving only a small linear system instead of a large linear system constructed on the fine mesh in conventional methods.We verify our new method using several models of complicated heterogeneities, and the results show that our new multiscale method can solve the Helmholtz equation in complex models with high accuracy and extremely low computational costs.« less
An Efficient Multiscale Finite-Element Method for Frequency-Domain Seismic Wave Propagation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Kai; Fu, Shubin; Chung, Eric T.
The frequency-domain seismic-wave equation, that is, the Helmholtz equation, has many important applications in seismological studies, yet is very challenging to solve, particularly for large geological models. Iterative solvers, domain decomposition, or parallel strategies can partially alleviate the computational burden, but these approaches may still encounter nontrivial difficulties in complex geological models where a sufficiently fine mesh is required to represent the fine-scale heterogeneities. We develop a novel numerical method to solve the frequency-domain acoustic wave equation on the basis of the multiscale finite-element theory. We discretize a heterogeneous model with a coarse mesh and employ carefully constructed high-order multiscalemore » basis functions to form the basis space for the coarse mesh. Solved from medium- and frequency-dependent local problems, these multiscale basis functions can effectively capture themedium’s fine-scale heterogeneity and the source’s frequency information, leading to a discrete system matrix with a much smaller dimension compared with those from conventional methods.We then obtain an accurate solution to the acoustic Helmholtz equation by solving only a small linear system instead of a large linear system constructed on the fine mesh in conventional methods.We verify our new method using several models of complicated heterogeneities, and the results show that our new multiscale method can solve the Helmholtz equation in complex models with high accuracy and extremely low computational costs.« less
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue; ...
2017-12-01
In this article, we present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into an eigenvalue problem that involves the product of two matrices M and K. We show that, because MK is self-adjoint with respect to the inner product induced by the matrix K, this product eigenvalue problem can be solved efficiently by amore » modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. The solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. We show that the other component of the eigenvector can be easily recovered in an inexpensive postprocessing procedure. As a result, the algorithms we present here become more efficient than existing methods that try to approximate both components of the eigenvectors simultaneously. In particular, our numerical experiments demonstrate that the new algorithms presented here consistently outperform the existing state-of-the-art Davidson type solvers by a factor of two in both solution time and storage.« less
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue; ...
2017-08-24
Within this paper, we present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into an eigenvalue problem that involves the product of two matrices M and K. We show that, because MK is self-adjoint with respect to the inner product induced by the matrix K, this product eigenvalue problem can be solved efficiently by amore » modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. Additionally, the solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. We show that the other component of the eigenvector can be easily recovered in an inexpensive postprocessing procedure. As a result, the algorithms we present here become more efficient than existing methods that try to approximate both components of the eigenvectors simultaneously. In particular, our numerical experiments demonstrate that the new algorithms presented here consistently outperform the existing state-of-the-art Davidson type solvers by a factor of two in both solution time and storage.« less
NASA Astrophysics Data System (ADS)
LeMesurier, Brenton
2012-01-01
A new approach is described for generating exactly energy-momentum conserving time discretizations for a wide class of Hamiltonian systems of DEs with quadratic momenta, including mechanical systems with central forces; it is well-suited in particular to the large systems that arise in both spatial discretizations of nonlinear wave equations and lattice equations such as the Davydov System modeling energetic pulse propagation in protein molecules. The method is unconditionally stable, making it well-suited to equations of broadly “Discrete NLS form”, including many arising in nonlinear optics. Key features of the resulting discretizations are exact conservation of both the Hamiltonian and quadratic conserved quantities related to continuous linear symmetries, preservation of time reversal symmetry, unconditional stability, and respecting the linearity of certain terms. The last feature allows a simple, efficient iterative solution of the resulting nonlinear algebraic systems that retain unconditional stability, avoiding the need for full Newton-type solvers. One distinction from earlier work on conservative discretizations is a new and more straightforward nearly canonical procedure for constructing the discretizations, based on a “discrete gradient calculus with product rule” that mimics the essential properties of partial derivatives. This numerical method is then used to study the Davydov system, revealing that previously conjectured continuum limit approximations by NLS do not hold, but that sech-like pulses related to NLS solitons can nevertheless sometimes arise.
A depth-first search algorithm to compute elementary flux modes by linear programming
2014-01-01
Background The decomposition of complex metabolic networks into elementary flux modes (EFMs) provides a useful framework for exploring reaction interactions systematically. Generating a complete set of EFMs for large-scale models, however, is near impossible. Even for moderately-sized models (<400 reactions), existing approaches based on the Double Description method must iterate through a large number of combinatorial candidates, thus imposing an immense processor and memory demand. Results Based on an alternative elementarity test, we developed a depth-first search algorithm using linear programming (LP) to enumerate EFMs in an exhaustive fashion. Constraints can be introduced to directly generate a subset of EFMs satisfying the set of constraints. The depth-first search algorithm has a constant memory overhead. Using flux constraints, a large LP problem can be massively divided and parallelized into independent sub-jobs for deployment into computing clusters. Since the sub-jobs do not overlap, the approach scales to utilize all available computing nodes with minimal coordination overhead or memory limitations. Conclusions The speed of the algorithm was comparable to efmtool, a mainstream Double Description method, when enumerating all EFMs; the attrition power gained from performing flux feasibility tests offsets the increased computational demand of running an LP solver. Unlike the Double Description method, the algorithm enables accelerated enumeration of all EFMs satisfying a set of constraints. PMID:25074068
Exploiting Data Sparsity in Parallel Matrix Powers Computations
2013-05-03
2013 Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour...matrices of the form A = D+USV H, where D is sparse and USV H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial di erential equation solvers, and preconditioned iterative methods. If A has this form , our algorithm enables a communication
A geometric multigrid preconditioning strategy for DPG system matrices
Roberts, Nathan V.; Chan, Jesse
2017-08-23
Here, the discontinuous Petrov–Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan (2010, 2011) guarantees the optimality of the solution in an energy norm, and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general – though there are some results for Poisson, e.g.– is how best to precondition the DPG system matrix, so that iterative solvers may be used to allow solution of large-scale problems.
Li, Yang; Bechhoefer, John
2009-01-01
We introduce an algorithm for calculating, offline or in real time and with no explicit system characterization, the feedforward input required for repetitive motions of a system. The algorithm is based on the secant method of numerical analysis and gives accurate motion at frequencies limited only by the signal-to-noise ratio and the actuator power and range. We illustrate the secant-solver algorithm on a stage used for atomic force microscopy.
NASA Technical Reports Server (NTRS)
Zinnecker, Alicia M.; Chapman, Jeffryes W.; Lavelle, Thomas M.; Litt, Johathan S.
2014-01-01
The Toolbox for Modeling and Analysis of Thermodynamic Systems (T-MATS) is a tool that has been developed to allow a user to build custom models of systems governed by thermodynamic principles using a template to model each basic process. Validation of this tool in an engine model application was performed through reconstruction of the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) (v2) using the building blocks from the T-MATS (v1) library. In order to match the two engine models, it was necessary to address differences in several assumptions made in the two modeling approaches. After these modifications were made, validation of the engine model continued by integrating both a steady-state and dynamic iterative solver with the engine plant and comparing results from steady-state and transient simulation of the T-MATS and C-MAPSS models. The results show that the T-MATS engine model was accurate within 3 of the C-MAPSS model, with inaccuracy attributed to the increased dimension of the iterative solver solution space required by the engine model constructed using the T-MATS library. This demonstrates that, given an understanding of the modeling assumptions made in T-MATS and a baseline model, the T-MATS tool provides a viable option for constructing a computational model of a twin-spool turbofan engine that may be used in simulation studies.
Sound transmission through a poroelastic layered panel
NASA Astrophysics Data System (ADS)
Nagler, Loris; Rong, Ping; Schanz, Martin; von Estorff, Otto
2014-04-01
Multi-layered panels are often used to improve the acoustics in cars, airplanes, rooms, etc. For such an application these panels include porous and/or fibrous layers. The proposed numerical method is an approach to simulate the acoustical behavior of such multi-layered panels. The model assumes plate-like structures and, hence, combines plate theories for the different layers. The poroelastic layer is modelled with a recently developed plate theory. This theory uses a series expansion in thickness direction with subsequent analytical integration in this direction to reduce the three dimensions to two. The same idea is used to model either air gaps or fibrous layers. The latter are modeled as equivalent fluid and can be handled like an air gap, i.e., a kind of `air plate' is used. The coupling of the layers is done by using the series expansion to express the continuity conditions on the surfaces of the plates. The final system is solved with finite elements, where domain decomposition techniques in combination with preconditioned iterative solvers are applied to solve the final system of equations. In a large frequency range, the comparison with measurements shows very good agreement. From the numerical solution process it can be concluded that different preconditioners for the different layers are necessary. A reuse of the Krylov subspace of the iterative solvers pays if several excitations have to be computed but not that much in the loop over the frequencies.
Computational Challenges of 3D Radiative Transfer in Atmospheric Models
NASA Astrophysics Data System (ADS)
Jakub, Fabian; Bernhard, Mayer
2017-04-01
The computation of radiative heating and cooling rates is one of the most expensive components in todays atmospheric models. The high computational cost stems not only from the laborious integration over a wide range of the electromagnetic spectrum but also from the fact that solving the integro-differential radiative transfer equation for monochromatic light is already rather involved. This lead to the advent of numerous approximations and parameterizations to reduce the cost of the solver. One of the most prominent one is the so called independent pixel approximations (IPA) where horizontal energy transfer is neglected whatsoever and radiation may only propagate in the vertical direction (1D). Recent studies implicate that the IPA introduces significant errors in high resolution simulations and affects the evolution and development of convective systems. However, using fully 3D solvers such as for example MonteCarlo methods is not even on state of the art supercomputers feasible. The parallelization of atmospheric models is often realized by a horizontal domain decomposition, and hence, horizontal transfer of energy necessitates communication. E.g. a cloud's shadow at a low zenith angle will cast a long shadow and potentially needs to communication through a multitude of processors. Especially light in the solar spectral range may travel long distances through the atmosphere. Concerning highly parallel simulations, it is vital that 3D radiative transfer solvers put a special emphasis on parallel scalability. We will present an introduction to intricacies computing 3D radiative heating and cooling rates as well as report on the parallel performance of the TenStream solver. The TenStream is a 3D radiative transfer solver using the PETSc framework to iteratively solve a set of partial differential equation. We investigate two matrix preconditioners, (a) geometric algebraic multigrid preconditioning(MG+GAMG) and (b) block Jacobi incomplete LU (ILU) factorization. The TenStream solver is tested for up to 4096 cores and shows a parallel scaling efficiency of 80-90% on various supercomputers.
QCAD simulation and optimization of semiconductor double quantum dots
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nielsen, Erik; Gao, Xujiao; Kalashnikova, Irina
2013-12-01
We present the Quantum Computer Aided Design (QCAD) simulator that targets modeling quantum devices, particularly silicon double quantum dots (DQDs) developed for quantum qubits. The simulator has three di erentiating features: (i) its core contains nonlinear Poisson, e ective mass Schrodinger, and Con guration Interaction solvers that have massively parallel capability for high simulation throughput, and can be run individually or combined self-consistently for 1D/2D/3D quantum devices; (ii) the core solvers show superior convergence even at near-zero-Kelvin temperatures, which is critical for modeling quantum computing devices; (iii) it couples with an optimization engine Dakota that enables optimization of gate voltagesmore » in DQDs for multiple desired targets. The Poisson solver includes Maxwell- Boltzmann and Fermi-Dirac statistics, supports Dirichlet, Neumann, interface charge, and Robin boundary conditions, and includes the e ect of dopant incomplete ionization. The solver has shown robust nonlinear convergence even in the milli-Kelvin temperature range, and has been extensively used to quickly obtain the semiclassical electrostatic potential in DQD devices. The self-consistent Schrodinger-Poisson solver has achieved robust and monotonic convergence behavior for 1D/2D/3D quantum devices at very low temperatures by using a predictor-correct iteration scheme. The QCAD simulator enables the calculation of dot-to-gate capacitances, and comparison with experiment and between solvers. It is observed that computed capacitances are in the right ballpark when compared to experiment, and quantum con nement increases capacitance when the number of electrons is xed in a quantum dot. In addition, the coupling of QCAD with Dakota allows to rapidly identify which device layouts are more likely leading to few-electron quantum dots. Very efficient QCAD simulations on a large number of fabricated and proposed Si DQDs have made it possible to provide fast feedback for design comparison and optimization.« less
Quantifying the Energy Efficiency of Object Recognition and Optical Flow
2014-03-28
other linear solvers, such as conjugate- gradient (CG), preconditioned conjugate-gradient (PCG), and red-black Gauss Seidel (RB). We have also... Seidel , and conjugate gradient solvers. We are interested in the energy it takes to get a given solution quality. In Figure 6, we plot the quality of...in terms of Joules. Conversely, our implementation of red-black Gauss Seidel proves to be very inefficient when we consider Joules instead of just
Description and use of LSODE, the Livermore Solver for Ordinary Differential Equations
NASA Technical Reports Server (NTRS)
Radhakrishnan, Krishnan; Hindmarsh, Alan C.
1993-01-01
LSODE, the Livermore Solver for Ordinary Differential Equations, is a package of FORTRAN subroutines designed for the numerical solution of the initial value problem for a system of ordinary differential equations. It is particularly well suited for 'stiff' differential systems, for which the backward differentiation formula method of orders 1 to 5 is provided. The code includes the Adams-Moulton method of orders 1 to 12, so it can be used for nonstiff problems as well. In addition, the user can easily switch methods to increase computational efficiency for problems that change character. For both methods a variety of corrector iteration techniques is included in the code. Also, to minimize computational work, both the step size and method order are varied dynamically. This report presents complete descriptions of the code and integration methods, including their implementation. It also provides a detailed guide to the use of the code, as well as an illustrative example problem.
Unsymmetric ordering using a constrained Markowitz scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amestoy, Patrick R.; Xiaoye S.; Pralet, Stephane
2005-01-18
We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of themore » LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.« less
Multiscale solvers and systematic upscaling in computational physics
NASA Astrophysics Data System (ADS)
Brandt, A.
2005-07-01
Multiscale algorithms can overcome the scale-born bottlenecks that plague most computations in physics. These algorithms employ separate processing at each scale of the physical space, combined with interscale iterative interactions, in ways which use finer scales very sparingly. Having been developed first and well known as multigrid solvers for partial differential equations, highly efficient multiscale techniques have more recently been developed for many other types of computational tasks, including: inverse PDE problems; highly indefinite (e.g., standing wave) equations; Dirac equations in disordered gauge fields; fast computation and updating of large determinants (as needed in QCD); fast integral transforms; integral equations; astrophysics; molecular dynamics of macromolecules and fluids; many-atom electronic structures; global and discrete-state optimization; practical graph problems; image segmentation and recognition; tomography (medical imaging); fast Monte-Carlo sampling in statistical physics; and general, systematic methods of upscaling (accurate numerical derivation of large-scale equations from microscopic laws).
Barrier-breaking performance for industrial problems on the CRAY C916
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graffunder, S.K.
1993-12-31
Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for themore » mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.« less
NASA Technical Reports Server (NTRS)
Chang, S. C.
1986-01-01
An algorithm for solving a large class of two- and three-dimensional nonseparable elliptic partial differential equations (PDE's) is developed and tested. It uses a modified D'Yakanov-Gunn iterative procedure in which the relaxation factor is grid-point dependent. It is easy to implement and applicable to a variety of boundary conditions. It is also computationally efficient, as indicated by the results of numerical comparisons with other established methods. Furthermore, the current algorithm has the advantage of possessing two important properties which the traditional iterative methods lack; that is: (1) the convergence rate is relatively insensitive to grid-cell size and aspect ratio, and (2) the convergence rate can be easily estimated by using the coefficient of the PDE being solved.
Born iterative reconstruction using perturbed-phase field estimates
Astheimer, Jeffrey P.; Waag, Robert C.
2008-01-01
A method of image reconstruction from scattering measurements for use in ultrasonic imaging is presented. The method employs distorted-wave Born iteration but does not require using a forward-problem solver or solving large systems of equations. These calculations are avoided by limiting intermediate estimates of medium variations to smooth functions in which the propagated fields can be approximated by phase perturbations derived from variations in a geometric path along rays. The reconstruction itself is formed by a modification of the filtered-backpropagation formula that includes correction terms to account for propagation through an estimated background. Numerical studies that validate the method for parameter ranges of interest in medical applications are presented. The efficiency of this method offers the possibility of real-time imaging from scattering measurements. PMID:19062873
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
NASA Astrophysics Data System (ADS)
Bernede, Adrien; Poëtte, Gaël
2018-02-01
In this paper, we are interested in the resolution of the time-dependent problem of particle transport in a medium whose composition evolves with time due to interactions. As a constraint, we want to use of Monte-Carlo (MC) scheme for the transport phase. A common resolution strategy consists in a splitting between the MC/transport phase and the time discretization scheme/medium evolution phase. After going over and illustrating the main drawbacks of split solvers in a simplified configuration (monokinetic, scalar Bateman problem), we build a new Unsplit MC (UMC) solver improving the accuracy of the solutions, avoiding numerical instabilities, and less sensitive to time discretization. The new solver is essentially based on a Monte Carlo scheme with time dependent cross sections implying the on-the-fly resolution of a reduced model for each MC particle describing the time evolution of the matter along their flight path.
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...
2017-09-14
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
A fast iterative scheme for the linearized Boltzmann equation
NASA Astrophysics Data System (ADS)
Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.
2017-06-01
Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference between these results and those using the hard-sphere potential is discussed.
Algorithmically scalable block preconditioner for fully implicit shallow-water equations in CAM-SE
Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.
2014-10-19
Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core within themore » Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.« less
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics
NASA Astrophysics Data System (ADS)
Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio
2016-12-01
Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
Generalised Assignment Matrix Methodology in Linear Programming
ERIC Educational Resources Information Center
Jerome, Lawrence
2012-01-01
Discrete Mathematics instructors and students have long been struggling with various labelling and scanning algorithms for solving many important problems. This paper shows how to solve a wide variety of Discrete Mathematics and OR problems using assignment matrices and linear programming, specifically using Excel Solvers although the same…
NASA Technical Reports Server (NTRS)
Ferencz, Donald C.; Viterna, Larry A.
1991-01-01
ALPS is a computer program which can be used to solve general linear program (optimization) problems. ALPS was designed for those who have minimal linear programming (LP) knowledge and features a menu-driven scheme to guide the user through the process of creating and solving LP formulations. Once created, the problems can be edited and stored in standard DOS ASCII files to provide portability to various word processors or even other linear programming packages. Unlike many math-oriented LP solvers, ALPS contains an LP parser that reads through the LP formulation and reports several types of errors to the user. ALPS provides a large amount of solution data which is often useful in problem solving. In addition to pure linear programs, ALPS can solve for integer, mixed integer, and binary type problems. Pure linear programs are solved with the revised simplex method. Integer or mixed integer programs are solved initially with the revised simplex, and the completed using the branch-and-bound technique. Binary programs are solved with the method of implicit enumeration. This manual describes how to use ALPS to create, edit, and solve linear programming problems. Instructions for installing ALPS on a PC compatible computer are included in the appendices along with a general introduction to linear programming. A programmers guide is also included for assistance in modifying and maintaining the program.
A new fast direct solver for the boundary element method
NASA Astrophysics Data System (ADS)
Huang, S.; Liu, Y. J.
2017-09-01
A new fast direct linear equation solver for the boundary element method (BEM) is presented in this paper. The idea of the new fast direct solver stems from the concept of the hierarchical off-diagonal low-rank matrix. The hierarchical off-diagonal low-rank matrix can be decomposed into the multiplication of several diagonal block matrices. The inverse of the hierarchical off-diagonal low-rank matrix can be calculated efficiently with the Sherman-Morrison-Woodbury formula. In this paper, a more general and efficient approach to approximate the coefficient matrix of the BEM with the hierarchical off-diagonal low-rank matrix is proposed. Compared to the current fast direct solver based on the hierarchical off-diagonal low-rank matrix, the proposed method is suitable for solving general 3-D boundary element models. Several numerical examples of 3-D potential problems with the total number of unknowns up to above 200,000 are presented. The results show that the new fast direct solver can be applied to solve large 3-D BEM models accurately and with better efficiency compared with the conventional BEM.
NASA Astrophysics Data System (ADS)
Hladowski, Lukasz; Galkowski, Krzysztof; Cai, Zhonglun; Rogers, Eric; Freeman, Chris T.; Lewin, Paul L.
2011-07-01
In this article a new approach to iterative learning control for the practically relevant case of deterministic discrete linear plants with uniform rank greater than unity is developed. The analysis is undertaken in a 2D systems setting that, by using a strong form of stability for linear repetitive processes, allows simultaneous consideration of both trial-to-trial error convergence and along the trial performance, resulting in design algorithms that can be computed using linear matrix inequalities (LMIs). Finally, the control laws are experimentally verified on a gantry robot that replicates a pick and place operation commonly found in a number of applications to which iterative learning control is applicable.
Iterative color-multiplexed, electro-optical processor.
Psaltis, D; Casasent, D; Carlotto, M
1979-11-01
A noncoherent optical vector-matrix multiplier using a linear LED source array and a linear P-I-N photodiode detector array has been combined with a 1-D adder in a feedback loop. The resultant iterative optical processor and its use in solving simultaneous linear equations are described. Operation on complex data is provided by a novel color-multiplexing system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria
Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective to maximize the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal or annual) are usually coupled and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed tomore » solve the optimization problem. The new formulation and performance of the NDS solver is benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS based solver can be easily parallelized which results in significant computational improvement.« less
LINFLUX-AE: A Turbomachinery Aeroelastic Code Based on a 3-D Linearized Euler Solver
NASA Technical Reports Server (NTRS)
Reddy, T. S. R.; Bakhle, M. A.; Trudell, J. J.; Mehmed, O.; Stefko, G. L.
2004-01-01
This report describes the development and validation of LINFLUX-AE, a turbomachinery aeroelastic code based on the linearized unsteady 3-D Euler solver, LINFLUX. A helical fan with flat plate geometry is selected as the test case for numerical validation. The steady solution required by LINFLUX is obtained from the nonlinear Euler/Navier Stokes solver TURBO-AE. The report briefly describes the salient features of LINFLUX and the details of the aeroelastic extension. The aeroelastic formulation is based on a modal approach. An eigenvalue formulation is used for flutter analysis. The unsteady aerodynamic forces required for flutter are obtained by running LINFLUX for each mode, interblade phase angle and frequency of interest. The unsteady aerodynamic forces for forced response analysis are obtained from LINFLUX for the prescribed excitation, interblade phase angle, and frequency. The forced response amplitude is calculated from the modal summation of the generalized displacements. The unsteady pressures, work done per cycle, eigenvalues and forced response amplitudes obtained from LINFLUX are compared with those obtained from LINSUB, TURBO-AE, ASTROP2, and ANSYS.
NASA Astrophysics Data System (ADS)
Verscharen, D.; Klein, K. G.; Chandran, B. D. G.; Stevens, M. L.; Salem, C. S.; Bale, S. D.
2017-12-01
The Arbitrary Linear Plasma Solver (ALPS) is a parallelized numerical code that solves the dispersion relation in a hot (even relativistic) magnetized plasma with an arbitrary number of particle species with arbitrary gyrotropic equilibrium distribution functions for any direction of wave propagation with respect to the background field. In this way, ALPS retains generality and overcomes the shortcomings of previous (bi-)Maxwellian solvers for the plasma dispersion relations. The unprecedented high-resolution particle and field data products from Parker Solar Probe (PSP) and Solar Orbiter (SO) will require novel theoretical tools. ALPS is one such tool, and its use will make possible new investigations into the role of non-Maxwellian distributions in the near-Sun solar wind. It can be applied to numerous high-velocity-resolution systems, ranging from current space missions to numerical simulations. We will briefly discuss the ALPS algorithm and demonstrate its functionality based on previous solar-wind measurements. We will then highlight our plans for future applications of ALPS to PSP and SO observations.
2016-01-05
discretizations . We maintain that what is clear at the mathematical level should be equally clear in computation. In this small STIR project, we separate the...concerns of describing and discretizing such models by defining an input language representing PDE, including steady-state and tran- sient, linear and...solvers, such as [8, 9], focused on the solvers themselves and particular families of discretizations (e. g. finite elements), and now it is natural to
Mega-Scale Simulation of Multi-Layer Devices-- Formulation, Kinetics, and Visualization
1994-07-28
prototype code STRIDE, also initially developed under ARO support. The focus of the ARO supported research activities has been in the areas of multi ... FORTRAN -77. During its fifteen-year life- span several generations of researchers have modified the code . Due to this continual develop- ment, the...behavior. The replacement of the linear solver had no effect on the remainder of the code . We replaced the existing solver with a distributed multi -frontal
The preconditioned Gauss-Seidel method faster than the SOR method
NASA Astrophysics Data System (ADS)
Niki, Hiroshi; Kohno, Toshiyuki; Morimoto, Munenori
2008-09-01
In recent years, a number of preconditioners have been applied to linear systems [A.D. Gunawardena, S.K. Jain, L. Snyder, Modified iterative methods for consistent linear systems, Linear Algebra Appl. 154-156 (1991) 123-143; T. Kohno, H. Kotakemori, H. Niki, M. Usui, Improving modified Gauss-Seidel method for Z-matrices, Linear Algebra Appl. 267 (1997) 113-123; H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I+Smax), J. Comput. Appl. Math. 145 (2002) 373-378; H. Kotakemori, H. Niki, N. Okamoto, Accelerated iteration method for Z-matrices, J. Comput. Appl. Math. 75 (1996) 87-97; M. Usui, H. Niki, T.Kohno, Adaptive Gauss-Seidel method for linear systems, Internat. J. Comput. Math. 51(1994)119-125 [10
ERIC Educational Resources Information Center
Smith, Michael
1990-01-01
Presents several examples of the iteration method using computer spreadsheets. Examples included are simple iterative sequences and the solution of equations using the Newton-Raphson formula, linear interpolation, and interval bisection. (YP)
Galaxy Redshifts from Discrete Optimization of Correlation Functions
NASA Astrophysics Data System (ADS)
Lee, Benjamin C. G.; Budavári, Tamás; Basu, Amitabh; Rahman, Mubdi
2016-12-01
We propose a new method of constraining the redshifts of individual extragalactic sources based on celestial coordinates and their ensemble statistics. Techniques from integer linear programming (ILP) are utilized to optimize simultaneously for the angular two-point cross- and autocorrelation functions. Our novel formalism introduced here not only transforms the otherwise hopelessly expensive, brute-force combinatorial search into a linear system with integer constraints but also is readily implementable in off-the-shelf solvers. We adopt Gurobi, a commercial optimization solver, and use Python to build the cost function dynamically. The preliminary results on simulated data show potential for future applications to sky surveys by complementing and enhancing photometric redshift estimators. Our approach is the first application of ILP to astronomical analysis.
Relaxation approximations to second-order traffic flow models by high-resolution schemes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nikolos, I.K.; Delis, A.I.; Papageorgiou, M.
2015-03-10
A relaxation-type approximation of second-order non-equilibrium traffic models, written in conservation or balance law form, is considered. Using the relaxation approximation, the nonlinear equations are transformed to a semi-linear diagonilizable problem with linear characteristic variables and stiff source terms with the attractive feature that neither Riemann solvers nor characteristic decompositions are in need. In particular, it is only necessary to provide the flux and source term functions and an estimate of the characteristic speeds. To discretize the resulting relaxation system, high-resolution reconstructions in space are considered. Emphasis is given on a fifth-order WENO scheme and its performance. The computations reportedmore » demonstrate the simplicity and versatility of relaxation schemes as numerical solvers.« less
Numerical simulation of an elastic structure behavior under transient fluid flow excitation
NASA Astrophysics Data System (ADS)
Afanasyeva, Irina N.; Lantsova, Irina Yu.
2017-01-01
This paper deals with the verification of a numerical technique of modeling fluid-structure interaction (FSI) problems. The configuration consists of incompressible viscous fluid around an elastic structure in the channel. External flow is laminar. Multivariate calculations are performed using special software ANSYS CFX and ANSYS Mechanical. Different types of parameters of mesh deformation and solver controls (time step, under relaxation factor, number of iterations at coupling step) were tested. The results are presented in tables and plots in comparison with reference data.
An efficient transport solver for tokamak plasmas
Park, Jin Myung; Murakami, Masanori; St. John, H. E.; ...
2017-01-03
A simple approach to efficiently solve a coupled set of 1-D diffusion-type transport equations with a stiff transport model for tokamak plasmas is presented based on the 4th order accurate Interpolated Differential Operator scheme along with a nonlinear iteration method derived from a root-finding algorithm. Here, numerical tests using the Trapped Gyro-Landau-Fluid model show that the presented high order method provides an accurate transport solution using a small number of grid points with robust nonlinear convergence.
An Analysis of Elliptic Grid Generation Techniques Using an Implicit Euler Solver.
1986-06-09
automatic determination of the control fu.nction, . elements of covariant metric tensor in the elliptic grid generation system , from the Cm = 1,2,3...computational fluid d’nan1-cs code. Tne code Inclues a tnree-dimensional current research is aimed primaril: at algebraic generation system based on transfinite...start the iterative solution of the f. ow, nea, transfer, and combustion proble:s. elliptic generation system . Tn13 feature also .:ven-.ts :.t be made
Effect of Coannular Flow on Linearized Euler Equation Predictions of Jet Noise
NASA Technical Reports Server (NTRS)
Hixon, R.; Shih, S.-H.; Mankbadi, Reda R.
1997-01-01
An improved version of a previously validated linearized Euler equation solver is used to compute the noise generated by coannular supersonic jets. Results for a single supersonic jet are compared to the results from both a normal velocity profile and an inverted velocity profile supersonic jet.
A hybrid formalism of aerosol gas phase interaction for 3-D global models
NASA Astrophysics Data System (ADS)
Benduhn, F.
2009-04-01
Aerosol chemical composition is a relevant factor to the global climate system with respect to both atmospheric chemistry and the aerosol direct and indirect effects. Aerosol chemical composition determines the capacity of aerosol particles to act as cloud condensation nuclei both explicitly via particle size and implicitly via the aerosol hygroscopic property. Due to the primary role of clouds in the climate system and the sensitivity of cloud formation and radiative properties to the cloud droplet number it is necessary to determine with accuracy the chemical composition of the aerosol. Dissolution, although a formally fairly well known process, may be subject to numerically prohibitive properties that result from the chemical interaction of the species engaged. So-far approaches to model the dissolution of inorganics into the aerosol liquid phase in the framework of a 3-D global model were based on an equilibrium, transient or hybrid equilibrium-transient approach. All of these methods present the disadvantage of a priori assumptions with respect to the mechanism and/or are numerically not manageable in the context of a global climate system model. In this paper a new hybrid formalism to aerosol gas phase interaction is presented within the framework of the H2SO4/HNO3/HCl/NH3 system and a modal approach of aerosol size discretisation. The formalism is distinct from prior hybrid approaches in as much as no a priori assumption on the nature of the regime a particular aerosol mode is in is made. Whether a particular mode is set to be in the equilibrium or the transitory regime is continuously determined during each time increment against relevant criteria considering the estimated equilibration time interval and the interdependence of the aerosol modes relative to the partitioning of the dissolving species. Doing this the aerosol composition range of numerical stiffness due to species interaction during transient dissolution is effectively eluded, and the numerical expense of dissolution in the transient regime is reduced through the minimisation of the number of modes in this regime and a larger time step. Containment of the numerical expense of the modes in the equilibrium regime is ensured through the usage of either an analytical equilibrium solver that requires iteration among the equilibrium modes, or a simple numerical solver based on a differential approach that requires iteration among the chemical species. Both equilibrium solvers require iteration over the water content and the activity coefficients. Decision for using either one or the other solver is made upon the consideration of the actual equilibrating mechanism, either chemical interaction or gas phase partial pressure variation, respectively. The formalism should thus ally appropriate process simplification resulting in reasonable computation time to a high degree of real process conformity as it is ensured by a transitory representation of dissolution. The resulting effectiveness and limits of the formalism are illustrated with numerical examples.
Parallel fast multipole boundary element method applied to computational homogenization
NASA Astrophysics Data System (ADS)
Ptaszny, Jacek
2018-01-01
In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.
NASA Astrophysics Data System (ADS)
Köhn, A.; Guidi, L.; Holzhauer, E.; Maj, O.; Poli, E.; Snicker, A.; Weber, H.
2018-07-01
Plasma turbulence, and edge density fluctuations in particular, can under certain conditions broaden the cross-section of injected microwave beams significantly. This can be a severe problem for applications relying on well-localized deposition of the microwave power, like the control of MHD instabilities. Here we investigate this broadening mechanism as a function of fluctuation level, background density and propagation length in a fusion-relevant scenario using two numerical codes, the full-wave code IPF-FDMC and the novel wave kinetic equation solver WKBeam. The latter treats the effects of fluctuations using a statistical approach, based on an iterative solution of the scattering problem (Born approximation). The full-wave simulations are used to benchmark this approach. The Born approximation is shown to be valid over a large parameter range, including ITER-relevant scenarios.
Formulation for Simultaneous Aerodynamic Analysis and Design Optimization
NASA Technical Reports Server (NTRS)
Hou, G. W.; Taylor, A. C., III; Mani, S. V.; Newman, P. A.
1993-01-01
An efficient approach for simultaneous aerodynamic analysis and design optimization is presented. This approach does not require the performance of many flow analyses at each design optimization step, which can be an expensive procedure. Thus, this approach brings us one step closer to meeting the challenge of incorporating computational fluid dynamic codes into gradient-based optimization techniques for aerodynamic design. An adjoint-variable method is introduced to nullify the effect of the increased number of design variables in the problem formulation. The method has been successfully tested on one-dimensional nozzle flow problems, including a sample problem with a normal shock. Implementations of the above algorithm are also presented that incorporate Newton iterations to secure a high-quality flow solution at the end of the design process. Implementations with iterative flow solvers are possible and will be required for large, multidimensional flow problems.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
Grayver, Alexander V.; Kolev, Tzanio V.
2015-11-01
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grayver, Alexander V.; Kolev, Tzanio V.
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Newton's method: A link between continuous and discrete solutions of nonlinear problems
NASA Technical Reports Server (NTRS)
Thurston, G. A.
1980-01-01
Newton's method for nonlinear mechanics problems replaces the governing nonlinear equations by an iterative sequence of linear equations. When the linear equations are linear differential equations, the equations are usually solved by numerical methods. The iterative sequence in Newton's method can exhibit poor convergence properties when the nonlinear problem has multiple solutions for a fixed set of parameters, unless the iterative sequences are aimed at solving for each solution separately. The theory of the linear differential operators is often a better guide for solution strategies in applying Newton's method than the theory of linear algebra associated with the numerical analogs of the differential operators. In fact, the theory for the differential operators can suggest the choice of numerical linear operators. In this paper the method of variation of parameters from the theory of linear ordinary differential equations is examined in detail in the context of Newton's method to demonstrate how it might be used as a guide for numerical solutions.
An implicit-iterative solution of the heat conduction equation with a radiation boundary condition
NASA Technical Reports Server (NTRS)
Williams, S. D.; Curry, D. M.
1977-01-01
For the problem of predicting one-dimensional heat transfer between conducting and radiating mediums by an implicit finite difference method, four different formulations were used to approximate the surface radiation boundary condition while retaining an implicit formulation for the interior temperature nodes. These formulations are an explicit boundary condition, a linearized boundary condition, an iterative boundary condition, and a semi-iterative boundary method. The results of these methods in predicting surface temperature on the space shuttle orbiter thermal protection system model under a variety of heating rates were compared. The iterative technique caused the surface temperature to be bounded at each step. While the linearized and explicit methods were generally more efficient, the iterative and semi-iterative techniques provided a realistic surface temperature response without requiring step size control techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Chen, E-mail: chuang3@fsu.edu
A key element in the density functional embedding theory (DFET) is the embedding potential. We discuss two major issues related to the embedding potential: (1) its non-uniqueness and (2) the numerical difficulty for solving for it, especially for the spin-polarized systems. To resolve the first issue, we extend DFET to finite temperature: all quantities, such as the subsystem densities and the total system’s density, are calculated at a finite temperature. This is a physical extension since materials work at finite temperatures. We show that the embedding potential is strictly unique at T > 0. To resolve the second issue, wemore » introduce an efficient iterative embedding potential solver. We discuss how to relax the magnetic moments in subsystems and how to equilibrate the chemical potentials across subsystems. The solver is robust and efficient for several non-trivial examples, in all of which good quality spin-polarized embedding potentials were obtained. We also demonstrate the solver on an extended periodic system: iron body-centered cubic (110) surface, which is related to the modeling of the heterogeneous catalysis involving iron, such as the Fischer-Tropsch and the Haber processes. This work would make it efficient and accurate to perform embedding simulations of some challenging material problems, such as the heterogeneous catalysis and the defects of complicated spin configurations in electronic materials.« less
Least-Squares Spectral Element Solutions to the CAA Workshop Benchmark Problems
NASA Technical Reports Server (NTRS)
Lin, Wen H.; Chan, Daniel C.
1997-01-01
This paper presents computed results for some of the CAA benchmark problems via the acoustic solver developed at Rocketdyne CFD Technology Center under the corporate agreement between Boeing North American, Inc. and NASA for the Aerospace Industry Technology Program. The calculations are considered as benchmark testing of the functionality, accuracy, and performance of the solver. Results of these computations demonstrate that the solver is capable of solving the propagation of aeroacoustic signals. Testing of sound generation and on more realistic problems is now pursued for the industrial applications of this solver. Numerical calculations were performed for the second problem of Category 1 of the current workshop problems for an acoustic pulse scattered from a rigid circular cylinder, and for two of the first CAA workshop problems, i. e., the first problem of Category 1 for the propagation of a linear wave and the first problem of Category 4 for an acoustic pulse reflected from a rigid wall in a uniform flow of Mach 0.5. The aim for including the last two problems in this workshop is to test the effectiveness of some boundary conditions set up in the solver. Numerical results of the last two benchmark problems have been compared with their corresponding exact solutions and the comparisons are excellent. This demonstrates the high fidelity of the solver in handling wave propagation problems. This feature lends the method quite attractive in developing a computational acoustic solver for calculating the aero/hydrodynamic noise in a violent flow environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anton, Luis; MartI, Jose M; Ibanez, Jose M
2010-05-01
We obtain renormalized sets of right and left eigenvectors of the flux vector Jacobians of the relativistic MHD equations, which are regular and span a complete basis in any physical state including degenerate ones. The renormalization procedure relies on the characterization of the degeneracy types in terms of the normal and tangential components of the magnetic field to the wave front in the fluid rest frame. Proper expressions of the renormalized eigenvectors in conserved variables are obtained through the corresponding matrix transformations. Our work completes previous analysis that present different sets of right eigenvectors for non-degenerate and degenerate states, andmore » can be seen as a relativistic generalization of earlier work performed in classical MHD. Based on the full wave decomposition (FWD) provided by the renormalized set of eigenvectors in conserved variables, we have also developed a linearized (Roe-type) Riemann solver. Extensive testing against one- and two-dimensional standard numerical problems allows us to conclude that our solver is very robust. When compared with a family of simpler solvers that avoid the knowledge of the full characteristic structure of the equations in the computation of the numerical fluxes, our solver turns out to be less diffusive than HLL and HLLC, and comparable in accuracy to the HLLD solver. The amount of operations needed by the FWD solver makes it less efficient computationally than those of the HLL family in one-dimensional problems. However, its relative efficiency increases in multidimensional simulations.« less
On a new iterative method for solving linear systems and comparison results
NASA Astrophysics Data System (ADS)
Jing, Yan-Fei; Huang, Ting-Zhu
2008-10-01
In Ujevic [A new iterative method for solving linear systems, Appl. Math. Comput. 179 (2006) 725-730], the author obtained a new iterative method for solving linear systems, which can be considered as a modification of the Gauss-Seidel method. In this paper, we show that this is a special case from a point of view of projection techniques. And a different approach is established, which is both theoretically and numerically proven to be better than (at least the same as) Ujevic's. As the presented numerical examples show, in most cases, the convergence rate is more than one and a half that of Ujevic.
Optimal Facility Location Tool for Logistics Battle Command (LBC)
2015-08-01
64 Appendix B. VBA Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Appendix C. Story...should city planners have located emergency service facilities so that all households (the demand) had equal access to coverage?” The critical...programming language called Visual Basic for Applications ( VBA ). CPLEX is a commercial solver for linear, integer, and mixed integer linear programming problems
NASA Astrophysics Data System (ADS)
Munhoven, G.
2013-08-01
The total alkalinity-pH equation, which relates total alkalinity and pH for a given set of total concentrations of the acid-base systems that contribute to total alkalinity in a given water sample, is reviewed and its mathematical properties established. We prove that the equation function is strictly monotone and always has exactly one positive root. Different commonly used approximations are discussed and compared. An original method to derive appropriate initial values for the iterative solution of the cubic polynomial equation based upon carbonate-borate-alkalinity is presented. We then review different methods that have been used to solve the total alkalinity-pH equation, with a main focus on biogeochemical models. The shortcomings and limitations of these methods are made out and discussed. We then present two variants of a new, robust and universally convergent algorithm to solve the total alkalinity-pH equation. This algorithm does not require any a priori knowledge of the solution. SolveSAPHE (Solver Suite for Alkalinity-PH Equations) provides reference implementations of several variants of the new algorithm in Fortran 90, together with new implementations of other, previously published solvers. The new iterative procedure is shown to converge from any starting value to the physical solution. The extra computational cost for the convergence security is only 10-15% compared to the fastest algorithm in our test series.
NASA Technical Reports Server (NTRS)
Zinnecker, Alicia M.; Chapman, Jeffryes W.; Lavelle, Thomas M.; Litt, Jonathan S.
2014-01-01
The Toolbox for the Modeling and Analysis of Thermodynamic Systems (T-MATS) is a tool that has been developed to allow a user to build custom models of systems governed by thermodynamic principles using a template to model each basic process. Validation of this tool in an engine model application was performed through reconstruction of the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) (v2) using the building blocks from the T-MATS (v1) library. In order to match the two engine models, it was necessary to address differences in several assumptions made in the two modeling approaches. After these modifications were made, validation of the engine model continued by integrating both a steady-state and dynamic iterative solver with the engine plant and comparing results from steady-state and transient simulation of the T-MATS and C-MAPSS models. The results show that the T-MATS engine model was accurate within 3% of the C-MAPSS model, with inaccuracy attributed to the increased dimension of the iterative solver solution space required by the engine model constructed using the T-MATS library. This demonstrates that, given an understanding of the modeling assumptions made in T-MATS and a baseline model, the T-MATS tool provides a viable option for constructing a computational model of a twin-spool turbofan engine that may be used in simulation studies.
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
NASA Astrophysics Data System (ADS)
Pohlman, Matthew Michael
The study of heat transfer and fluid flow in a vertical Bridgman device is motivated by current industrial difficulties in growing crystals with as few defects as possible. For example, Gallium Arsenide (GaAs) is of great interest to the semiconductor industry but remains an uneconomical alternative to silicon because of the manufacturing problems. This dissertation is a two dimensional study of the fluid in an idealized Bridgman device. The model nonlinear PDEs are discretized using second order finite differencing. Newton's method solves the resulting nonlinear discrete equations. The large sparse linear systems involving the Jacobian are solved iteratively using the Generalized Minimum Residual method (GMRES). By adapting fast direct solvers for elliptic equations with simple boundary conditions, a good preconditioner is developed which is essential for GMRES to converge quickly. Trends of the fluid flow and heat transfer for typical ranges of the physical parameters are determined. Also, the size of the terms in the mathematical model are found by numerical investigation, in order to find what terms are in balance as the physical parameters vary. The results suggest the plausibility of simpler asymptotic solutions.
NASA Astrophysics Data System (ADS)
Kong, Fande; Cai, Xiao-Chuan
2017-07-01
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, P. T.; Shadid, J. N.; Hu, J. J.
Here, we explore the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. Our study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of themore » original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.« less
Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD
Lin, P. T.; Shadid, J. N.; Hu, J. J.; ...
2017-11-06
Here, we explore the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. Our study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of themore » original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.« less
NASA Astrophysics Data System (ADS)
Huang, Chien-Jung; White, Susan; Huang, Shao-Ching; Mallya, Sanjay; Eldredge, Jeff
2016-11-01
Obstructive sleep apnea (OSA) is a medical condition characterized by repetitive partial or complete occlusion of the airway during sleep. The soft tissues in the upper airway of OSA patients are prone to collapse under the low pressure loads incurred during breathing. The ultimate goal of this research is the development of a versatile numerical tool for simulation of air-tissue interactions in the patient specific upper airway geometry. This tool is expected to capture several phenomena, including flow-induced vibration (snoring) and large deformations during airway collapse of the complex airway geometry in respiratory flow conditions. Here, we present our ongoing progress toward this goal. To avoid mesh regeneration, for flow model, a sharp-interface embedded boundary method is used on Cartesian grids for resolving the fluid-structure interface, while for the structural model, a cut-cell finite element method is used. Also, to properly resolve large displacements, non-linear elasticity model is used. The fluid and structure solvers are connected with the strongly coupled iterative algorithm. The parallel computation is achieved with the numerical library PETSc. Some two- and three- dimensional preliminary results are shown to demonstrate the ability of this tool.
Scaling Optimization of the SIESTA MHD Code
NASA Astrophysics Data System (ADS)
Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan
2013-10-01
SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
NASA Astrophysics Data System (ADS)
Liu, Ying; Xu, Zhenhuan; Li, Yuguo
2018-04-01
We present a goal-oriented adaptive finite element (FE) modelling algorithm for 3-D magnetotelluric fields in generally anisotropic conductivity media. The model consists of a background layered structure, containing anisotropic blocks. Each block and layer might be anisotropic by assigning to them 3 × 3 conductivity tensors. The second-order partial differential equations are solved using the adaptive finite element method (FEM). The computational domain is subdivided into unstructured tetrahedral elements, which allow for complex geometries including bathymetry and dipping interfaces. The grid refinement process is guided by a global posteriori error estimator and is performed iteratively. The system of linear FE equations for electric field E is solved with a direct solver MUMPS. Then the magnetic field H can be found, in which the required derivatives are computed numerically using cubic spline interpolation. The 3-D FE algorithm has been validated by comparisons with both the 3-D finite-difference solution and 2-D FE results. Two model types are used to demonstrate the effects of anisotropy upon 3-D magnetotelluric responses: horizontal and dipping anisotropy. Finally, a 3D sea hill model is modelled to study the effect of oblique interfaces and the dipping anisotropy.
Kong, Fande; Cai, Xiao-Chuan
2017-03-24
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexactmore » Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here ''geometry'' includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.« less
Assessment of Preconditioner for a USM3D Hierarchical Adaptive Nonlinear Method (HANIM) (Invited)
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.
2016-01-01
Enhancements to the previously reported mixed-element USM3D Hierarchical Adaptive Nonlinear Iteration Method (HANIM) framework have been made to further improve robustness, efficiency, and accuracy of computational fluid dynamic simulations. The key enhancements include a multi-color line-implicit preconditioner, a discretely consistent symmetry boundary condition, and a line-mapping method for the turbulence source term discretization. The USM3D iterative convergence for the turbulent flows is assessed on four configurations. The configurations include a two-dimensional (2D) bump-in-channel, the 2D NACA 0012 airfoil, a three-dimensional (3D) bump-in-channel, and a 3D hemisphere cylinder. The Reynolds Averaged Navier Stokes (RANS) solutions have been obtained using a Spalart-Allmaras turbulence model and families of uniformly refined nested grids. Two types of HANIM solutions using line- and point-implicit preconditioners have been computed. Additional solutions using the point-implicit preconditioner alone (PA) method that broadly represents the baseline solver technology have also been computed. The line-implicit HANIM shows superior iterative convergence in most cases with progressively increasing benefits on finer grids.
Final Report: Subcontract B623868 Algebraic Multigrid solvers for coupled PDE systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brannick, J.
The Pennsylvania State University (“Subcontractor”) continued to work on the design of algebraic multigrid solvers for coupled systems of partial differential equations (PDEs) arising in numerical modeling of various applications, with a main focus on solving the Dirac equation arising in Quantum Chromodynamics (QCD). The goal of the proposed work was to develop combined geometric and algebraic multilevel solvers that are robust and lend themselves to efficient implementation on massively parallel heterogeneous computers for these QCD systems. The research in these areas built on previous works, focusing on the following three topics: (1) the development of parallel full-multigrid (PFMG) andmore » non-Galerkin coarsening techniques in this frame work for solving the Wilson Dirac system; (2) the use of these same Wilson MG solvers for preconditioning the Overlap and Domain Wall formulations of the Dirac equation; and (3) the design and analysis of algebraic coarsening algorithms for coupled PDE systems including Stokes equation, Maxwell equation and linear elasticity.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vay, Jean-Luc, E-mail: jlvay@lbl.gov; Haber, Irving; Godfrey, Brendan B.
Pseudo-spectral electromagnetic solvers (i.e. representing the fields in Fourier space) have extraordinary precision. In particular, Haber et al. presented in 1973 a pseudo-spectral solver that integrates analytically the solution over a finite time step, under the usual assumption that the source is constant over that time step. Yet, pseudo-spectral solvers have not been widely used, due in part to the difficulty for efficient parallelization owing to global communications associated with global FFTs on the entire computational domains. A method for the parallelization of electromagnetic pseudo-spectral solvers is proposed and tested on single electromagnetic pulses, and on Particle-In-Cell simulations of themore » wakefield formation in a laser plasma accelerator. The method takes advantage of the properties of the Discrete Fourier Transform, the linearity of Maxwell’s equations and the finite speed of light for limiting the communications of data within guard regions between neighboring computational domains. Although this requires a small approximation, test results show that no significant error is made on the test cases that have been presented. The proposed method opens the way to solvers combining the favorable parallel scaling of standard finite-difference methods with the accuracy advantages of pseudo-spectral methods.« less
NASA Astrophysics Data System (ADS)
Feng, Wenqiang; Guo, Zhenlin; Lowengrub, John S.; Wise, Steven M.
2018-01-01
We present a mass-conservative full approximation storage (FAS) multigrid solver for cell-centered finite difference methods on block-structured, locally cartesian grids. The algorithm is essentially a standard adaptive FAS (AFAS) scheme, but with a simple modification that comes in the form of a mass-conservative correction to the coarse-level force. This correction is facilitated by the creation of a zombie variable, analogous to a ghost variable, but defined on the coarse grid and lying under the fine grid refinement patch. We show that a number of different types of fine-level ghost cell interpolation strategies could be used in our framework, including low-order linear interpolation. In our approach, the smoother, prolongation, and restriction operations need never be aware of the mass conservation conditions at the coarse-fine interface. To maintain global mass conservation, we need only modify the usual FAS algorithm by correcting the coarse-level force function at points adjacent to the coarse-fine interface. We demonstrate through simulations that the solver converges geometrically, at a rate that is h-independent, and we show the generality of the solver, applying it to several nonlinear, time-dependent, and multi-dimensional problems. In several tests, we show that second-order asymptotic (h → 0) convergence is observed for the discretizations, provided that (1) at least linear interpolation of the ghost variables is employed, and (2) the mass conservation corrections are applied to the coarse-level force term.
NASA Astrophysics Data System (ADS)
Whiteley, J. P.
2017-10-01
Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
NASA Astrophysics Data System (ADS)
Allen, Jeffery M.
This research involves a few First-Order System Least Squares (FOSLS) formulations of a nonlinear-Stokes flow model for ice sheets. In Glen's flow law, a commonly used constitutive equation for ice rheology, the viscosity becomes infinite as the velocity gradients approach zero. This typically occurs near the ice surface or where there is basal sliding. The computational difficulties associated with the infinite viscosity are often overcome by an arbitrary modification of Glen's law that bounds the maximum viscosity. The FOSLS formulations developed in this thesis are designed to overcome this difficulty. The first FOSLS formulation is just the first-order representation of the standard nonlinear, full-Stokes and is known as the viscosity formulation and suffers from the problem above. To overcome the problem of infinite viscosity, two new formulation exploit the fact that the deviatoric stress, the product of viscosity and strain-rate, approaches zero as the viscosity goes to infinity. Using the deviatoric stress as the basis for a first-order system results in the the basic fluidity system. Augmenting the basic fluidity system with a curl-type equation results in the augmented fluidity system, which is more amenable to the iterative solver, Algebraic MultiGrid (AMG). A Nested Iteration (NI) Newton-FOSLS-AMG approach is used to solve the nonlinear-Stokes problems. Several test problems from the ISMIP set of benchmarks is examined to test the effectiveness of the various formulations. These test show that the viscosity based method is more expensive and less accurate. The basic fluidity system shows optimal finite-element convergence. However, there is not yet an efficient iterative solver for this type of system and this is the topic of future research. Alternatively, AMG performs better on the augmented fluidity system when using specific scaling. Unfortunately, this scaling results in reduced finite-element convergence.
Investigation of a Parabolic Iterative Solver for Three-dimensional Configurations
NASA Technical Reports Server (NTRS)
Nark, Douglas M.; Watson, Willie R.; Mani, Ramani
2007-01-01
A parabolic iterative solution procedure is investigated that seeks to extend the parabolic approximation used within the internal propagation module of the duct noise propagation and radiation code CDUCT-LaRC. The governing convected Helmholtz equation is split into a set of coupled equations governing propagation in the positive and negative directions. The proposed method utilizes an iterative procedure to solve the coupled equations in an attempt to account for possible reflections from internal bifurcations, impedance discontinuities, and duct terminations. A geometry consistent with the NASA Langley Curved Duct Test Rig is considered and the effects of acoustic treatment and non-anechoic termination are included. Two numerical implementations are studied and preliminary results indicate that improved accuracy in predicted amplitude and phase can be obtained for modes at a cut-off ratio of 1.7. Further predictions for modes at a cut-off ratio of 1.1 show improvement in predicted phase at the expense of increased amplitude error. Possible methods of improvement are suggested based on analytic and numerical analysis. It is hoped that coupling the parabolic iterative approach with less efficient, high fidelity finite element approaches will ultimately provide the capability to perform efficient, higher fidelity acoustic calculations within complex 3-D geometries for impedance eduction and noise propagation and radiation predictions.
Parallel iterative methods for sparse linear and nonlinear equations
NASA Technical Reports Server (NTRS)
Saad, Youcef
1989-01-01
As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable, when solving linear as well as nonlinear systems of equations. There has been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weiss, Chester J
Software solves the three-dimensional Poisson equation div(k(grad(u)) = f, by the finite element method for the case when material properties, k, are distributed over hierarchy of edges, facets and tetrahedra in the finite element mesh. Method is described in Weiss, CJ, Finite element analysis for model parameters distributed on a hierarchy of geometric simplices, Geophysics, v82, E155-167, doi:10.1190/GEO2017-0058.1 (2017). A standard finite element method for solving Poisson’s equation is augmented by including in the 3D stiffness matrix additional 2D and 1D stiffness matrices representing the contributions from material properties associated with mesh faces and edges, respectively. The resulting linear systemmore » is solved iteratively using the conjugate gradient method with Jacobi preconditioning. To minimize computer storage for program execution, the linear solver computes matrix-vector contractions element-by-element over the mesh, without explicit storage of the global stiffness matrix. Program output vtk compliant for visualization and rendering by 3rd party software. Program uses dynamic memory allocation and as such there are no hard limits on problem size outside of those imposed by the operating system and configuration on which the software is run. Dimension, N, of the finite element solution vector is constrained by the the addressable space in 32-vs-64 bit operating systems. Total storage requirements for the problem. Total working space required for the program is approximately 13*N double precision words.« less