Parallelizable approximate solvers for recursions arising in preconditioning
Shapira, Y.
1996-12-31
For the recursions used in the Modified Incomplete LU (MILU) preconditioner, namely, the incomplete decomposition, forward elimination and back substitution processes, a parallelizable approximate solver is presented. The present analysis shows that the solutions of the recursions depend only weakly on their initial conditions and may be interpreted to indicate that the inexact solution is close, in some sense, to the exact one. The method is based on a domain decomposition approach, suitable for parallel implementations with message passing architectures. It requires a fixed number of communication steps per preconditioned iteration, independent of the number of subdomains or the size of the problem. The overlapping subdomains are either cubes (suitable for mesh-connected arrays of processors) or constructed by the data-flow rule of the recursions (suitable for line-connected arrays with possibly SIMD or vector processors). Numerical examples show that, in both cases, the overhead in the number of iterations required for convergence of the preconditioned iteration is small relative to the speed-up gained.
Approximate Riemann solvers for the Godunov SPH (GSPH)
NASA Astrophysics Data System (ADS)
Puri, Kunal; Ramachandran, Prabhu
2014-08-01
The Godunov Smoothed Particle Hydrodynamics (GSPH) method is coupled with non-iterative, approximate Riemann solvers for solutions to the compressible Euler equations. The use of approximate solvers avoids the expensive solution of the non-linear Riemann problem for every interacting particle pair, as required by GSPH. In addition, we establish an equivalence between the dissipative terms of GSPH and the signal based SPH artificial viscosity, under the restriction of a class of approximate Riemann solvers. This equivalence is used to explain the anomalous “wall heating” experienced by GSPH and we provide some suggestions to overcome it. Numerical tests in one and two dimensions are used to validate the proposed Riemann solvers. A general SPH pairing instability is observed for two-dimensional problems when using unequal mass particles. In general, the Ducowicz, Roe, and HLLC approximate Riemann solvers are found to be suitable replacements for the iterative Riemann solver in the original GSPH scheme.
Parallel iterative solvers and preconditioners using approximate hierarchical methods
Grama, A.; Kumar, V.; Sameh, A.
1996-12-31
In this paper, we report results on the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders of magnitude in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on a truncated Green's function. Experimental results on a 256 processor Cray T3D are presented.
Approximate Riemann Solvers for the Cosmic Ray Magnetohydrodynamical Equations
NASA Astrophysics Data System (ADS)
Kudoh, Yuki; Hanawa, Tomoyuki
2016-08-01
We analyze the cosmic-ray magnetohydrodynamic (CR MHD) equations to improve the numerical simulations. We propose to solve them in the fully conservation form, which is equivalent to the conventional CR MHD equations. In the fully conservation form, the CR energy equation is replaced with the CR "number" conservation, where the CR number density is defined as the three fourths power of the CR energy density. The former contains an extra source term, while the latter does not. An approximate Riemann solver is derived from the CR MHD equations in the fully conservation form. Based on the analysis, we propose a numerical scheme whose solutions satisfy the Rankine-Hugoniot relation at any shock. We demonstrate that it reproduces the Riemann solution derived by Pfrommer et al. (2006) for a 1D CR hydrodynamic shock tube problem. We compare the solution with those obtained by solving the CR energy equation. The latter solutions deviate seriously from the Riemann solution when the CR pressure dominates over the gas pressure in the post-shocked gas. The former solutions converge to the Riemann solution and are of second-order accuracy in space and time. Our numerical examples include an expansion of a high pressure sphere in a magnetized medium. Fast and slow shocks are sharply resolved in the example. We also discuss a possible extension of the CR MHD equations to evaluate the average CR energy.
Li, Xinya; Deng, Z Daniel; Sun, Yannan; Martinez, Jayson J; Fu, Tao; McMichael, Geoffrey A; Carlson, Thomas J
2014-01-01
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature. PMID:25427517
NASA Astrophysics Data System (ADS)
Li, Xinya; Deng, Z. Daniel; Sun, Yannan; Martinez, Jayson J.; Fu, Tao; McMichael, Geoffrey A.; Carlson, Thomas J.
2014-11-01
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
Li, Xinya; Deng, Z. Daniel; Sun, Yannan; Martinez, Jayson J.; Fu, Tao; McMichael, Geoffrey A.; et al
2014-11-27
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
Li, Xinya; Deng, Z. Daniel; Sun, Yannan; Martinez, Jayson J.; Fu, Tao; McMichael, Geoffrey A.; Carlson, Thomas J.
2014-11-27
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
Parallelizable adiabatic gate teleportation
NASA Astrophysics Data System (ADS)
Nakago, Kosuke; Hajdušek, Michal; Nakayama, Shojun; Murao, Mio
2015-12-01
To investigate how a temporally ordered gate sequence can be parallelized in adiabatic implementations of quantum computation, we modify adiabatic gate teleportation, a model of quantum computation proposed by Bacon and Flammia [Phys. Rev. Lett. 103, 120504 (2009), 10.1103/PhysRevLett.103.120504], to a form deterministically simulating parallelized gate teleportation, which is achievable only by postselection. We introduce a twisted Heisenberg-type interaction Hamiltonian, a Heisenberg-type spin interaction where the coordinates of the second qubit are twisted according to a unitary gate. We develop parallelizable adiabatic gate teleportation (PAGT) where a sequence of unitary gates is performed in a single step of the adiabatic process. In PAGT, numeric calculations suggest the necessary time for the adiabatic evolution implementing a sequence of L unitary gates increases at most as O(L^5). However, we show that it has the interesting property that it can map the temporal order of gates to the spatial order of interactions specified by the final Hamiltonian. Using this property, we present a controlled-PAGT scheme to manipulate the order of gates by a control qubit. In the controlled-PAGT scheme, two differently ordered sequential unitary gates FG and GF are coherently performed depending on the state of a control qubit by simultaneously applying the twisted Heisenberg-type interaction Hamiltonians implementing unitary gates F and G. We investigate why the twisted Heisenberg-type interaction Hamiltonian allows PAGT. We show that the twisted Heisenberg-type interaction Hamiltonian has an ability to perform a transposed unitary gate by just modifying the space ordering of the final Hamiltonian implementing a unitary gate in adiabatic gate teleportation. The dynamics generated by the time-reversed Hamiltonian represented by the transposed unitary gate enables deterministic simulation of a postselected event of parallelized gate teleportation in adiabatic
Approximate factorization with an elliptic pressure solver for incompressible flow
NASA Technical Reports Server (NTRS)
Bernard, R. S.; Thompson, J. F.
1982-01-01
Two-dimensional curvilinear coordinates are used to solve the incompressible Navier-Stokes equations, in conjunction with approximate factorization for the solution of the momentum equation and the successive overrelaxation by lines method for the solution of a Poisson equation for the pressure. The combined algorithm, although not fully explicit, is marginally stable at Reynolds numbers lower than 10,000 and time increments of 0.01. Pressure distributions calculated for attack angles of zero and 6 deg are of the same shape as the experimental curves, but are shifted to one side.
Improved implementation of the HLL approximate Riemann solver for one-dimensional open channel flows
Technology Transfer Automated Retrieval System (TEKTRAN)
Several new techniques are proposed to overcome the deficiencies in the conventional formulation of the approximate Riemann solvers for one-dimensional open channel flows, which include numerical imbalance and inaccuracy in the solution of discharge. The former arises in the case of irregular geomet...
An approximate Riemann solver for magnetohydrodynamics (that works in more than one dimension)
NASA Technical Reports Server (NTRS)
Powell, Kenneth G.
1994-01-01
An approximate Riemann solver is developed for the governing equations of ideal magnetohydrodynamics (MHD). The Riemann solver has an eight-wave structure, where seven of the waves are those used in previous work on upwind schemes for MHD, and the eighth wave is related to the divergence of the magnetic field. The structure of the eighth wave is not immediately obvious from the governing equations as they are usually written, but arises from a modification of the equations that is presented in this paper. The addition of the eighth wave allows multidimensional MHD problems to be solved without the use of staggered grids or a projection scheme, one or the other of which was necessary in previous work on upwind schemes for MHD. A test problem made up of a shock tube with rotated initial conditions is solved to show that the two-dimensional code yields answers consistent with the one-dimensional methods developed previously.
Lu, Yujie; Zhu, Banghe; Shen, Haiou; Rasmussen, John C; Wang, Ge; Sevick-Muraca, Eva M
2010-08-21
Fluorescence molecular imaging/tomography may play an important future role in preclinical research and clinical diagnostics. Time- and frequency-domain fluorescence imaging can acquire more measurement information than the continuous wave (CW) counterpart, improving the image quality of fluorescence molecular tomography. Although diffusion approximation (DA) theory has been extensively applied in optical molecular imaging, high-order photon migration models need to be further investigated to match quantitation provided by nuclear imaging. In this paper, a frequency-domain parallel adaptive finite element solver is developed with simplified spherical harmonics (SP(N)) approximations. To fully evaluate the performance of the SP(N) approximations, a fast time-resolved tetrahedron-based Monte Carlo fluorescence simulator suitable for complex heterogeneous geometries is developed using a convolution strategy to realize the simulation of the fluorescence excitation and emission. The validation results show that high-order SP(N) can effectively correct the modeling errors of the diffusion equation, especially when the tissues have high absorption characteristics or when high modulation frequency measurements are used. Furthermore, the parallel adaptive mesh evolution strategy improves the modeling precision and the simulation speed significantly on a realistic digital mouse phantom. This solver is a promising platform for fluorescence molecular tomography using high-order approximations to the radiative transfer equation. PMID:20671350
Approximate Harten-Lax-van Leer Riemann solvers for relativistic magnetohydrodynamics
NASA Astrophysics Data System (ADS)
Mignone, Andrea; Bodo, G.; Ugliano, M.
2012-11-01
We review a particular class of approximate Riemann solvers in the context of the equations of ideal relativistic magnetohydrodynamics. Commonly prefixed as Harten-Lax-van Leer (HLL), this family of solvers approaches the solution of the Riemann problem by providing suitable guesses to the outermost characteristic speeds, without any prior knowledge of the solution. By requiring consistency with the integral form of the conservation law, a simplified set of jump conditions with a reduced number of characteristic waves may be obtained. The degree of approximation crucially depends on the wave pattern used in representing the Riemann fan arising from the initial discontinuity breakup. In the original HLL scheme, the solution is approximated by collapsing the full characteristic structure into a single average state enclosed by two outermost fast magnetosonic speeds. On the other hand, HLLC and HLLD improve the accuracy of the solution by restoring the tangential and Alfvén modes, therefore leading to a representation of the Riemann fan in terms of 3 and 5 waves, respectively.
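The single-average-state construction described above can be illustrated with a minimal sketch of the generic two-wave HLL flux. This is not the relativistic MHD solver of the paper: the function name `hll_flux`, its argument layout, and the assumption that the caller supplies the outermost speed estimates `s_l` and `s_r` are all hypothetical choices made for illustration.

```python
def hll_flux(u_l, u_r, f_l, f_r, s_l, s_r):
    """Generic two-wave HLL interface flux (illustrative sketch).

    u_l, u_r : conserved states left and right of the interface
    f_l, f_r : physical fluxes evaluated at u_l and u_r
    s_l, s_r : caller-supplied estimates of the outermost wave speeds,
               assumed to satisfy s_l < s_r
    """
    if s_l >= 0.0:
        # Entire Riemann fan moves to the right: pure upwind from the left.
        return list(f_l)
    if s_r <= 0.0:
        # Entire Riemann fan moves to the left: pure upwind from the right.
        return list(f_r)
    # Interface lies inside the fan: flux of the single averaged state,
    # obtained from the integral form of the conservation law.
    return [
        (s_r * fl - s_l * fr + s_l * s_r * (ur - ul)) / (s_r - s_l)
        for ul, ur, fl, fr in zip(u_l, u_r, f_l, f_r)
    ]
```

For linear advection with unit speed (flux equal to the state) and symmetric speed estimates -1 and 1, the formula reduces to the upwind flux taken from the left state.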
Gorpas, Dimitris; Andersson-Engels, Stefan
2012-12-01
The solution of the forward problem in fluorescence molecular imaging strongly influences the successful convergence of the fluorophore reconstruction. The most common approach to meeting this problem has been to apply the diffusion approximation. However, this model is a first-order angular approximation of the radiative transfer equation, and thus is subject to some well-known limitations. This manuscript proposes a methodology that confronts these limitations by applying the radiative transfer equation in spatial regions in which the diffusion approximation gives decreased accuracy. The explicit integro differential equations that formulate this model were solved by applying the Galerkin finite element approximation. The required spatial discretization of the investigated domain was implemented through the Delaunay triangulation, while the azimuthal discretization scheme was used for the angular space. This model has been evaluated on two simulation geometries and the results were compared with results from an independent Monte Carlo method and the radiative transfer equation by calculating the absolute values of the relative errors between these models. The results show that the proposed forward solver can approximate the radiative transfer equation and the Monte Carlo method with better than 95% accuracy, while the accuracy of the diffusion approximation is approximately 10% lower. PMID:23208221
NASA Astrophysics Data System (ADS)
Jouvet, Guillaume
2015-04-01
In this paper, a multilayer generalisation of the Shallow Shelf Approximation (SSA) is considered. In this recent hybrid ice flow model, the ice thickness is divided into thin layers, which can spread out, contract and slide over each other in such a way that the velocity profile is layer-wise constant. Like the SSA (1-layer model), the multilayer model can be reformulated as a minimisation problem. However, unlike the SSA, the functional to be minimised involves a new penalisation term for the interlayer jumps of the velocity, which represents the vertical shear stresses induced by interlayer sliding. Taking advantage of this reformulation, numerical solvers developed for the SSA can be naturally extended layer-wise or column-wise. Numerical results show that the column-wise extension of a Newton multigrid solver proves to be robust in the sense that its convergence is barely influenced by the number of layers and the type of ice flow. In addition, the multilayer formulation appears to be naturally better conditioned than the one of the first-order approximation to face the anisotropic conditions of the sliding-dominant ice flow of ISMIP-HOM experiments.
Jouvet, Guillaume
2015-04-15
In this paper, a multilayer generalisation of the Shallow Shelf Approximation (SSA) is considered. In this recent hybrid ice flow model, the ice thickness is divided into thin layers, which can spread out, contract and slide over each other in such a way that the velocity profile is layer-wise constant. Like the SSA (1-layer model), the multilayer model can be reformulated as a minimisation problem. However, unlike the SSA, the functional to be minimised involves a new penalisation term for the interlayer jumps of the velocity, which represents the vertical shear stresses induced by interlayer sliding. Taking advantage of this reformulation, numerical solvers developed for the SSA can be naturally extended layer-wise or column-wise. Numerical results show that the column-wise extension of a Newton multigrid solver proves to be robust in the sense that its convergence is barely influenced by the number of layers and the type of ice flow. In addition, the multilayer formulation appears to be naturally better conditioned than the one of the first-order approximation to face the anisotropic conditions of the sliding-dominant ice flow of ISMIP-HOM experiments.
Nonlinear Solver Approaches for the Diffusive Wave Approximation to the Shallow Water Equations
NASA Astrophysics Data System (ADS)
Collier, N.; Knepley, M.
2015-12-01
The diffusive wave approximation to the shallow water equations (DSW) is a doubly-degenerate, nonlinear, parabolic partial differential equation used to model overland flows. Despite its challenges, the DSW equation has been extensively used to model the overland flow component of various integrated surface/subsurface models. The equation's complications become increasingly problematic when ponding occurs, a feature which becomes pervasive when solving on large domains with realistic terrain. In this talk I discuss the various forms and regularizations of the DSW equation and highlight their effect on the solvability of the nonlinear system. In addition to this analysis, I present results of a numerical study which tests the applicability of a class of composable nonlinear algebraic solvers recently added to the Portable, Extensible Toolkit for Scientific Computation (PETSc).
A low-Mach number fix for Roe’s approximate Riemann solver
NASA Astrophysics Data System (ADS)
Rieper, Felix
2011-06-01
We present a low-Mach number fix for Roe's approximate Riemann solver (LMRoe). As the Mach number Ma tends to zero, solutions to the Euler equations converge to solutions of the incompressible equations. Yet, standard upwind schemes do not reproduce this convergence: the artificial viscosity grows like 1/Ma, leading to a loss of accuracy as Ma → 0. With a discrete asymptotic analysis of the Roe scheme we identify the responsible term: the jump in the normal velocity component Δ U of the Riemann problem. The remedy consists of reducing this term by one order of magnitude in terms of the Mach number. This is achieved by simply multiplying Δ U with the local Mach number. With an asymptotic analysis it is shown that all discrepancies between continuous and discrete asymptotics disappear, while, at the same time, checkerboard modes are suppressed. Low Mach number test cases show, first, that the accuracy of LMRoe is independent of the Mach number, second, that the solution converges to the incompressible limit for Ma → 0 on a fixed mesh and, finally, that the new scheme does not produce pressure checkerboard modes. High speed test cases demonstrate the fall back of the new scheme to the classical Roe scheme at moderate and high Mach numbers.
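The fix described above, rescaling the normal-velocity jump by the local Mach number, can be sketched in a few lines. The function name and the cap at 1 are assumptions made here so that the sketch reverts to the unmodified jump at moderate and high Mach numbers, as the abstract says the scheme does. How the local Mach number is evaluated at the interface is left to the caller.

```python
def scaled_normal_velocity_jump(delta_u, mach_local):
    """Rescale the normal-velocity jump of the Riemann problem.

    delta_u    : jump in the normal velocity component across the interface
    mach_local : local Mach number at the interface (non-negative)

    The min(..., 1.0) cap is an assumption of this sketch: it leaves the
    jump untouched once the local Mach number reaches 1, so the classical
    Roe dissipation is recovered there.
    """
    return min(mach_local, 1.0) * delta_u
```

At a local Mach number of 0.01 the jump entering the dissipation term is reduced by two orders of magnitude, while at Mach 3 it passes through unchanged.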
NASA Astrophysics Data System (ADS)
Regnier, D.; Verrière, M.; Dubray, N.; Schunck, N.
2016-03-01
We describe the software package FELIX that solves the equations of the time-dependent generator coordinate method (TDGCM) in N-dimensions (N ≥ 1) under the Gaussian overlap approximation. The numerical resolution is based on the Galerkin finite element discretization of the collective space and the Crank-Nicolson scheme for time integration. The TDGCM solver is implemented entirely in C++. Several additional tools written in C++, Python or bash scripting language are also included for convenience. In this paper, the solver is tested with a series of benchmark calculations. We also demonstrate the ability of our code to handle a realistic calculation of fission dynamics.
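As a rough illustration of the Crank-Nicolson time integration mentioned above, here is a minimal sketch for the scalar test equation dy/dt = lam * y. It is not taken from FELIX, which is written in C++ and acts on finite element matrices. The function name and the scalar setting are assumptions chosen to keep the example self-contained.

```python
def crank_nicolson_step(y, lam, dt):
    """Advance dy/dt = lam * y by one Crank-Nicolson step.

    The scheme averages the right-hand side between the old and new time
    levels: (y_new - y) / dt = lam * (y_new + y) / 2, which is solved here
    in closed form for y_new.
    """
    half = 0.5 * dt * lam
    return y * (1.0 + half) / (1.0 - half)


# Schrodinger-like case: a purely imaginary lam gives an amplification
# factor of modulus one, so the norm of y is preserved up to rounding.
y = 1.0 + 0.0j
for _ in range(100):
    y = crank_nicolson_step(y, -2.0j, 0.1)
```

The norm preservation for imaginary `lam` is the scalar analogue of the unitarity that makes this scheme a common choice for time-dependent Schrödinger-type equations.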
Regnier, D.; Verriere, M.; Dubray, N.; Schunck, N.
2015-11-30
In this study, we describe the software package FELIX that solves the equations of the time-dependent generator coordinate method (TDGCM) in N-dimensions (N ≥ 1) under the Gaussian overlap approximation. The numerical resolution is based on the Galerkin finite element discretization of the collective space and the Crank–Nicolson scheme for time integration. The TDGCM solver is implemented entirely in C++. Several additional tools written in C++, Python or bash scripting language are also included for convenience. In this paper, the solver is tested with a series of benchmark calculations. We also demonstrate the ability of our code to handle a realistic calculation of fission dynamics.
Divergence-free approximate Riemann solver for the quasi-neutral two-fluid plasma model
NASA Astrophysics Data System (ADS)
Amano, Takanobu
2015-10-01
A numerical method for the quasi-neutral two-fluid (QNTF) plasma model is described. The basic equations are ion and electron fluid equations and the Maxwell equations without displacement current. The neglect of displacement current is consistent with the assumption of charge neutrality. Therefore, Langmuir waves and electromagnetic waves are eliminated from the system, which is in clear contrast to the fully electromagnetic two-fluid model. It thus reduces to the ideal magnetohydrodynamic (MHD) equations in the long wavelength limit, but the two-fluid effect appearing at ion and electron inertial scales is fully taken into account. It is shown that the basic equations may be rewritten in a form that has formally the same structure as the MHD equations. The total mass, momentum, and energy are all written in the conservative form. A new three-dimensional numerical simulation code has been developed for the QNTF equations. The HLL (Harten-Lax-van Leer) approximate Riemann solver combined with the upwind constrained transport (UCT) scheme is applied. The method was originally developed for MHD [25], but works quite well for the present model as well. The simulation code is able to capture sharp multidimensional discontinuities as well as dispersive waves arising from the two-fluid effect at small scales without producing ∇ · B errors. It is well known that conventional Hall-MHD codes often suffer a numerical stability issue associated with short wavelength whistler waves. On the other hand, since finite electron inertia introduces an upper bound to the phase speed of whistler waves in the present model, our code is free from the issue even without explicit dissipation terms or implicit time integration. Numerical experiments have confirmed that there is no need to resolve characteristic time scales such as plasma frequency or cyclotron frequency for numerical stability. Consequently, the QNTF model offers a better alternative to the Hall-MHD or fully
String theory on parallelizable PP-waves
NASA Astrophysics Data System (ADS)
Sadri, Darius; Sheikh-Jabbari, Mohammad M.
2003-06-01
The most general parallelizable pp-wave backgrounds which are non-dilatonic solutions in the NS-NS sector of type IIA and IIB string theories are considered. We demonstrate that parallelizable pp-wave backgrounds are necessarily homogeneous plane-waves, and that a large class of homogeneous plane-waves are parallelizable, stating the necessary conditions. Such plane-waves can be classified according to the number of preserved supersymmetries. In type IIA, these include backgrounds preserving 16, 18, 20, 22 and 24 supercharges, while in the IIB case they preserve 16, 20, 24 or 28 supercharges. An intriguing property of parallelizable pp-wave backgrounds is that the bosonic part of these solutions is invariant under T-duality, while the number of supercharges might change under T-duality. Due to their alpha' exactness, they provide interesting backgrounds for studying string theory. Quantization of string modes, their compactification and behaviour under T-duality are studied. In addition, we consider BPS Dp-branes, and show that these Dp-branes can be classified in terms of the locations of their world volumes with respect to the background H-field.
NASA Technical Reports Server (NTRS)
Rumsey, Christopher L.; Van Leer, Bram; Roe, Philip L.
1991-01-01
A new two-dimensional approximate Riemann solver has been developed that obtains fluxes on grid faces via wave decomposition. By utilizing information propagation in the velocity-difference directions rather than in the grid-normal directions, this flux function more appropriately interprets and hence more sharply resolves shock and shear waves when they lie oblique to the grid. The model uses five waves to describe the difference in states at a grid face. Two acoustic waves, one shear wave, and one entropy wave propagate in the direction defined by the local velocity difference vector, while the fifth wave is a shear wave that propagates at a right angle to the other four. Test cases presented include a shock reflecting off a wall, a pure shear wave, supersonic flow over an airfoil, and viscous separated airfoil flow. Results using the new model give significantly sharper shock and shear contours than a grid-aligned solver. Navier-Stokes computations over an airfoil show reduced pressure distortions in the separated region as a result of the grid-independent upwinding.
Fast solvers for finite difference approximations for the Stokes and Navier-Stokes equations
Shin, D.
1992-01-01
The authors consider several methods for solving the linear equations arising from finite difference discretizations of the Stokes equations. The pressure equation method, apparently presented here for the first time, and the method presented by Bramble and Pasciak are shown to have computational effort that grows slowly with the number of grid points. The methods work with second-order accurate discretizations. Computational results are shown for both the Stokes and incompressible Navier-Stokes equations at low Reynolds number. The inf-sup conditions resulting from three finite difference approximations of the Stokes equations are proven. These conditions are used to prove that the Schur complement Q_h of the linear system generated by each of these approximations is bounded uniformly away from zero. For the pressure equation method, this guarantees that the conjugate gradient method applied to Q_h converges in a finite number of iterations which is independent of mesh size. The fact that Q_h is bounded below is used to prove convergence estimates for the solutions generated by these finite difference approximations. One of the estimates is for a staggered grid, and the estimate shows that both the pressure and the velocity parts of the solution are second-order accurate. Iterative methods are compared using the regularized central differencing introduced by Strikwerda. Several finite difference approximations of the Stokes equations solved by the SOR method are compared, and the superiority of the regularized central differencing over the other finite difference approximations is noted. This differencing gives rise to a linear equation with a matrix which is slightly non-symmetric. The convergence of the typical steepest descent method and of a conjugate gradient method, which is almost the same as the typical conjugate gradient method, applied to slightly non-symmetric positive definite matrices is proven.
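For readers unfamiliar with the conjugate gradient iteration referred to above, the following is a minimal textbook sketch for a symmetric positive definite operator given as a matrix-vector product. It is not the authors' Schur complement solver: the names `conjugate_gradient` and `matvec` and the small 2x2 test matrix are hypothetical and serve only to make the example runnable.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (textbook CG).

    matvec : function returning A applied to a vector (list of floats)
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, it
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Hypothetical 2x2 SPD example: A = [[4, 1], [1, 3]], b = [1, 2].
A = [[4.0, 1.0], [1.0, 3.0]]
solution, iterations = conjugate_gradient(
    lambda v: [dot(row, v) for row in A], [1.0, 2.0]
)
```

Passing the operator as a function rather than a stored matrix mirrors how a Schur complement is normally applied in practice, since it is rarely formed explicitly.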
NASA Astrophysics Data System (ADS)
Yeckel, Andrew; Lun, Lisa; Derby, Jeffrey J.
2009-12-01
A new, approximate block Newton (ABN) method is derived and tested for the coupled solution of nonlinear models, each of which is treated as a modular, black box. Such an approach is motivated by a desire to maintain software flexibility without sacrificing solution efficiency or robustness. Though block Newton methods of similar type have been proposed and studied, we present a unique derivation and use it to sort out some of the more confusing points in the literature. In particular, we show that our ABN method behaves like a Newton iteration preconditioned by an inexact Newton solver derived from subproblem Jacobians. The method is demonstrated on several conjugate heat transfer problems modeled after melt crystal growth processes. These problems are represented by partitioned spatial regions, each modeled by independent heat transfer codes and linked by temperature and flux matching conditions at the boundaries common to the partitions. Whereas a typical block Gauss-Seidel iteration fails about half the time for the model problem, quadratic convergence is achieved by the ABN method under all conditions studied here. Additional performance advantages over existing methods are demonstrated and discussed.
Homman, Ahmed-Amine; Maillet, Jean-Bernard; Roussel, Julien; Stoltz, Gabriel
2016-01-14
This work presents new parallelizable numerical schemes for the integration of dissipative particle dynamics with energy conservation. So far, no numerical scheme introduced in the literature is able to correctly preserve the energy over long times and give rise to small errors on average properties for moderately small time steps, while being straightforwardly parallelizable. We present in this article two new methods, both straightforwardly parallelizable, that correctly preserve the total energy of the system. We illustrate the accuracy and performance of these new schemes on both equilibrium and nonequilibrium parallel simulations. PMID:26772559
NASA Astrophysics Data System (ADS)
Tezaur, I. K.; Perego, M.; Salinger, A. G.; Tuminaro, R. S.; Price, S. F.
2015-04-01
This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, along with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems are demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure, sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
García-Risueño, Pablo; Alberdi-Rodriguez, Joseba; Oliveira, Micael J T; Andrade, Xavier; Pippig, Michael; Muguerza, Javier; Arruabarrena, Agustin; Rubio, Angel
2014-03-01
We present an analysis of different methods to calculate the classical electrostatic Hartree potential created by charge distributions. Our goal is to provide the reader with an estimate of the performance, in terms of both numerical complexity and accuracy, of popular Poisson solvers, and to give an intuitive idea of the way these solvers operate. Highly parallelizable routines have been implemented in a first-principles simulation code (Octopus) for use in our tests, so that reliable conclusions about the capability of the methods to tackle large systems in cluster computing can be drawn from our work. PMID:24249048
Real-space method for highly parallelizable electronic transport calculations
NASA Astrophysics Data System (ADS)
Feldman, Baruch; Seideman, Tamar; Hod, Oded; Kronik, Leeor
2014-07-01
We present a real-space method for first-principles nanoscale electronic transport calculations. We use the nonequilibrium Green's function method with density functional theory and implement absorbing boundary conditions (ABCs, also known as complex absorbing potentials, or CAPs) to represent the effects of the semi-infinite leads. In real space, the Kohn-Sham Hamiltonian matrix is highly sparse. As a result, the transport problem parallelizes naturally and can scale favorably with system size, enabling the computation of conductance in relatively large molecular junction models. Our use of ABCs circumvents the demanding task of explicitly calculating the leads' self-energies from surface Green's functions, and is expected to be more accurate than the use of the jellium approximation. In addition, we take advantage of the sparsity in real space to solve efficiently for the Green's function over the entire energy range relevant to low-bias transport. We illustrate the advantages of our method with calculations on several challenging test systems and find good agreement with reference calculation results.
Stanley, Vendall S.; Heroux, Michael A.; Hoekstra, Robert J.; Sala, Marzio
2004-03-01
Amesos is the Direct Sparse Solver Package in Trilinos. The goal of Amesos is to make AX=B as easy as it sounds, at least for direct methods. Amesos provides interfaces to a number of third-party sparse direct solvers, including SuperLU, SuperLU MPI, DSCPACK, UMFPACK and KLU. Amesos provides a common object-oriented interface to the best sparse direct solvers in the world. A sparse direct solver solves for x in Ax = b, where A is a matrix and x and b are vectors (or multi-vectors). A sparse direct solver first factors A into triangular matrices L and U such that A = LU via Gaussian elimination and then solves LUx = b. Switching among solvers in Amesos requires a change to a single parameter. Yet no solver needs to be linked in unless it is used. All conversions between the matrices provided by the user and the format required by the underlying solver are performed by Amesos. As new sparse direct solvers are created, they will be incorporated into Amesos, allowing the user to simply link with the new solver, change a single parameter in the calling sequence, and use the new solver. Amesos allows users to specify whether the matrix has changed. Amesos can be used anywhere that any sparse direct solver is needed.
Solving block linear systems with low-rank off-diagonal blocks is easily parallelizable
Menkov, V.
1996-12-31
An easily and efficiently parallelizable direct method is given for solving a block linear system Bx = y, where B = D + Q is the sum of a non-singular block diagonal matrix D and a matrix Q with low-rank blocks. This implicitly defines a new preconditioning method with an operation count close to the cost of calculating a matrix-vector product Qw for some w, plus at most twice the cost of calculating Qw for some w. When implemented on a parallel machine the processor utilization can be as good as that of those operations. Order estimates are given for the general case, and an implementation is compared to block SSOR preconditioning.
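As an illustration of why a block diagonal plus low-rank structure admits a cheap, parallel direct solve, consider the simplified case in which Q itself has low rank. This is an assumption made here for the sketch; the abstract only states that the blocks of Q are low-rank, and the paper's method may differ. With the hypothetical factors U and V of size n by r, where r is much smaller than n, the Sherman-Morrison-Woodbury identity gives:

```latex
B = D + U V^{T}
\quad\Longrightarrow\quad
B^{-1} y \;=\; D^{-1} y \;-\; D^{-1} U \,\bigl(I_r + V^{T} D^{-1} U\bigr)^{-1} V^{T} D^{-1} y ,
```

valid whenever the small r-by-r matrix I_r + V^T D^{-1} U is nonsingular. Each application of D^{-1} splits into independent solves with the diagonal blocks, which is the part that parallelizes naturally, while the only coupled work is the small r-by-r system.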
NASA Technical Reports Server (NTRS)
Ilin, Andrew V.
2006-01-01
The Magnetic Field Solver computer program calculates the magnetic field generated by a group of collinear, cylindrical axisymmetric electromagnet coils. Given the current flowing in, and the number of turns, axial position, and axial and radial dimensions of each coil, the program calculates matrix coefficients for a finite-difference system of equations that approximates a two-dimensional partial differential equation for the magnetic potential contributed by the coil. The program iteratively solves these finite-difference equations by use of the modified incomplete Cholesky preconditioned-conjugate-gradient method. The total magnetic potential as a function of axial (z) and radial (r) position is then calculated as a sum of the magnetic potentials of the individual coils, using a high-accuracy interpolation scheme. Then the r and z components of the magnetic field as functions of r and z are calculated from the total magnetic potential by use of a high-accuracy finite-difference scheme. Notably, for the finite-difference calculations, the program generates nonuniform two-dimensional computational meshes from nonuniform one-dimensional meshes. Each mesh is generated in such a way as to minimize the numerical error for a benchmark one-dimensional magnetostatic problem.
Matrix decomposition graphics processing unit solver for Poisson image editing
NASA Astrophysics Data System (ADS)
Lei, Zhao; Wei, Li
2012-10-01
In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computationally and memory-intensive task, which makes it unsuitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to address the problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS takes full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (it combines direct and iterative techniques) and has a two-level architecture. These features enable MDGS to generate solutions identical to those of the common Poisson methods and to achieve a high convergence rate in most cases. This approach is advantageous in terms of parallelizability, real-time image processing, low memory usage and breadth of applications.
A Fast Poisson Solver with Periodic Boundary Conditions for GPU Clusters in Various Configurations
NASA Astrophysics Data System (ADS)
Rattermann, Dale Nicholas
Fast Poisson solvers using the Fast Fourier Transform on uniform grids are especially suited for parallel implementation, making them appropriate for portability on graphical processing unit (GPU) devices. The goal of the following work was to implement, test, and evaluate a fast Poisson solver for periodic boundary conditions for use on a variety of GPU configurations. The solver used in this research was FLASH, an immersed-boundary-based method, which is well suited for complex, time-dependent geometries, has robust adaptive mesh refinement/de-refinement capabilities to capture evolving flow structures, and has been successfully implemented on conventional, parallel supercomputers. However, these solvers are still computationally costly to employ, and the total solver time is dominated by the solution of the pressure Poisson equation using state-of-the-art multigrid methods. FLASH improves the performance of its multigrid solvers by integrating a parallel FFT solver on a uniform grid during a coarse level. This hybrid solver could then be theoretically improved by replacing the highly-parallelizable FFT solver with one that utilizes GPUs, and, thus, was the motivation for my research. In the present work, the CPU-utilizing parallel FFT solver (PFFT) used in the base version of FLASH for solving the Poisson equation on uniform grids has been modified to enable parallel execution on CUDA-enabled GPU devices. New algorithms have been implemented to replace the Poisson solver that decompose the computational domain and send each new block to a GPU for parallel computation. One-dimensional (1-D) decomposition of the computational domain minimizes the amount of network traffic involved in this bandwidth-intensive computation by limiting the amount of all-to-all communication required between processes. Advanced techniques have been incorporated and implemented in a GPU-centric code design, while allowing end users the flexibility of parameter control at runtime in
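The idea behind Fourier-based Poisson solvers with periodic boundaries can be shown in one dimension. The sketch below is not FLASH or PFFT code; the function names are hypothetical, and a naive O(n^2) discrete Fourier transform stands in for the FFT so that the example needs only the standard library. It solves u'' = f on a periodic interval by dividing each Fourier coefficient by minus the squared wavenumber and fixing the mean of u to zero.

```python
import cmath
import math


def dft(values, sign):
    """Naive discrete Fourier transform; sign=-1 is forward, sign=+1 is unscaled inverse."""
    n = len(values)
    return [
        sum(values[col] * cmath.exp(sign * 2j * math.pi * row * col / n) for col in range(n))
        for row in range(n)
    ]


def solve_periodic_poisson(f, length=2.0 * math.pi):
    """Solve u'' = f on a uniform periodic grid, returning the zero-mean solution."""
    n = len(f)
    f_hat = dft(f, -1)
    u_hat = [0j] * n  # mode 0 stays zero: the mean of u is not determined by f
    for k in range(1, n):
        mode = k if k <= n // 2 else k - n  # map index to signed frequency
        wavenumber = 2.0 * math.pi * mode / length
        u_hat[k] = -f_hat[k] / wavenumber ** 2
    return [value.real / n for value in dft(u_hat, +1)]


# Usage: f = -sin(x) has the zero-mean periodic solution u = sin(x).
grid = [2.0 * math.pi * j / 16 for j in range(16)]
u = solve_periodic_poisson([-math.sin(x) for x in grid])
```

Each Fourier mode is handled independently, which is the property that makes this class of solver attractive for parallel and GPU execution; the transform itself is where the communication cost sits in a distributed setting.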
Kinetic simulation of fiber amplifier based on parallelizable and bidirectional algorithm
NASA Astrophysics Data System (ADS)
Chen, Haihuan; Yang, Huanbi; Wu, Wenhan
2015-10-01
Simulating counter-propagating light waves in fibers requires handling an extremely large volume of data when sequential, unidirectional methods are used, because such methods work in a coordinate system that moves along with the light waves. An alternative simulation algorithm should therefore be used for counter-propagating light waves. The parallelizable and bidirectional (PB) algorithm simulates the light waves matching in the time domain instead of the space domain, does not need iteration, and permits efficient parallelization on multiple processors. The PB method was proposed to calculate the propagation of a dispersing Gaussian pulse and a bit stream in fibers. However, the PB method also has apparent advantages for simulating pulses in fiber laser amplifiers, which have not yet been investigated in detail. In this paper, we simulate pulses in a rare-earth-ion-doped fiber amplifier. The influence of pump power, signal power, repetition rate, pulse width and fiber length on the amplifier's output average power, peak power, pulse energy and pulse shape is investigated. The results indicate that the PB method is effective for simulating high-power amplification of pulses in a fiber amplifier. Furthermore, nonlinear effects can be added to the simulation conveniently. The work in this paper provides a more economical and efficient method to simulate power amplification in fiber lasers.
Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes
Popendorf, Kris; Tsuyoshi, Hachiya; Osana, Yasunori; Sakakibara, Yasubumi
2010-01-01
Background: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. Methodology/Principal Findings: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. Conclusions/Significance: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency
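The spaced-seed idea mentioned in this abstract can be illustrated with a toy example. The sketch below is not Murasaki's algorithm (in particular it has no adaptive hash function generation); the function names and the pattern are hypothetical. A pattern such as "1101" means that positions marked 1 must match and positions marked 0 are ignored, so two windows that differ only at an ignored position land in the same bucket and are reported as a candidate anchor.

```python
from collections import defaultdict


def spaced_key(sequence, start, pattern):
    """Return the characters of the window at `start` that the pattern marks with '1'."""
    return "".join(
        sequence[start + offset] for offset, flag in enumerate(pattern) if flag == "1"
    )


def find_anchor_candidates(seq_a, seq_b, pattern):
    """List (position in seq_a, position in seq_b) pairs whose windows share a spaced key."""
    width = len(pattern)
    index = defaultdict(list)
    # Index every window of the first sequence by its spaced key.
    for i in range(len(seq_a) - width + 1):
        index[spaced_key(seq_a, i, pattern)].append(i)
    hits = []
    # Look up every window of the second sequence in that index.
    for j in range(len(seq_b) - width + 1):
        for i in index.get(spaced_key(seq_b, j, pattern), []):
            hits.append((i, j))
    return hits


# Usage: "GATT" and "GAGT" differ only at the ignored third position.
candidates = find_anchor_candidates("GATTACA", "CGAGTC", "1101")
```

Because each window is hashed independently, the indexing and lookup loops can be split across workers, which is one reason hash-based anchoring lends itself to parallel execution.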
Parallel Multigrid Equation Solver
Energy Science and Technology Software Center (ESTSC)
2001-09-07
Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved linear elasticity problems of up to 76 million degrees of freedom on the ASCI Blue Pacific and ASCI Red machines.
Kotulski, Joseph D.; Womble, David E.; Greenberg, David; Driessen, Brian
2004-03-01
PLIRIS is an object-oriented solver built on top of a previous matrix solver used in a number of application codes. PLIRIS solves a linear system directly via LU factorization with partial pivoting. The user provides the linear system in terms of Epetra objects, including a matrix and right-hand sides. The user can then factor the matrix and perform the forward and back solves at a later time, or solve for multiple right-hand sides at once. This package is used when dense matrices are obtained in the problem formulation. These dense matrices occur whenever boundary element techniques are chosen for the solution procedure. The package has been used in electromagnetics for both static and frequency-domain problems.
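The factor-once, solve-many workflow described here can be sketched in a few lines of plain Python. This is not the PLIRIS or Epetra interface; the function names `lu_factor` and `lu_solve` are hypothetical, and the matrix is a small dense list of rows rather than a distributed object. It shows dense LU factorization with partial pivoting followed by forward and back substitution for each right-hand side.

```python
def lu_factor(matrix):
    """Factor a square matrix as P*A = L*U with partial pivoting.

    Returns (lu, perm): lu holds L below the diagonal (unit diagonal implied)
    and U on and above it; perm[i] is the original index of row i.
    """
    n = len(matrix)
    lu = [list(map(float, row)) for row in matrix]
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot row.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, rhs):
    """Solve A x = rhs using a factorization from lu_factor."""
    n = len(lu)
    # Forward substitution with the unit lower triangle: L y = P rhs.
    y = [float(rhs[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with the upper triangle: U x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Usage: factor once, then reuse the factors for two right-hand sides.
factors, order = lu_factor([[0, 2, 1], [1, 1, 1], [2, 1, 3]])
first = lu_solve(factors, order, [7, 6, 13])
second = lu_solve(factors, order, [-1, 0, -1])
```

The factorization costs on the order of n^3 operations while each subsequent solve costs on the order of n^2, which is why separating the two steps pays off when several right-hand sides share one matrix.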
A non-conforming 3D spherical harmonic transport solver
Van Criekingen, S.
2006-07-01
A new 3D transport solver for the time-independent Boltzmann transport equation has been developed. This solver is based on the second-order even-parity form of the transport equation. The angular discretization is performed through the expansion of the angular neutron flux in spherical harmonics (PN method). The novelty of this solver is the use of non-conforming finite elements for the spatial discretization. Such elements lead to a discontinuous flux approximation. This relaxation of the interface continuity requirement is shared with mixed-dual formulations such as the ones based on Raviart-Thomas finite elements. Encouraging numerical results are presented. (authors)
Energy Science and Technology Software Center (ESTSC)
2007-03-01
HPCCG is a simple PDE application and preconditioned conjugate gradient solver that solves a linear system on a beam-shaped domain. Although it does not address many performance issues present in real engineering applications, such as load imbalance and preconditioner scalability, it can serve as a first "sanity test" of new processor design choices, inter-connect network design choices and the scalability of a new computer system. Because it is self-contained, easy to compile and easily scaled to 100s or 1000s of processors, it can be an attractive study code for computer system designers.
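For orientation, a preconditioned conjugate gradient iteration can be written compactly. The sketch below is not HPCCG code: the function names are hypothetical, the preconditioner is a simple Jacobi (diagonal) scaling chosen here for brevity, and the test operator is a one-dimensional Laplacian rather than the beam-shaped 3-D domain the abstract mentions.

```python
import math


def pcg(matvec, b, diag, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite system.

    matvec(v) returns A*v, and diag holds the diagonal of A.
    Returns (solution, number of iterations performed).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A*x for the zero initial guess
    z = [r[i] / diag[i] for i in range(n)]  # preconditioned residual
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for iteration in range(max_iter):
        if math.sqrt(sum(v * v for v in r)) <= tol:
            return x, iteration
        ap = matvec(p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) operator with zero boundary values."""
    n = len(v)
    return [
        2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i + 1 < n else 0.0)
        for i in range(n)
    ]


# Usage: recover a known solution of a 10-unknown model problem.
exact = [float(i + 1) for i in range(10)]
solution, iterations = pcg(laplacian_1d, laplacian_1d(exact), [2.0] * 10)
```

In a distributed setting the matrix-vector product needs neighbor communication and each dot product needs a global reduction, so even a small kernel like this exercises both the interconnect and the processor, which is the kind of behavior a benchmark of this type is meant to expose.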
Euler solvers for transonic applications
NASA Technical Reports Server (NTRS)
Vanleer, Bram
1989-01-01
The 1980s may well be called the Euler era of applied aerodynamics. Computer codes based on discrete approximations of the Euler equations are now routinely used to obtain solutions of transonic flow problems in which the effects of entropy and vorticity production are significant. Such codes can even predict separation from a sharp edge, owing to the inclusion of artificial dissipation, which is intended to lend numerical stability to the calculation but at the same time enforces the Kutta condition. Euler codes cannot correctly predict separation from a smooth surface, nor viscous drag; for these, some form of the Navier-Stokes equations is needed. It therefore comes as no surprise that the Navier-Stokes era had already begun before Euler solutions were fully exploited. Moreover, most numerical developments for the Euler equations are now constrained by the requirement that the techniques introduced, notably artificial dissipation, must not interfere with the new physics added when going from an Euler to a full Navier-Stokes approximation. In order to appreciate the contributions of Euler solvers to the understanding of transonic aerodynamics, it is useful to review the components of these computational tools. The essential constituents, namely space discretization, time or pseudo-time marching, and boundary procedures, are discussed. Grid generation and grid adaptation to the solution are touched upon only where relevant. A list of unanswered questions and an outlook for the future are included.
Parallel tridiagonal equation solvers
NASA Technical Reports Server (NTRS)
Stone, H. S.
1974-01-01
Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
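One of the algorithms compared here, cyclic odd-even reduction, can be sketched recursively. The code below is an illustrative serial version, not Stone's formulation; the function name and argument layout are assumptions. The system is a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] and c[-1] set to zero by convention, and no pivoting is performed, so it is intended for well-conditioned (for example diagonally dominant) systems.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic odd-even reduction.

    a, b, c are the sub-, main and super-diagonals; d is the right-hand side.
    Requires a[0] == 0 and c[-1] == 0.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Eliminate the even-indexed unknowns from every odd-indexed equation.
    # Each pass of this loop is independent of the others, hence parallelizable.
    for i in range(1, n, 2):
        has_right = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_right else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_right else 0.0))
        rc.append(-gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_right else 0.0))
    # The reduced system is again tridiagonal and about half the size.
    odd_values = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = odd_values[k]
    # Back-substitute for the even-indexed unknowns, again independently.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Usage: a diagonally dominant 7-unknown system with diagonals (-1, 4, -1).
sub = [0.0] + [-1.0] * 6
main = [4.0] * 7
sup = [-1.0] * 6 + [0.0]
rhs = [3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0]  # chosen so that every unknown equals 1
unknowns = cyclic_reduction(sub, main, sup, rhs)
```

Each level halves the number of unknowns, so about log2(n) levels are needed, and within a level all eliminations are independent; that independence is what makes the method suitable for array and pipeline machines of the kind discussed in the abstract.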
Modiri, A; Gu, X; Sawant, A
2014-06-15
Purpose: We present a particle swarm optimization (PSO)-based 4D IMRT planning technique designed for dynamic MLC tracking delivery to lung tumors. The key idea is to utilize the temporal dimension as an additional degree of freedom rather than a constraint in order to achieve improved sparing of organs at risk (OARs). Methods: The target and normal structures were manually contoured on each of the ten phases of a 4DCT scan acquired from a lung SBRT patient who exhibited 1.5 cm tumor motion despite the use of abdominal compression. Corresponding ten IMRT plans were generated using the Eclipse treatment planning system. These plans served as initial guess solutions for the PSO algorithm. Fluence weights were optimized over the entire solution space, i.e., 10 phases × 12 beams × 166 control points. The size of the solution space motivated our choice of PSO, which is a highly parallelizable stochastic global optimization technique that is well-suited for such large problems. A summed fluence map was created using an in-house B-spline deformable image registration. Each plan was compared with a corresponding internal target volume (ITV)-based IMRT plan. Results: The PSO 4D IMRT plan yielded comparable PTV coverage and significantly higher dose sparing for parallel and serial OARs compared to the ITV-based plan. The dose sparing achieved via PSO-4DIMRT was: lung Dmean = 28%; lung V20 = 90%; spinal cord Dmax = 23%; esophagus Dmax = 31%; heart Dmax = 51%; heart Dmean = 64%. Conclusion: Truly 4D IMRT that uses the temporal dimension as an additional degree of freedom can achieve significant dose sparing of serial and parallel OARs. Given the large solution space, PSO represents an attractive, parallelizable tool to achieve globally optimal solutions for such problems. This work was supported through funding from the National Institutes of Health and Varian Medical Systems. Amit Sawant has research funding from Varian Medical Systems, VisionRT Ltd. and Elekta.
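To make the optimization technique concrete, here is a minimal generic particle swarm optimizer. It is not the authors' planning code: the function name, the parameter values and the toy two-variable objective are all assumptions chosen for illustration, and nothing about fluence maps or dose is modeled.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.72, c_personal=1.49, c_global=1.49):
    """Minimize `objective` over a box using a basic particle swarm.

    bounds is a list of (low, high) pairs, one per dimension; they are used only
    to place the initial particles. Returns (best position, best value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    leader = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[leader][:], best_val[leader]
    for _ in range(n_iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Blend momentum, pull toward own best, and pull toward swarm best.
                vel[i][k] = (inertia * vel[i][k]
                             + c_personal * r1 * (best_pos[i][k] - pos[i][k])
                             + c_global * r2 * (swarm_pos[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            value = objective(pos[i])
            if value < best_val[i]:
                best_val[i], best_pos[i] = value, pos[i][:]
                if value < swarm_val:
                    swarm_val, swarm_pos = value, pos[i][:]
    return swarm_pos, swarm_val


# Usage: a toy quadratic with its minimum at (1, -2).
def toy_objective(point):
    return (point[0] - 1.0) ** 2 + (point[1] + 2.0) ** 2


best_point, best_value = pso_minimize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
```

The objective evaluations for different particles do not depend on each other, so in a parallel version they can be farmed out to separate workers, with the swarm best updated once per iteration; this sketch instead updates it as it goes, which is simpler to write serially.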
Amesos2 Templated Direct Sparse Solver Package
Energy Science and Technology Software Center (ESTSC)
2011-05-24
Amesos2 is a templated direct sparse solver package. Amesos2 provides interfaces to direct sparse solvers, rather than providing native solver capabilities. Amesos2 is a derivative work of the Trilinos package Amesos.
Fast wavelet based sparse approximate inverse preconditioner
Wan, W.L.
1996-12-31
Incomplete LU factorization is a robust preconditioner for both general and PDE problems, but unfortunately it is not easy to parallelize. Recent studies by Huckle and Grote and by Chow and Saad showed that a sparse approximate inverse could be a potential alternative that is readily parallelizable. However, for the special class of matrices A that come from elliptic PDE problems, their preconditioners are not optimal in the sense of giving convergence independent of mesh size. A reason may be that no good sparse approximate inverse exists for the dense inverse matrix. Our observation is that for this kind of matrix, the entries of the inverse typically vary in a piecewise smooth manner. We can take advantage of this fact and use wavelet compression techniques to construct a better sparse approximate inverse preconditioner. We shall show numerically that our approach is effective for this kind of matrix.
Sherlock Holmes, Master Problem Solver.
ERIC Educational Resources Information Center
Ballew, Hunter
1994-01-01
Shows the connections between Sherlock Holmes's investigative methods and mathematical problem solving, including observations, characteristics of the problem solver, importance of data, questioning the obvious, learning from experience, learning from errors, and indirect proof. (MKR)
Structured Multifrontal Sparse Solver
Energy Science and Technology Software Center (ESTSC)
2014-05-01
StruMF is an algebraic structured preconditioner for the iterative solution of large sparse linear systems. The preconditioner corresponds to a multifrontal variant of sparse LU factorization in which some dense blocks of the factors are approximated with low-rank matrices. It is algebraic in that it only requires the linear system itself, and the approximation threshold that determines the accuracy of individual low-rank approximations. Favourable rank properties are obtained using a block partitioning which is a refinement of the partitioning induced by nested dissection ordering.
MILAMIN 2 - Fast MATLAB FEM solver
NASA Astrophysics Data System (ADS)
Dabrowski, Marcin; Krotkiewski, Marcin; Schmid, Daniel W.
2013-04-01
MILAMIN is a free and efficient MATLAB-based two-dimensional FEM solver utilizing unstructured meshes [Dabrowski et al., G-cubed (2008)]. The code consists of steady-state thermal diffusion and incompressible Stokes flow solvers implemented in approximately 200 lines of native MATLAB code. The brevity makes the code easily customizable. An important quality of MILAMIN is speed - it can handle millions of nodes within minutes on one CPU core of a standard desktop computer, and is faster than many commercial solutions. The new MILAMIN 2 allows three-dimensional modeling. It is designed as a set of functional modules that can be used as building blocks for efficient FEM simulations using MATLAB. The utilities are largely implemented as native MATLAB functions. For performance critical parts we use MUTILS - a suite of compiled MEX functions optimized for shared memory multi-core computers. The most important features of MILAMIN 2 are: 1. Modular approach to defining, tracking, and discretizing the geometry of the model 2. Interfaces to external mesh generators (e.g., Triangle, Fade2d, T3D) and mesh utilities (e.g., element type conversion, fast point location, boundary extraction) 3. Efficient computation of the stiffness matrix for a wide range of element types, anisotropic materials and three-dimensional problems 4. Fast global matrix assembly using a dedicated MEX function 5. Automatic integration rules 6. Flexible prescription (spatial, temporal, and field functions) and efficient application of Dirichlet, Neumann, and periodic boundary conditions 7. Treatment of transient and non-linear problems 8. Various iterative and multi-level solution strategies 9. Post-processing tools (e.g., numerical integration) 10. Visualization primitives using MATLAB, and VTK export functions. We provide a large number of examples that show how to implement a custom FEM solver using the MILAMIN 2 framework. The examples are MATLAB scripts of increasing complexity that address a given
Self-correcting Multigrid Solver
Jerome L.V. Lewandowski
2004-06-29
A new multigrid algorithm based on the method of self-correction for the solution of elliptic problems is described. The method exploits information contained in the residual to dynamically modify the source term (right-hand side) of the elliptic problem. It is shown that the self-correcting solver is more efficient at damping the short wavelength modes of the algebraic error than its standard equivalent. When used in conjunction with a multigrid method, the resulting solver displays an improved convergence rate with no additional computational work.
Scalable Parallel Algebraic Multigrid Solvers
Bank, R; Lu, S; Tong, C; Vassilevski, P
2005-03-23
The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.
General complex polynomial root solver
NASA Astrophysics Data System (ADS)
Skowron, J.; Gould, A.
2012-12-01
This general complex polynomial root solver, implemented in Fortran and further optimized for binary microlenses, uses a new algorithm to solve polynomial equations and is 1.6-3 times faster than the ZROOTS subroutine that is commercially available from Numerical Recipes, depending on application. The largest improvement, when compared to naive solvers, comes from a fail-safe procedure that permits skipping the majority of the calculations in the great majority of cases, without risking catastrophic failure in the few cases that these are actually required.
NASA Technical Reports Server (NTRS)
Mineck, Raymond E.; Thomas, James L.; Biedron, Robert T.; Diskin, Boris
2005-01-01
FMG3D (full multigrid 3 dimensions) is a pilot computer program that solves equations of fluid flow using a finite difference representation on a structured grid. Infrastructure exists for three dimensions but the current implementation treats only two dimensions. Written in Fortran 90, FMG3D takes advantage of the recursive subroutine feature, dynamic memory allocation, and structured-programming constructs of that language. FMG3D supports multi-block grids with three types of block-to-block interfaces: periodic, C-zero, and C-infinity. For all three types, grid points must match at interfaces. For periodic and C-infinity types, derivatives of grid metrics must be continuous at interfaces. The available equation sets are as follows: scalar elliptic equations, scalar convection equations, and the pressure-Poisson formulation of the Navier-Stokes equations for an incompressible fluid. All the equation sets are implemented with nonzero forcing functions to enable the use of user-specified solutions to assist in verification and validation. The equations are solved with a full multigrid scheme using a full approximation scheme to converge the solution on each succeeding grid level. Restriction to the next coarser mesh uses direct injection for variables and full weighting for residual quantities; prolongation of the coarse grid correction from the coarse mesh to the fine mesh uses bilinear interpolation; and prolongation of the coarse grid solution uses bicubic interpolation.
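The grid-transfer operators named above have simple one-dimensional analogues, sketched below. This is an illustration only and not FMG3D code (which is Fortran 90 and multi-dimensional): the function names are hypothetical, the fine grid is assumed to have an odd number of points so that every second point coincides with a coarse point, and linear interpolation stands in for the bilinear operator.

```python
def inject(fine):
    """Direct injection: copy every second fine-grid value to the coarse grid."""
    return fine[::2]


def full_weighting(fine):
    """Full weighting: interior coarse values are the (1/4, 1/2, 1/4) average
    of the three nearest fine values; boundary values are copied."""
    n_coarse = (len(fine) + 1) // 2
    coarse = [0.0] * n_coarse
    coarse[0], coarse[-1] = fine[0], fine[-1]
    for j in range(1, n_coarse - 1):
        coarse[j] = (0.25 * fine[2 * j - 1]
                     + 0.5 * fine[2 * j]
                     + 0.25 * fine[2 * j + 1])
    return coarse


def prolong_linear(coarse):
    """Linear interpolation (1-D analogue of bilinear prolongation)."""
    fine = [0.0] * (2 * len(coarse) - 1)
    for j, value in enumerate(coarse):
        fine[2 * j] = value                      # coincident points
    for j in range(len(coarse) - 1):
        fine[2 * j + 1] = 0.5 * (coarse[j] + coarse[j + 1])  # midpoints
    return fine


if __name__ == "__main__":
    u = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(inject(u), full_weighting(u), prolong_linear(inject(u)))
```

For data that varies linearly across the grid, all three operators are exact, which gives a quick sanity check.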
Time-domain Raman analytical forward solvers.
Martelli, Fabrizio; Binzoni, Tiziano; Sekar, Sanathana Konugolu Venkata; Farina, Andrea; Cavalieri, Stefano; Pifferi, Antonio
2016-09-01
A set of time-domain analytical forward solvers for Raman signals detected from homogeneous diffusive media is presented. The time-domain solvers have been developed for two geometries: the parallelepiped and the finite cylinder. The potential presence of a background fluorescence emission, contaminating the Raman signal, has also been taken into account. All the solvers have been obtained as solutions of the time dependent diffusion equation. The validation of the solvers has been performed by means of comparisons with the results of "gold standard" Monte Carlo simulations. These forward solvers provide an accurate tool to explore the information content encoded in the time-resolved Raman measurements. PMID:27607645
Linear iterative solvers for implicit ODE methods
NASA Technical Reports Server (NTRS)
Saylor, Paul E.; Skeel, Robert D.
1990-01-01
The numerical solution of stiff initial value problems, which leads to the problem of solving large systems of mildly nonlinear equations, is considered. For many problems derived from engineering and science, a solution is possible only with methods derived from iterative linear equation solvers. A common approach to solving the nonlinear equations is to employ an approximate solution obtained from an explicit method. The error is examined to determine how it is distributed among the stiff and non-stiff components, which bears on the choice of an iterative method. The conclusion is that error is (roughly) uniformly distributed, a fact that suggests the Chebyshev method (and the accompanying Manteuffel adaptive parameter algorithm). This method is described, also commenting on Richardson's method and its advantages for large problems. Richardson's method and the Chebyshev method with the Manteuffel algorithm are applied to the solution of the nonlinear equations by Newton's method.
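As a small illustration of the simpler of the two iterations mentioned, the sketch below implements stationary Richardson iteration, x_{k+1} = x_k + omega (b - A x_k), for a dense system stored as nested lists. It is not the authors' code: the function names, the stopping rule and the 2 × 2 example matrix are assumptions, and the Manteuffel adaptive parameter algorithm is not reproduced. For a symmetric positive definite matrix with extreme eigenvalues lmin and lmax, omega = 2 / (lmin + lmax) is the classical optimal fixed parameter.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson(A, b, omega, tol=1e-10, max_iters=10000):
    """Stationary Richardson iteration starting from the zero vector.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    for k in range(max_iters):
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if max(abs(r) for r in residual) < tol:
            return x, k
        x = [xi + omega * ri for xi, ri in zip(x, residual)]
    return x, max_iters


if __name__ == "__main__":
    # Example system (assumption): eigenvalues of A sum to trace(A) = 7,
    # so omega = 2 / (lmin + lmax) = 2 / 7.
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(richardson(A, b, omega=2.0 / 7.0))  # exact solution is (1/11, 7/11)
```

The Chebyshev method improves on this by varying the parameter from step to step using estimates of the spectrum.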
A generalized gyrokinetic Poisson solver
Lin, Z.; Lee, W.W.
1995-03-01
A generalized gyrokinetic Poisson solver has been developed, which employs local operations in the configuration space to compute the polarization density response. The new technique is based on the actual physical process of gyrophase-averaging. It is useful for nonlocal simulations using general geometry equilibrium. Since it utilizes local operations rather than the global ones such as FFT, the new method is most amenable to massively parallel algorithms.
On unstructured grids and solvers
NASA Technical Reports Server (NTRS)
Barth, T. J.
1990-01-01
The fundamentals and the state-of-the-art technology for unstructured grids and solvers are highlighted. Algorithms and techniques pertinent to mesh generation are discussed. It is shown that grid generation and grid manipulation schemes rely on fast multidimensional searching. Flow solution techniques for the Euler equations, which can be derived from the integral form of the equations are discussed. Sample calculations are also provided.
Parallelized solvers for heat conduction formulations
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1991-01-01
Based on multilevel partitioning, this paper develops a structural parallelizable solution methodology that enables a significant reduction in computational effort and memory requirements for very large scale linear and nonlinear steady and transient thermal (heat conduction) models. Due to the generality of the formulation of the scheme, both finite element and finite difference simulations can be treated. Diverse model topologies can thus be handled, including both simply and multiply connected (branched/perforated) geometries. To verify the methodology, analytical and numerical benchmark trends are verified in both sequential and parallel computer environments.
Benchmarking ICRF Full-wave Solvers for ITER
R. V. Budny, L. Berry, R. Bilato, P. Bonoli, M. Brambilla, R. J. Dumont, A. Fukuyama, R. Harvey, E. F. Jaeger, K. Indireshkumar, E. Lerche, D. McCune, C. K. Phillips, V. Vdovin, J. Wright, and members of the ITPA-IOS
2011-01-06
Benchmarking of full-wave solvers for ICRF simulations is performed using plasma profiles and equilibria obtained from integrated self-consistent modeling predictions of four ITER plasmas. One is for a high performance baseline (5.3 T, 15 MA) DT H-mode. The others are for half-field, half-current plasmas of interest for the pre-activation phase with bulk plasma ion species being either hydrogen or He4. The predicted profiles are used by six full-wave solver groups to simulate the ICRF electromagnetic fields and heating, and by three of these groups to simulate the current drive. Approximate agreement is achieved for the predicted heating power for the DT and He4 cases. Factor-of-two disagreements are found for the cases with second harmonic He3 heating in bulk H cases. Approximate agreement is achieved simulating the ICRF current drive.
Assessment of linear finite-difference Poisson-Boltzmann solvers.
Wang, Jun; Luo, Ray
2010-06-01
CPU time and memory usage are two vital issues that any numerical solvers for the Poisson-Boltzmann equation have to face in biomolecular applications. In this study, we systematically analyzed the CPU time and memory usage of five commonly used finite-difference solvers with a large and diversified set of biomolecular structures. Our comparative analysis shows that modified incomplete Cholesky conjugate gradient and geometric multigrid are the most efficient in the diversified test set. For the two efficient solvers, our test shows that their CPU times increase approximately linearly with the numbers of grids. Their CPU times also increase almost linearly with the negative logarithm of the convergence criterion at very similar rate. Our comparison further shows that geometric multigrid performs better in the large set of tested biomolecules. However, modified incomplete Cholesky conjugate gradient is superior to geometric multigrid in molecular dynamics simulations of tested molecules. We also investigated other significant components in numerical solutions of the Poisson-Boltzmann equation. It turns out that the time-limiting step is the free boundary condition setup for the linear systems for the selected proteins if the electrostatic focusing is not used. Thus, development of future numerical solvers for the Poisson-Boltzmann equation should balance all aspects of the numerical procedures in realistic biomolecular applications. PMID:20063271
NASA Astrophysics Data System (ADS)
Pelanti, Marica; Bouchut, François; Mangeney, Anne
2011-02-01
We present a Riemann solver derived by a relaxation technique for classical single-phase shallow flow equations and for a two-phase shallow flow model describing a mixture of solid granular material and fluid. Our primary interest is the numerical approximation of this two-phase solid/fluid model, whose complexity poses numerical difficulties that cannot be efficiently addressed by existing solvers. In particular, we are concerned with ensuring a robust treatment of dry bed states. The relaxation system used by the proposed solver is formulated by introducing auxiliary variables that replace the momenta in the spatial gradients of the original model systems. The resulting relaxation solver is related to the Roe solver in that its Riemann solution for the flow height and relaxation variables is formally computed as Roe's Riemann solution. The relaxation solver has the advantage of a certain degree of freedom in the specification of the wave structure through the choice of the relaxation parameters. This flexibility can be exploited to handle vacuum states robustly, which is a well-known difficulty of the standard Roe method, while maintaining Roe's low diffusivity. For the single-phase model, positivity of flow height is rigorously preserved. For the two-phase model, positivity of volume fractions is in general not ensured, and a suitable restriction on the CFL number might be needed. Nonetheless, numerical experiments suggest that the proposed two-phase flow solver efficiently models wet/dry fronts and vacuum formation for a large range of flow conditions. As a corollary of our study, we show that for single-phase shallow flow equations the relaxation solver is formally equivalent to the VFRoe solver with conservative variables of Gallouët and Masella [T. Gallouët, J.-M. Masella, Un schéma de Godunov approché C.R. Acad. Sci. Paris, Série I, 323 (1996) 77-84]. The relaxation interpretation allows establishing positivity conditions for this VFRoe method.
Finite Element Interface to Linear Solvers
Williams, Alan
2005-03-18
Sparse systems of linear equations arise in many engineering applications, including finite elements, finite volumes, and others. The solution of linear systems is often the most computationally intensive portion of the application. Depending on the complexity of problems addressed by the application, there may be no single solver capable of solving all of the linear systems that arise. This motivates the desire to switch an application from one solver library to another, depending on the problem being solved. The interfaces provided by solver libraries differ greatly, making it difficult to switch an application code from one library to another. The amount of library-specific code in an application can be greatly reduced by having an abstraction layer between solver libraries and the application, putting a common "face" on various solver libraries. One such abstraction layer is the Finite Element Interface to Linear Solvers (FEI), which has seen significant use by finite element applications at Sandia National Laboratories and Lawrence Livermore National Laboratory.
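The abstraction-layer idea can be illustrated with a small sketch. Everything below is hypothetical and unrelated to the actual FEI API: the class names, the `solve(A, b)` signature and the two toy backends are assumptions made for illustration. The point is only that `run_application` is written once against the common interface and can switch solvers without any solver-specific code.

```python
from abc import ABC, abstractmethod


class LinearSolver(ABC):
    """Common 'face' (hypothetical) that the application codes against."""

    @abstractmethod
    def solve(self, A, b):
        """Return x with A x = b, for A given as a list of rows."""


class GaussianElimination(LinearSolver):
    """Direct backend: elimination with partial pivoting."""

    def solve(self, A, b):
        n = len(b)
        M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(col + 1, n):
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
        x = [0.0] * n
        for i in reversed(range(n)):                  # back substitution
            tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (M[i][n] - tail) / M[i][i]
        return x


class JacobiIteration(LinearSolver):
    """Iterative backend: fixed number of Jacobi sweeps
    (converges for diagonally dominant matrices)."""

    def __init__(self, iters=200):
        self.iters = iters

    def solve(self, A, b):
        n = len(b)
        x = [0.0] * n
        for _ in range(self.iters):
            x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
                 / A[i][i] for i in range(n)]
        return x


def run_application(solver):
    """Application code: knows only the LinearSolver interface."""
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    return solver.solve(A, b)


if __name__ == "__main__":
    print(run_application(GaussianElimination()))
    print(run_application(JacobiIteration()))
```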
Analysis Tools for CFD Multigrid Solvers
NASA Technical Reports Server (NTRS)
Mineck, Raymond E.; Thomas, James L.; Diskin, Boris
2004-01-01
Analysis tools are needed to guide the development and evaluate the performance of multigrid solvers for the fluid flow equations. Classical analysis tools, such as local mode analysis, often fail to accurately predict performance. Two-grid analysis tools, herein referred to as Idealized Coarse Grid and Idealized Relaxation iterations, have been developed and evaluated within a pilot multigrid solver. These new tools are applicable to general systems of equations and/or discretizations and point to problem areas within an existing multigrid solver. Idealized Relaxation and Idealized Coarse Grid are applied in developing textbook-efficient multigrid solvers for incompressible stagnation flow problems.
The impact of improved sparse linear solvers on industrial engineering applications
Heroux, M.; Baddourah, M.; Poole, E.L.; Yang, Chao Wu
1996-12-31
There are usually many factors that ultimately determine the quality of computer simulation for engineering applications. Some of the most important are the quality of the analytical model and approximation scheme, the accuracy of the input data and the capability of the computing resources. However, in many engineering applications the characteristics of the sparse linear solver are the key factors in determining how complex a problem a given application code can solve. Therefore, the advent of a dramatically improved solver often brings with it dramatic improvements in our ability to do accurate and cost effective computer simulations. In this presentation we discuss the current status of sparse iterative and direct solvers in several key industrial CFD and structures codes, and show the impact that recent advances in linear solvers have made on both our ability to perform challenging simulations and the cost of those simulations. We also present some of the current challenges we have and the constraints we face in trying to improve these solvers. Finally, we discuss future requirements for sparse linear solvers on high performance architectures and try to indicate the opportunities that exist if we can develop even more improvements in linear solver capabilities.
NITSOL: A Newton iterative solver for nonlinear systems
Pernice, M.; Walker, H.F.
1996-12-31
Newton iterative methods, also known as truncated Newton methods, are implementations of Newton's method in which the linear systems that characterize Newton steps are solved approximately using iterative linear algebra methods. Here, we outline a well-developed Newton iterative algorithm together with a Fortran implementation called NITSOL. The basic algorithm is an inexact Newton method globalized by backtracking, in which each initial trial step is determined by applying an iterative linear solver until an inexact Newton criterion is satisfied. In the implementation, the user can specify inexact Newton criteria in several ways and select an iterative linear solver from among several popular "transpose-free" Krylov subspace methods. Jacobian-vector products used by the Krylov solver can be either evaluated analytically with a user-supplied routine or approximated using finite differences of function values. A flexible interface permits a wide variety of preconditioning strategies and allows the user to define a preconditioner and optionally update it periodically. We give details of these and other features and demonstrate the performance of the implementation on a representative set of test problems.
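The three ingredients described above (an inexact Newton criterion, finite-difference Jacobian-vector products, and backtracking) can be sketched in a few lines. This is not NITSOL and does not mirror its interface: all names are hypothetical, the forcing term `eta` is held fixed, and plain conjugate gradients is swapped in as the inner solver, which is only valid because the example problem is assumed to have a symmetric positive definite Jacobian. NITSOL itself offers transpose-free Krylov methods for general Jacobians.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def fd_jacvec(F, x, Fx, v, eps=1e-7):
    """Approximate J(x) v by a forward difference of function values."""
    nv = norm(v)
    if nv == 0.0:
        return [0.0] * len(v)
    h = eps * max(1.0, norm(x)) / nv
    Fp = F([xi + h * vi for xi, vi in zip(x, v)])
    return [(fp - f) / h for fp, f in zip(Fp, Fx)]


def cg_inexact(jacvec, rhs, eta, max_iters=50):
    """CG on J s = rhs, stopped once ||rhs - J s|| <= eta * ||rhs||
    (the inexact Newton criterion). Assumes J is SPD."""
    s = [0.0] * len(rhs)
    r, p = rhs[:], rhs[:]
    rr = sum(x * x for x in r)
    target = eta * math.sqrt(rr)
    for _ in range(max_iters):
        if math.sqrt(rr) <= target:
            break
        Jp = jacvec(p)
        alpha = rr / sum(a * b for a, b in zip(p, Jp))
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * ji for ri, ji in zip(r, Jp)]
        rr_new = sum(x * x for x in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return s


def newton_krylov(F, x0, eta=0.1, tol=1e-8, max_newton=50):
    """Inexact Newton iteration globalized by step-halving backtracking."""
    x = x0[:]
    for _ in range(max_newton):
        Fx = F(x)
        fnorm = norm(Fx)
        if fnorm < tol:
            break
        step = cg_inexact(lambda v: fd_jacvec(F, x, Fx, v),
                          [-f for f in Fx], eta)
        t = 1.0
        # Halve the step until a sufficient decrease in ||F|| is obtained.
        while t > 1e-4 and norm(F([xi + t * si for xi, si in zip(x, step)])) \
                > (1.0 - 1e-4 * t) * fnorm:
            t *= 0.5
        x = [xi + t * si for xi, si in zip(x, step)]
    return x


def example_residual(x):
    """Toy system (assumption): A x + x**3 - b with root (1, 1)."""
    return [4.0 * x[0] + x[1] + x[0] ** 3 - 6.0,
            x[0] + 3.0 * x[1] + x[1] ** 3 - 5.0]


if __name__ == "__main__":
    print(newton_krylov(example_residual, [0.0, 0.0]))
```

Note that the inner solver never forms the Jacobian; it only needs the action of J on a vector, which costs one extra evaluation of F per product.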
A spectral Poisson solver for kinetic plasma simulation
NASA Astrophysics Data System (ADS)
Szeremley, Daniel; Obberath, Jens; Brinkmann, Ralf
2011-10-01
Plasma resonance spectroscopy is a well established plasma diagnostic method, realized in several designs. One of these designs is the multipole resonance probe (MRP). In its idealized - geometrically simplified - version it consists of two dielectrically shielded, hemispherical electrodes to which an RF signal is applied. A numerical tool is under development which is capable of simulating the dynamics of the plasma surrounding the MRP in electrostatic approximation. In this contribution we concentrate on the specialized Poisson solver for that tool. The plasma is represented by an ensemble of point charges. By expanding both the charge density and the potential into spherical harmonics, a largely analytical solution of the Poisson problem can be employed. For a practical implementation, the expansion must be appropriately truncated. With this spectral solver we are able to efficiently solve the Poisson equation in a kinetic plasma simulation without the need of introducing a spatial discretization.
Elliptic Solvers with Adaptive Mesh Refinement on Complex Geometries
Phillip, B.
2000-07-24
Adaptive Mesh Refinement (AMR) is a numerical technique for locally tailoring the resolution of computational grids. Multilevel algorithms for solving elliptic problems on adaptive grids include the Fast Adaptive Composite grid method (FAC) and its parallel variants (AFAC and AFACx). Theory that confirms the independence of the convergence rates of FAC and AFAC on the number of refinement levels exists under certain ellipticity and approximation property conditions. Similar theory needs to be developed for AFACx. The effectiveness of multigrid-based elliptic solvers such as FAC, AFAC, and AFACx on adaptively refined overlapping grids is not clearly understood. Finally, a non-trivial eye model problem will be solved by combining the power of using overlapping grids for complex moving geometries, AMR, and multilevel elliptic solvers.
MACSYMA's symbolic ordinary differential equation solver
NASA Technical Reports Server (NTRS)
Golden, J. P.
1977-01-01
MACSYMA's symbolic ordinary differential equation solver, ODE2, is described. The code for this routine is delineated, which is of interest because it is written in top-level MACSYMA language and may serve as a good example of programming in that language. Other symbolic ordinary differential equation solvers are mentioned.
KLU2 Direct Linear Solver Package
Energy Science and Technology Software Center (ESTSC)
2012-01-04
KLU2 is a direct sparse solver for solving unsymmetric linear systems. It is related to the existing KLU solver, (in Amesos package and also as a stand-alone package from University of Florida) but provides template support for scalar and ordinal types. It uses a left looking LU factorization method.
Improving Resource-Unaware SAT Solvers
NASA Astrophysics Data System (ADS)
Hölldobler, Steffen; Manthey, Norbert; Saptawijaya, Ari
The paper discusses cache utilization in state-of-the-art SAT solvers. The aim of the study is to show how a resource-unaware SAT solver can be improved by utilizing the cache sensibly. The analysis is performed on a CDCL-based SAT solver using a subset of the industrial SAT Competition 2009 benchmark. For the analysis, the total cycles, the resource stall cycles, the L2 cache hits and the L2 cache misses are traced using sample based profiling. Based on the analysis, several techniques - some of which have not been used in SAT solvers so far - are proposed resulting in a combined speedup up to 83% without affecting the search path of the solver. The average speedup on the benchmark is 60%. The new techniques are also applied to MiniSAT2.0 improving its runtime by 20% on average.
Belos Block Linear Solvers Package
Energy Science and Technology Software Center (ESTSC)
2004-03-01
Belos is an extensible and interoperable framework for large-scale, iterative methods for solving systems of linear equations with multiple right-hand sides. The motivation for this framework is to provide a generic interface to a collection of algorithms for solving large-scale linear systems. Belos is interoperable because both the matrix and vectors are considered to be opaque objects; only knowledge of the matrix and vectors via elementary operations is necessary. An implementation of Belos is accomplished via the use of interfaces. One of the goals of Belos is to allow the user flexibility in specifying the data representation for the matrix and vectors and so leverage any existing software investment. The algorithms that will be included in the package are Krylov-based linear solvers, like Block GMRES (Generalized Minimal RESidual) and Block CG (Conjugate-Gradient).
A robust multilevel simultaneous eigenvalue solver
NASA Technical Reports Server (NTRS)
Costiner, Sorin; Taasan, Shlomo
1993-01-01
Multilevel (ML) algorithms for eigenvalue problems are often faced with several types of difficulties such as: the mixing of approximated eigenvectors by the solution process, the approximation of incomplete clusters of eigenvectors, the poor representation of the solution on coarse levels, and the existence of close or equal eigenvalues. Algorithms that do not treat these difficulties appropriately usually fail, or their performance degrades when facing them. These issues motivated the development of a robust adaptive ML algorithm which treats these difficulties, for the calculation of a few eigenvectors and their corresponding eigenvalues. The main techniques used in the new algorithm include: the adaptive completion and separation of the relevant clusters on different levels, the simultaneous treatment of solutions within each cluster, and the robustness tests which monitor the algorithm's efficiency and convergence. The eigenvectors' separation efficiency is based on a new ML projection technique generalizing the Rayleigh-Ritz projection, combined with a technique called backrotations. These separation techniques, when combined with an FMG formulation, in many cases lead to algorithms of O(qN) complexity, for q eigenvectors of size N on the finest level. Previously developed ML algorithms are less focused on the mentioned difficulties. Moreover, algorithms which employ fine level separation techniques are of O(q^2 N) complexity and usually do not overcome all these difficulties. Computational examples are presented where Schrödinger-type eigenvalue problems in 2-D and 3-D, having equal and closely clustered eigenvalues, are solved with the efficiency of the Poisson multigrid solver. A second order approximation is obtained in O(qN) work, where the total computational work is equivalent to only a few fine level relaxations per eigenvector.
Approximating the Generalized Voronoi Diagram of Closely Spaced Objects
Edwards, John; Daniel, Eric; Pascucci, Valerio; Bajaj, Chandrajit
2015-06-22
We present an algorithm to compute an approximation of the generalized Voronoi diagram (GVD) on arbitrary collections of 2D or 3D geometric objects. In particular, we focus on datasets with closely spaced objects; GVD approximation is expensive and sometimes intractable on these datasets using previous algorithms. With our approach, the GVD can be computed using commodity hardware even on datasets with many, extremely tightly packed objects. Our approach is to subdivide the space with an octree that is represented with an adjacency structure. We then use a novel adaptive distance transform to compute the distance function on octree vertices. The computed distance field is sampled more densely in areas of close object spacing, enabling robust and parallelizable GVD surface generation. We demonstrate our method on a variety of data and show example applications of the GVD in 2D and 3D.
Approximating the Generalized Voronoi Diagram of Closely Spaced Objects
Edwards, John; Daniel, Eric; Pascucci, Valerio; Bajaj, Chandrajit
2016-01-01
We present an algorithm to compute an approximation of the generalized Voronoi diagram (GVD) on arbitrary collections of 2D or 3D geometric objects. In particular, we focus on datasets with closely spaced objects; GVD approximation is expensive and sometimes intractable on these datasets using previous algorithms. With our approach, the GVD can be computed using commodity hardware even on datasets with many, extremely tightly packed objects. Our approach is to subdivide the space with an octree that is represented with an adjacency structure. We then use a novel adaptive distance transform to compute the distance function on octree vertices. The computed distance field is sampled more densely in areas of close object spacing, enabling robust and parallelizable GVD surface generation. We demonstrate our method on a variety of data and show example applications of the GVD in 2D and 3D. PMID:27540272
ALPS - A LINEAR PROGRAM SOLVER
NASA Technical Reports Server (NTRS)
Viterna, L. A.
1994-01-01
Linear programming is a widely-used engineering and management tool. Scheduling, resource allocation, and production planning are all well-known applications of linear programs (LP's). Most LP's are too large to be solved by hand, so over the decades many computer codes for solving LP's have been developed. ALPS, A Linear Program Solver, is a full-featured LP analysis program. ALPS can solve plain linear programs as well as more complicated mixed integer and pure integer programs. ALPS also contains an efficient solution technique for pure binary (0-1 integer) programs. One of the many weaknesses of LP solvers is the lack of interaction with the user. ALPS is a menu-driven program with no special commands or keywords to learn. In addition, ALPS contains a full-screen editor to enter and maintain the LP formulation. These formulations can be written to and read from plain ASCII files for portability. For those less experienced in LP formulation, ALPS contains a problem "parser" which checks the formulation for errors. ALPS creates fully formatted, readable reports that can be sent to a printer or output file. ALPS is written entirely in IBM's APL2/PC product, Version 1.01. The APL2 workspace containing all the ALPS code can be run on any APL2/PC system (AT or 386). On a 32-bit system, this configuration can take advantage of all extended memory. The user can also examine and modify the ALPS code. The APL2 workspace has also been "packed" to be run on any DOS system (without APL2) as a stand-alone "EXE" file, but has limited memory capacity on a 640K system. A numeric coprocessor (80X87) is optional but recommended. The standard distribution medium for ALPS is a 5.25 inch 360K MS-DOS format diskette. IBM, IBM PC and IBM APL2 are registered trademarks of International Business Machines Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
GARDNER, P.R.
2006-04-01
Sudoku, also known as Number Place, is a logic-based placement puzzle. The aim of the puzzle is to enter a numerical digit from 1 through 9 in each cell of a 9 x 9 grid made up of 3 x 3 subgrids (called "regions"), starting with various digits given in some cells (the "givens"). Each row, column, and region must contain only one instance of each numeral. Completing the puzzle requires patience and logical ability. Although first published in a U.S. puzzle magazine in 1979, Sudoku initially caught on in Japan in 1986 and attained international popularity in 2005. Last fall, after noticing Sudoku puzzles in some newspapers and magazines, I attempted a few just to see how hard they were. Of course, the difficulties varied considerably. "Obviously" one could use Trial and Error but all the advice was to "Use Logic". Thinking to flex, and strengthen, those powers, I began to tackle the puzzles systematically. That is, when I discovered a new tactical rule, I would write it down, eventually generating a list of ten or so, with some having overlap. They served pretty well except for the more difficult puzzles, but even then I managed to develop an additional three rules that covered all of them until I hit the Oregonian puzzle shown. With all of my rules, I could not seem to solve that puzzle. Initially putting my failure down to rapid mental fatigue (being unable to hold a sufficient quantity of information in my mind at one time), I decided to write a program to implement my rules and see what I had failed to notice earlier. The solver, too, failed. That is, my rules were insufficient to solve that particular puzzle. I happened across a book written by a fellow who constructs such puzzles and who claimed that, sometimes, the only tactic left was trial and error. With a trial and error routine implemented, my solver successfully completed the Oregonian puzzle, and has successfully solved every puzzle submitted to it since.
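The trial-and-error routine mentioned in the abstract is not described in detail, so the following is only a minimal sketch of one common way to do it: plain backtracking over empty cells. The function names (`allowed`, `find_empty`, `solve`) and the convention that 0 marks an empty cell are assumptions made for illustration; the author's tactical rules are not reproduced here.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9 x 9 grid; 0 marks an empty cell (assumed convention)


def allowed(grid: Grid, r: int, c: int, d: int) -> bool:
    """Return True if digit d does not already appear in row r, column c, or the 3 x 3 region."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    r0, c0 = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(r0, r0 + 3)
               for j in range(c0, c0 + 3))


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in row-major order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def solve(grid: Grid) -> bool:
    """Fill the grid in place by trial and error; return True on success."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left, so the puzzle is complete
    r, c = cell
    for d in range(1, 10):
        if allowed(grid, r, c, d):
            grid[r][c] = d        # trial
            if solve(grid):
                return True
            grid[r][c] = 0        # error: undo and try the next digit
    return False                  # no digit fits, so an earlier guess was wrong
```

A rule-based solver like the one described would typically apply its logical rules first and fall back to a routine of this kind only when no rule makes progress.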
SIERRA framework version 4 : solver services.
Williams, Alan B.
2005-02-01
Several SIERRA applications make use of third-party libraries to solve systems of linear and nonlinear equations, and to solve eigenproblems. The classes and interfaces in the SIERRA framework that provide linear system assembly services and access to solver libraries are collectively referred to as solver services. This paper provides an overview of SIERRA's solver services including the design goals that drove the development, and relationships and interactions among the various classes. The process of assembling and manipulating linear systems will be described, as well as access to solution methods and other operations.
A scalable 2-D parallel sparse solver
Kothari, S.C.; Mitra, S.
1995-12-01
Scalability beyond a small number of processors, typically 32 or fewer, is known to be a problem for existing parallel general sparse (PGS) direct solvers. This paper presents a PGS direct solver for general sparse linear systems on distributed memory machines. The algorithm is based on the well-known sequential sparse algorithm Y12M. To achieve efficient parallelization, a 2-D scattered decomposition of the sparse matrix is used. The proposed algorithm is more scalable than existing parallel sparse direct solvers. Its scalability is evaluated on a 256 processor nCUBE2s machine using Boeing/Harwell benchmark matrices.
NASA Technical Reports Server (NTRS)
Ferencz, Donald C.; Viterna, Larry A.
1991-01-01
ALPS is a computer program which can be used to solve general linear program (optimization) problems. ALPS was designed for those who have minimal linear programming (LP) knowledge and features a menu-driven scheme to guide the user through the process of creating and solving LP formulations. Once created, the problems can be edited and stored in standard DOS ASCII files to provide portability to various word processors or even other linear programming packages. Unlike many math-oriented LP solvers, ALPS contains an LP parser that reads through the LP formulation and reports several types of errors to the user. ALPS provides a large amount of solution data which is often useful in problem solving. In addition to pure linear programs, ALPS can solve for integer, mixed integer, and binary type problems. Pure linear programs are solved with the revised simplex method. Integer or mixed integer programs are solved initially with the revised simplex method, and then completed using the branch-and-bound technique. Binary programs are solved with the method of implicit enumeration. This manual describes how to use ALPS to create, edit, and solve linear programming problems. Instructions for installing ALPS on a PC compatible computer are included in the appendices along with a general introduction to linear programming. A programmer's guide is also included for assistance in modifying and maintaining the program.
Parallelizing alternating direction implicit solver on GPUs
Technology Transfer Automated Retrieval System (TEKTRAN)
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
NASA Astrophysics Data System (ADS)
Willemsen, Bram; Malcolm, Alison; Lewis, Winston
2016-03-01
In a set of problems ranging from 4-D seismic to salt boundary estimation, updates to the velocity model often have a highly localized nature. Numerical techniques for these applications such as full-waveform inversion (FWI) require an estimate of the wavefield to compute the model updates. When dealing with localized problems, it is wasteful to compute these updates in the global domain, when we only need them in our region of interest. This paper introduces a local solver that generates forward and adjoint wavefields which are, to machine precision, identical to those generated by a full-domain solver evaluated within the region of interest. This means that the local solver computes all interactions between model updates within the region of interest and the inhomogeneities in the background model outside. Because no approximations are made in the calculation of the forward and adjoint wavefields, the local solver can compute the identical gradient in the region of interest as would be computed by the more expensive full-domain solver. In this paper, the local solver is used to efficiently generate the FWI gradient at the boundary of a salt body. This gradient is then used in a level set method to automatically update the salt boundary.
Optimization of solver for gas flow modeling
NASA Astrophysics Data System (ADS)
Savichkin, D.; Dodulad, O.; Kloss, Yu
2014-05-01
The main purpose of the work is optimization of the solver for rarefied gas flow modeling based on the Boltzmann equation. The optimization method is based on SIMD extensions for x86 processors. The computational code is profiled and manually optimized with SSE instructions. Heat flow, shock waves and the Knudsen pump are modeled with the optimized solver. The dependence of computational time on mesh size and CPU capabilities is reported.
Evaluating point-based POMDP solvers on multicore machines.
Shani, Guy
2010-08-01
Recent scaling up of partially observable Markov decision process solvers toward realistic applications is largely due to point-based methods which quickly provide approximate solutions for midsized problems. New multicore machines offer an opportunity to scale up to larger domains. These machines support parallel execution and can speed up existing algorithms considerably. In this paper, we evaluate several ways in which point-based algorithms can be adapted to parallel computing. We overview the challenges and opportunities and present experimental results, providing evidence for the usability of our suggestions. PMID:19914897
A parallel PCG solver for MODFLOW.
Dong, Yanhui; Li, Guomin
2009-01-01
In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, the OpenMP programming paradigm was used in this study to parallelize the preconditioned conjugate-gradient (PCG) solver. Incremental parallelization, the significant advantage supported by OpenMP on a shared-memory computer, allowed the solver to be converted to a parallel program smoothly, one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. Based on the timing results, the parallel PCG solver typically runs about 1.40 to 5.31 times faster than the serial one. In addition, the simulation results are exactly the same as those of the original PCG solver, because the majority of the serial code was not changed. It is worth noting that this parallelizing approach reduces cost in terms of software maintenance because only a single source PCG solver code needs to be maintained in the MODFLOW source tree. PMID:19563427
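For orientation, a preconditioned conjugate-gradient iteration can be sketched in a few lines. This is a serial, dense-matrix illustration only, not the MODFLOW PCG package: the Jacobi (diagonal) preconditioner, the function names, and the tolerance are assumptions chosen to keep the example short. The matrix-vector product and the dot products are the loops that a shared-memory parallelization such as the one described would typically distribute across threads.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (a candidate loop for parallelization)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Dot product (a reduction, also a candidate for parallelization)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 200) -> Vector:
    """Solve a x = b for symmetric positive-definite a with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner (assumption)
    x = [0.0] * n
    r = b[:]                                      # residual b - a x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = z[:]                                      # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # stop when the residual norm is small
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # makes the next direction conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For example, `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the solution (1/11, 7/11) of that small system.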
Finite Element Interface to Linear Solvers
Energy Science and Technology Software Center (ESTSC)
2005-03-18
Sparse systems of linear equations arise in many engineering applications, including finite elements, finite volumes, and others. The solution of linear systems is often the most computationally intensive portion of the application. Depending on the complexity of problems addressed by the application, there may be no single solver capable of solving all of the linear systems that arise. This motivates the desire to switch an application from one solver library to another, depending on the problem being solved. The interfaces provided by solver libraries differ greatly, making it difficult to switch an application code from one library to another. The amount of library-specific code in an application can be greatly reduced by having an abstraction layer between solver libraries and the application, putting a common "face" on various solver libraries. One such abstraction layer is the Finite Element Interface to Linear Solvers (FEI), which has seen significant use by finite element applications at Sandia National Laboratories and Lawrence Livermore National Laboratory.
PSPIKE: A Parallel Hybrid Sparse Linear System Solver
NASA Astrophysics Data System (ADS)
Manguoglu, Murat; Sameh, Ahmed H.; Schenk, Olaf
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse linear system solvers. These solvers must optimize parallel performance, processor (serial) performance, as well as memory requirements, while being robust across broad classes of applications and systems. In this paper, we present a new parallel solver that combines the desirable characteristics of direct methods (robustness) and effective iterative solvers (low computational cost), while alleviating their drawbacks (memory requirements, lack of robustness). Our proposed hybrid solver is based on the general sparse solver PARDISO, and the “Spike” family of hybrid solvers. The resulting algorithm, called PSPIKE, is as robust as direct solvers, more reliable than classical preconditioned Krylov subspace methods, and much more scalable than direct sparse solvers. We support our performance and parallel scalability claims using detailed experimental studies and comparison with direct solvers, as well as classical preconditioned Krylov methods.
An advanced implicit solver for MHD
NASA Astrophysics Data System (ADS)
Udrea, Bogdan
A new implicit algorithm has been developed for the solution of the time-dependent, viscous and resistive single fluid magnetohydrodynamic (MHD) equations. The algorithm is based on an approximate Riemann solver for the hyperbolic fluxes and central differencing applied on a staggered grid for the parabolic fluxes. The algorithm employs a locally aligned coordinate system that allows the solution to the Riemann problems to be solved in a natural direction, normal to cell interfaces. The result is an original scheme that is robust and reduces the complexity of the flux formulas. The evaluation of the parabolic fluxes is also implemented using a locally aligned coordinate system, this time on the staggered grid. The implicit formulation employed by WARP3 is a two level scheme that was applied for the first time to the single fluid MHD model. The flux Jacobians that appear in the implicit scheme are evaluated numerically. The linear system that results from the implicit discretization is solved using a robust symmetric Gauss-Seidel method. The code has an explicit mode capability so that implementation and test of new algorithms or new physics can be performed in this simpler mode. Last but not least, the code was designed and written to run on parallel computers so that complex, high resolution runs can be performed in hours rather than days. The code has been benchmarked against analytical and experimental gas dynamics and MHD results. The benchmarks consisted of one-dimensional Riemann problems and diffusion dominated problems, two-dimensional supersonic flow over a wedge, axisymmetric magnetoplasmadynamic (MPD) thruster simulation and three-dimensional supersonic flow over intersecting wedges and spheromak stability simulation. The code has been proven to be robust and the results of the simulations showed excellent agreement with analytical and experimental results. Parallel performance studies showed that the code performs as expected when run on parallel
NASA Astrophysics Data System (ADS)
Lafferty, Nathan; Badreddine, Hassan; Niceno, Bojan; Prasser, Horst-Michael
2015-11-01
A parallelizable flood fill algorithm is developed for identifying and tracking closed regions of fluids, dispersed phases, in CFD simulations of multiphase flows. It is used in conjunction with a newly developed method, corrective interface tracking, for simulating finite size dispersed bubbly flows in which the bubbles are too small relative to the grid to be simulated accurately with interface tracking techniques and too large relative to the grid for Lagrangian particle tracking techniques. The latter situation arises if local bubble-induced turbulence is resolved, or modeled with LES. With corrective interface tracking the governing equations are solved on a static Eulerian grid. A correcting force, derived from empirical correlation based hydrodynamic forces, is applied to the bubble, which is then advected using interface tracking techniques. This method results in accurate fluid-gas two-way coupling, bubble shapes, and terminal rise velocities. The flood fill algorithm and corrective interface tracking technique are applied to an air/water simulation of multiple bubbles rising and merging with a free surface. They are then validated against the same simulation performed using only interface tracking with a much finer grid.
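The idea of identifying closed regions with a flood fill can be illustrated with a small serial sketch. It is not the parallelizable algorithm of the abstract, whose details are not given here: the 2-D grid, the 4-connectivity rule, and the name `label_regions` are assumptions made for illustration. Each connected group of marked cells (for example, cells containing the dispersed phase) receives its own integer label.

```python
from collections import deque
from typing import List, Sequence, Tuple


def label_regions(mask: Sequence[Sequence[int]]) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions of nonzero cells.

    Returns (labels, count), where labels[r][c] is 0 for unmarked cells and a
    region number in 1..count for marked cells.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                      # new, not yet visited region
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first fill from the seed cell
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and labels[ni][nj] == 0):
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count
```

With region labels in hand, per-region quantities such as volume or centroid can be accumulated, which is the kind of information needed to track individual bubbles between time steps.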
Ordinary Differential Equation System Solver
Energy Science and Technology Software Center (ESTSC)
1992-03-05
LSODE is a package of subroutines for the numerical solution of the initial value problem for systems of first order ordinary differential equations. The package is suitable for either stiff or nonstiff systems. For stiff systems the Jacobian matrix may be treated in either full or banded form. LSODE can also be used when the Jacobian can be approximated by a band matrix.
NASA Technical Reports Server (NTRS)
Martin, E. D.; Lomax, H.
1977-01-01
Revised and extended versions of a fast, direct (noniterative) numerical Cauchy-Riemann solver are presented for solving finite difference approximations of first order systems of partial differential equations. Although the difference operators treated are linear and elliptic, one significant application of these extended direct Cauchy-Riemann solvers is in the fast, semidirect (iterative) solution of fluid dynamic problems governed by the nonlinear mixed elliptic-hyperbolic equations of transonic flow. Different versions of the algorithms are derived and the corresponding FORTRAN computer programs for a simple example problem are described and listed. The algorithms are demonstrated to be efficient and accurate.
New iterative solvers for the NAG Libraries
Salvini, S.; Shaw, G.
1996-12-31
The purpose of this paper is to introduce the work which has been carried out at NAG Ltd to update the iterative solvers for sparse systems of linear equations, both symmetric and unsymmetric, in the NAG Fortran 77 Library. Our current plans to extend this work and include it in our other numerical libraries in our range are also briefly mentioned. We have added to the Library the new Chapter F11, entirely dedicated to sparse linear algebra. At Mark 17, the F11 Chapter includes sparse iterative solvers, preconditioners, utilities and black-box routines for sparse symmetric (both positive-definite and indefinite) linear systems. Mark 18 will add solvers, preconditioners, utilities and black-boxes for sparse unsymmetric systems: the development of these has already been completed.
Using SPARK as a Solver for Modelica
Wetter, Michael; Haves, Philip; Moshier, Michael A.; Sowell, Edward F.
2008-06-30
Modelica is an object-oriented acausal modeling language that is well positioned to become a de-facto standard for expressing models of complex physical systems. To simulate a model expressed in Modelica, it needs to be translated into executable code. For generating run-time efficient code, such a translation needs to employ algebraic formula manipulations. As the SPARK solver has been shown to be competitive for generating such code but currently cannot be used with the Modelica language, we report in this paper how SPARK's symbolic and numerical algorithms can be implemented in OpenModelica, an open-source implementation of a Modelica modeling and simulation environment. We also report benchmark results that show that for our air flow network simulation benchmark, the SPARK solver is competitive with Dymola, which is believed to provide the best solver for Modelica.
Multigrid in energy preconditioner for Krylov solvers
Slaybaugh, R.N.; Evans, T.M.; Davidson, G.G.; Wilson, P.P.H.
2013-06-01
We have added a new multigrid in energy (MGE) preconditioner to the Denovo discrete-ordinates radiation transport code. This preconditioner takes advantage of a new multilevel parallel decomposition. A multigroup Krylov subspace iterative solver that is decomposed in energy as well as space-angle forms the backbone of the transport solves in Denovo. The space-angle-energy decomposition facilitates scaling to hundreds of thousands of cores. The multigrid in energy preconditioner scales well in the energy dimension and significantly reduces the number of Krylov iterations required for convergence. This preconditioner is well-suited for use with advanced eigenvalue solvers such as Rayleigh Quotient Iteration and Arnoldi.
ODE System Solver W. Krylov Iteration & Rootfinding
Hindmarsh, Alan C.
1991-09-09
LSODKR is a new initial value ODE solver for stiff and nonstiff systems. It is a variant of the LSODPK and LSODE solvers, intended mainly for large stiff systems. The main differences between LSODKR and LSODE are the following: (a) for stiff systems, LSODKR uses a corrector iteration composed of Newton iteration and one of four preconditioned Krylov subspace iteration methods. The user must supply routines for the preconditioning operations, (b) Within the corrector iteration, LSODKR does automatic switching between functional (fixpoint) iteration and modified Newton iteration, (c) LSODKR includes the ability to find roots of given functions of the solution during the integration.
ODE System Solver W. Krylov Iteration & Rootfinding
Energy Science and Technology Software Center (ESTSC)
1991-09-09
LSODKR is a new initial value ODE solver for stiff and nonstiff systems. It is a variant of the LSODPK and LSODE solvers, intended mainly for large stiff systems. The main differences between LSODKR and LSODE are the following: (a) for stiff systems, LSODKR uses a corrector iteration composed of Newton iteration and one of four preconditioned Krylov subspace iteration methods. The user must supply routines for the preconditioning operations, (b) Within the corrector iteration, LSODKR does automatic switching between functional (fixpoint) iteration and modified Newton iteration, (c) LSODKR includes the ability to find roots of given functions of the solution during the integration.
Steady potential solver for unsteady aerodynamic analyses
NASA Technical Reports Server (NTRS)
Hoyniak, Dan
1994-01-01
Development of a steady flow solver for use with LINFLO was the objective of this report. The solver must be compatible with LINFLO, be composed of composite mesh, and have transonic capability. The approaches used were: (1) steady flow potential equations written in nonconservative form; (2) Newton's Method; (3) implicit, least-squares, interpolation method to obtain finite difference equations; and (4) matrix inversion routines from LINFLO. This report was given during the NASA LeRC Workshop on Forced Response in Turbomachinery in August of 1993.
Wave Speeds, Riemann Solvers and Artificial Viscosity
Rider, W.J.
1999-07-18
A common perspective on the numerical solution of the Euler equations for shock physics is examined. The common viewpoint is based upon the selection of nonlinear wavespeeds upon which the dissipation (implicit or explicit) is founded. This perspective shows commonality between Riemann solver based methods (i.e. Godunov-type) and artificial viscosity (i.e. von Neumann-Richtmyer). As an example we derive an improved nonlinear viscous stabilization of a Richtmyer-Lax-Wendroff method. Additionally, we will define a form of classical artificial viscosity based upon the HLL Riemann solver.
Code Verification of the HIGRAD Computational Fluid Dynamics Solver
Van Buren, Kendra L.; Canfield, Jesse M.; Hemez, Francois M.; Sauer, Jeremy A.
2012-05-04
The purpose of this report is to outline code and solution verification activities applied to HIGRAD, a Computational Fluid Dynamics (CFD) solver of the compressible Navier-Stokes equations developed at the Los Alamos National Laboratory, and used to simulate various phenomena such as the propagation of wildfires and atmospheric hydrodynamics. Code verification efforts, as described in this report, are an important first step to establish the credibility of numerical simulations. They provide evidence that the mathematical formulation is properly implemented without significant mistakes that would adversely impact the application of interest. Highly accurate analytical solutions are derived for four code verification test problems that exercise different aspects of the code. These test problems are referred to as: (i) the quiet start, (ii) the passive advection, (iii) the passive diffusion, and (iv) the piston-like problem. These problems are simulated using HIGRAD with different levels of mesh discretization and the numerical solutions are compared to their analytical counterparts. In addition, the rates of convergence are estimated to verify the numerical performance of the solver. The first three test problems produce numerical approximations as expected. The fourth test problem (piston-like) indicates the extent to which the code is able to simulate a 'mild' discontinuity, which is a condition that would typically be better handled by a Lagrangian formulation. The current investigation concludes that the numerical implementation of the solver performs as expected. The quality of solutions is sufficient to provide credible simulations of fluid flows around wind turbines. The main caveat associated with these findings is the low coverage provided by these four problems, and somewhat limited verification activities. A more comprehensive evaluation of HIGRAD may be beneficial for future studies.
Newton-Raphson preconditioner for Krylov type solvers on GPU devices.
Kushida, Noriyuki
2016-01-01
A new Newton-Raphson method based preconditioner for Krylov type linear equation solvers for GPGPU is developed, and the performance is investigated. Conventional preconditioners improve the convergence of Krylov type solvers, and perform well on CPUs. However, they do not perform well on GPGPUs, because of the complexity of implementing powerful preconditioners. The developed preconditioner is based on the BFGS Hessian matrix approximation technique, which is well known as a robust and fast nonlinear equation solver. Because the Hessian matrix in the BFGS represents the coefficient matrix of a system of linear equations in some sense, the approximated Hessian matrix can be a preconditioner. On the other hand, BFGS is required to store dense matrices and to invert them, which should be avoided on modern computers and supercomputers. To overcome these disadvantages, we therefore introduce a limited memory BFGS, which requires less memory space and less computational effort than the BFGS. In addition, a limited memory BFGS can be implemented with BLAS libraries, which are well optimized for target architectures. There are advantages and disadvantages to the Hessian matrix approximation becoming better as the Krylov solver iteration continues. The preconditioning matrix varies through Krylov solver iterations, and only flexible Krylov solvers can work well with the developed preconditioner. The GCR method, which is a flexible Krylov solver, is employed because of the prevalence of GCR as a Krylov solver with a variable preconditioner. As a result of the performance investigation, the new preconditioner indicates the following benefits: (1) The new preconditioner is robust; i.e., it converges while conventional preconditioners (the diagonal scaling, and the SSOR preconditioners) fail. (2) In the best case scenarios, it is over 10 times faster than conventional preconditioners on a CPU. (3) Because it requires only simple operations, it performs well on a GPGPU. In
Frequency Domain Modelling by a Direct-Iterative Solver: A Space and Wavelet Approach
NASA Astrophysics Data System (ADS)
Hustedt, B.; Operto, S.; Virieux, J.
2002-12-01
Seismic forward modelling of wave propagation phenomena in complex rheologic media using a frequency domain finite-difference (FDFD) technique is of special interest for multisource experiments and waveform inversion schemes, because the complete wavefield solution can be computed in a fast and efficient way. FDFD modelling requires the inversion of an extremely large matrix equation A x = b, by either a direct or an iterative solver. The direct solver computes an effective inverse of A, called LU factorization. Its main handicap is the additional computer memory required for storing the matrix fill-in coefficients that are created during the factorization process. Iterative solvers are not limited by memory constraints (additional coefficients), but their convergence depends on a good initial solution, which is difficult to guess beforehand. For both solvers, available computer resources have limited widespread FDFD modelling applications to mainly two-dimensional (2D) and rarely three-dimensional (3D) problems. In order to overcome these limits, we propose the combination of a direct solver and an iterative solver, called Direct-Iterative Solver (DIS). The direct solver is used to compute an exact wavefield solution on a coarse discretized grid. We use a multifrontal decomposition technique. The coarse-grid size is determined primarily by the limits of the available computer resources, rather than by the wave simulation problem. We project the exact coarse-grid solution on a fine grid, and use it as an initial solution for an iterative solver, which converges to an acceptable approximation of the desired fine-grid solution. Two different DIS schemes have been implemented and tested for numerical accuracy and computational performance. The first approach, called the Direct-Iterative-Space Solver (DISS), projects the coarse-grid solution on the fine grid by a bilinear interpolation. Though the interpolated solution nicely approximates the desired fine-grid solution, still for
Implicit solvers for unstructured meshes
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Mavriplis, Dimitri J.
1991-01-01
Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
CASTRO: A NEW COMPRESSIBLE ASTROPHYSICAL SOLVER. III. MULTIGROUP RADIATION HYDRODYNAMICS
Zhang, W.; Almgren, A.; Bell, J.; Howell, L.; Burrows, A.; Dolence, J.
2013-01-15
We present a formulation for multigroup radiation hydrodynamics that is correct to order O(v/c) using the comoving-frame approach and the flux-limited diffusion approximation. We describe a numerical algorithm for solving the system, implemented in the compressible astrophysics code, CASTRO. CASTRO uses a Eulerian grid with block-structured adaptive mesh refinement based on a nested hierarchy of logically rectangular variable-sized grids with simultaneous refinement in both space and time. In our multigroup radiation solver, the system is split into three parts: one part that couples the radiation and fluid in a hyperbolic subsystem, another part that advects the radiation in frequency space, and a parabolic part that evolves radiation diffusion and source-sink terms. The hyperbolic subsystem and the frequency space advection are solved explicitly with high-order Godunov schemes, whereas the parabolic part is solved implicitly with a first-order backward Euler method. Our multigroup radiation solver works for both neutrino and photon radiation.
CASTRO: A New Compressible Astrophysical Solver. III. Multigroup Radiation Hydrodynamics
NASA Astrophysics Data System (ADS)
Zhang, W.; Howell, L.; Almgren, A.; Burrows, A.; Dolence, J.; Bell, J.
2013-01-01
We present a formulation for multigroup radiation hydrodynamics that is correct to order O(v/c) using the comoving-frame approach and the flux-limited diffusion approximation. We describe a numerical algorithm for solving the system, implemented in the compressible astrophysics code, CASTRO. CASTRO uses a Eulerian grid with block-structured adaptive mesh refinement based on a nested hierarchy of logically rectangular variable-sized grids with simultaneous refinement in both space and time. In our multigroup radiation solver, the system is split into three parts: one part that couples the radiation and fluid in a hyperbolic subsystem, another part that advects the radiation in frequency space, and a parabolic part that evolves radiation diffusion and source-sink terms. The hyperbolic subsystem and the frequency space advection are solved explicitly with high-order Godunov schemes, whereas the parabolic part is solved implicitly with a first-order backward Euler method. Our multigroup radiation solver works for both neutrino and photon radiation.
NASA Astrophysics Data System (ADS)
Jia, Jingfei; Kim, Hyun K.; Hielscher, Andreas H.
2015-12-01
It is well known that the radiative transfer equation (RTE) provides more accurate tomographic results than its diffusion approximation (DA). However, RTE-based tomographic reconstruction codes have limited applicability in practice due to their high computational cost. In this article, we propose a new efficient method for solving the RTE forward problem with multiple light sources in an all-at-once manner instead of solving it for each source separately. To this end, we introduce a novel linear solver called the block biconjugate gradient stabilized method (block BiCGStab), which makes full use of the information shared between different right-hand sides to accelerate convergence. Two parallelized block BiCGStab methods are proposed for additional acceleration when only a limited number of threads is available. We evaluate the performance of this algorithm with numerical simulation studies involving the Delta-Eddington approximation to the scattering phase function. The results show that the single-threaded block RTE solver proposed here reduces computation time by a factor of 1.5-3 as compared to the traditional sequential solution method, and the parallel block solver by a factor of 1.5 as compared to the traditional parallel sequential method. This block linear solver is, moreover, independent of the discretization schemes and preconditioners used; thus further acceleration and higher accuracy can be expected when it is combined with other existing discretization schemes or preconditioners.
Perturbative forward solver software for small localized fluorophores in tissue
Martelli, F.; Bianco, S. Del; Di Ninni, P.
2011-01-01
In this paper a forward solver software for the time domain and the CW domain based on the Born approximation for simulating the effect of small localized fluorophores embedded in a non-fluorescent biological tissue is proposed. The fluorescence emission is treated with a mathematical model that describes the migration of photons from the source to the fluorophore and of emitted fluorescent photons from the fluorophore to the detector for all those geometries for which Green’s functions are available. Subroutines written in FORTRAN that can be used for calculating the fluorescent signal for the infinite medium and for the slab are provided with a linked file. With these subroutines, quantities such as reflectance, transmittance, and fluence rate can be calculated. PMID:22254165
Perturbative forward solver software for small localized fluorophores in tissue.
Martelli, F; Del Bianco, S; Di Ninni, P
2012-01-01
In this paper a forward solver software for the time domain and the CW domain based on the Born approximation for simulating the effect of small localized fluorophores embedded in a non-fluorescent biological tissue is proposed. The fluorescence emission is treated with a mathematical model that describes the migration of photons from the source to the fluorophore and of emitted fluorescent photons from the fluorophore to the detector for all those geometries for which Green's functions are available. Subroutines written in FORTRAN that can be used for calculating the fluorescent signal for the infinite medium and for the slab are provided with a linked file. With these subroutines, quantities such as reflectance, transmittance, and fluence rate can be calculated. PMID:22254165
Aleph Field Solver Challenge Problem Results Summary.
Hooper, Russell; Moore, Stan Gerald
2015-01-01
Aleph models continuum electrostatic and steady and transient thermal fields using a finite-element method. Much work has gone into expanding the core solver capability to support enriched modeling consisting of multiple interacting fields, special boundary conditions and two-way interfacial coupling with particles modeled using Aleph's complementary particle-in-cell capability. This report provides quantitative evidence for correct implementation of Aleph's field solver via order-of-convergence assessments on a collection of problems of increasing complexity. It is intended to provide Aleph with a pedigree and to establish a basis for confidence in results for more challenging problems important to Sandia's mission that Aleph was specifically designed to address.
Verifying a Local Generic Solver in Coq
NASA Astrophysics Data System (ADS)
Hofmann, Martin; Karbyshev, Aleksandr; Seidl, Helmut
Fixpoint engines are the core components of program analysis tools and compilers. If these tools are to be trusted, special attention should be paid also to the correctness of such solvers. In this paper we consider the local generic fixpoint solver RLD which can be applied to constraint systems x ⊒ f_x, x ∈ V, over some lattice D where the right-hand sides f_x are given as arbitrary functions implemented in some specification language. The verification of this algorithm is challenging, because it uses higher-order functions and relies on side effects to track variable dependences as they are encountered dynamically during fixpoint iterations. Here, we present a correctness proof of this algorithm which has been formalized by means of the interactive proof assistant Coq.
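To make the setting concrete, the following is a simplified worklist sketch of a local solver. It is not the RLD algorithm from the paper. Each right-hand side is a function that reads other variables through a `get` callback, so dependences are recorded as they are encountered, and only variables reachable from the queried one are evaluated. The names `solve_local`, `get`, and `infl`, and the powerset lattice used in the example, are assumptions for illustration.

```python
def solve_local(rhs, x0, join=frozenset.union, bottom=frozenset()):
    """Solve x >= f_x for x0 and every variable it transitively reads.

    rhs maps each variable to a function f_x(get); get(y) returns the
    current value of y and records that x depends on y.
    """
    sigma = {x0: bottom}   # current value of each discovered variable
    infl = {}              # infl[y] = variables whose right-hand side read y
    worklist = [x0]
    while worklist:
        x = worklist.pop()

        def get(y, x=x):
            if y not in sigma:          # first encounter: start at bottom
                sigma[y] = bottom
                worklist.append(y)
            infl.setdefault(y, set()).add(x)
            return sigma[y]

        new = join(sigma[x], rhs[x](get))
        if new != sigma[x]:
            sigma[x] = new
            # Re-evaluate everything that has read x so far.
            worklist.extend(infl.get(x, ()))
    return sigma


# Example constraint system over the powerset lattice of integers.
rhs = {
    "a": lambda get: frozenset({1}),
    "b": lambda get: get("a") | frozenset({2}),
    "c": lambda get: get("b") | get("c"),
    "d": lambda get: frozenset({99}),   # never read when solving for "c"
}
result = solve_local(rhs, "c")
print(sorted(result.items()))
```

The sketch terminates when the right-hand sides are monotone and the lattice has finite height; the side effects inside `get` are the kind of dynamic dependence tracking that makes a formal correctness proof non-trivial.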
Domain decomposition for the SPN solver MINOS
Jamelot, Erell; Baudron, Anne-Marie; Lautard, Jean-Jacques
2012-07-01
In this article we present a domain decomposition method for the mixed SPN equations, discretized with Raviart-Thomas-Nedelec finite elements. This domain decomposition is based on the iterative Schwarz algorithm with Robin interface conditions to handle communications. After having described this method, we give details on how to optimize the convergence. Finally, we give some numerical results computed in a realistic 3D domain. The computations are done with the MINOS solver of the APOLLO3 (R) code. (authors)
A perspective on unstructured grid flow solvers
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.
1995-01-01
This survey paper assesses the status of compressible Euler and Navier-Stokes solvers on unstructured grids. Different spatial and temporal discretization options for steady and unsteady flows are discussed. The integration of these components into an overall framework to solve practical problems is addressed. Issues such as grid adaptation, higher order methods, hybrid discretizations and parallel computing are briefly discussed. Finally, some outstanding issues and future research directions are presented.
User documentation for PVODE, an ODE solver for parallel computers
Hindmarsh, A.C., LLNL
1998-05-01
PVODE is a general purpose ordinary differential equation (ODE) solver for stiff and nonstiff ODEs. It is based on CVODE [5] [6], which is written in ANSI-standard C. PVODE uses MPI (Message-Passing Interface) [8] and a revised version of the vector module in CVODE to achieve parallelism and portability. PVODE is intended for the SPMD (Single Program Multiple Data) environment with distributed memory, in which all vectors are identically distributed across processors. In particular, the vector module is designed to help the user assign a contiguous segment of a given vector to each of the processors for parallel computation. The idea is for each processor to solve a certain fixed subset of the ODEs. To better understand PVODE, we first need to understand CVODE and its historical background. The ODE solver CVODE, which was written by Cohen and Hindmarsh, combines features of two earlier Fortran codes, VODE [1] and VODPK [3]. Those two codes were written by Brown, Byrne, and Hindmarsh. Both use variable-coefficient multi-step integration methods, and address both stiff and nonstiff systems. (Stiffness is defined as the presence of one or more very small damping time constants.) VODE uses direct linear algebraic techniques to solve the underlying banded or dense linear systems of equations in conjunction with a modified Newton method in the stiff ODE case. On the other hand, VODPK uses a preconditioned Krylov iterative method [2] to solve the underlying linear system. User-supplied preconditioners directly address the dominant source of stiffness. Consequently, CVODE implements both the direct and iterative methods. Currently, with regard to the nonlinear and linear system solution, PVODE has three method options available: functional iteration, Newton iteration with a diagonal approximate Jacobian, and Newton iteration with the iterative method SPGMR (Scaled Preconditioned Generalized Minimal Residual method). Both CVODE and PVODE are written in such a way that other linear
Galerkin CFD solvers for use in a multi-disciplinary suite for modeling advanced flight vehicles
NASA Astrophysics Data System (ADS)
Moffitt, Nicholas J.
This work extends existing Galerkin CFD solvers for use in a multi-disciplinary suite. The suite is proposed as a means of modeling advanced flight vehicles, which exhibit strong coupling between aerodynamics, structural dynamics, controls, rigid body motion, propulsion, and heat transfer. Such applications include aeroelastics, aeroacoustics, stability and control, and other highly coupled applications. The suite uses NASA STARS for modeling structural dynamics and heat transfer. Aerodynamics, propulsion, and rigid body dynamics are modeled in one of the five CFD solvers below. Euler2D and Euler3D are Galerkin CFD solvers created at OSU by Cowan (2003). These solvers are capable of modeling compressible inviscid aerodynamics with modal elastics and rigid body motion. This work reorganized these solvers to improve efficiency during editing and at run time. Simple and efficient propulsion models were added, including rocket, turbojet, and scramjet engines. Viscous terms were added to the previous solvers to create NS2D and NS3D. The viscous contributions were demonstrated in the inertial and non-inertial frames. Variable viscosity (Sutherland's equation) and heat transfer boundary conditions were added to both solvers but not verified in this work. Two turbulence models were implemented in NS2D and NS3D: the Spalart-Allmaras (SA) model of Deck, et al. (2002) and Menter's SST model (1994). A rotation correction term (Shur, et al., 2000) was added to the production of turbulence. Local time stepping and artificial dissipation were adapted to each model. CFDsol is a Taylor-Galerkin solver with an SA turbulence model. This work improved the time accuracy, far field stability, viscous terms, Sutherland's equation, and SA model with NS3D as a guideline and added the propulsion models from Euler3D to CFDsol. Simple geometries were demonstrated to utilize current meshing and processing capabilities. Air-breathing hypersonic flight vehicles (AHFVs) represent the ultimate
Guerin, P.; Baudron, A. M.; Lautard, J. J.
2006-07-01
This paper describes a new technique for determining the pin power in heterogeneous core calculations. It is based on a domain decomposition with overlapping sub-domains and a component mode synthesis technique for the global flux determination. Local basis functions are used to span a discrete space that allows fundamental global mode approximation through a Galerkin technique. Two approaches are given to obtain these local basis functions: in the first one (Component Mode Synthesis method), the first few spatial eigenfunctions are computed on each sub-domain, using periodic boundary conditions. In the second one (Factorized Component Mode Synthesis method), only the fundamental mode is computed, and we use a factorization principle for the flux in order to replace the higher order Eigenmodes. These different local spatial functions are extended to the global domain by defining them as zero outside the sub-domain. These methods are well-fitted for heterogeneous core calculations because the spatial interface modes are taken into account in the domain decomposition. Although these methods could be applied to higher order angular approximations - particularly easily to a SPN approximation - the numerical results we provide are obtained using a diffusion model. We show the methods' accuracy for reactor cores loaded with UOX and MOX assemblies, for which standard reconstruction techniques are known to perform poorly. Furthermore, we show that our methods are highly and easily parallelizable. (authors)
Lee, S. L.; Hovland, P. D.
2000-11-01
PVODE is a high-performance ordinary differential equation solver for the types of initial value problems (IVPs) that arise in large-scale computational simulations. Often, one wants to compute sensitivities with respect to certain parameters in the IVP. We discuss the use of automatic differentiation (AD) to compute these sensitivities in the context of PVODE. Results on a simple test problem indicate that the use of AD-generated derivative code can reduce the time to solution over finite difference approximations.
Lee, S L; Hovland, P D
2000-09-15
PVODE is a high-performance ordinary differential equation solver for the types of initial value problems (IVPs) that arise in large-scale computational simulations. Often, one wants to compute sensitivities with respect to certain parameters in the IVP. They discuss the use of automatic differentiation (AD) to compute these sensitivities in the context of PVODE. Results on a simple test problem indicate that the use of AD-generated derivative code can reduce the time to solution over finite difference approximations.
High Energy Boundary Conditions for a Cartesian Mesh Euler Solver
NASA Technical Reports Server (NTRS)
Pandya, Shishir; Murman, Scott; Aftosmis, Michael
2003-01-01
Inlets and exhaust nozzles are commonplace in the world of flight. Yet, many aerodynamic simulation packages do not provide a method of modelling such high energy boundaries in the flow field. For the purposes of aerodynamic simulation, inlets and exhausts are often faired over and it is assumed that the flow differences resulting from this assumption are minimal. While this is an adequate assumption for the prediction of lift, the lack of a plume behind the aircraft creates an evacuated base region, thus affecting both drag and pitching moment values. In addition, the flow in the base region is often mispredicted, resulting in incorrect base drag. In order to accurately predict these quantities, a method for specifying inlet and exhaust conditions needs to be available in aerodynamic simulation packages. A method for a first approximation of a plume without accounting for chemical reactions is added to the Cartesian mesh based aerodynamic simulation package CART3D. The method consists of 3 steps. In the first step, a components approach where each triangle is assigned a component number is used. Here, a method for marking the inlet or exhaust plane triangles as separate components is discussed. In step two, the flow solver is modified to accept a reference state for the components marked inlet or exhaust. In the third step, the flow solver uses these separated components and the reference state to compute the correct flow condition at that triangle. The present method is implemented in the CART3D package which consists of a set of tools for generating a Cartesian volume mesh from a set of component triangulations. The Euler equations are solved on the resulting unstructured Cartesian mesh. The present method is implemented in this package and its usefulness is demonstrated with two validation cases. A generic missile body is also presented to show the usefulness of the method on a real world geometry.
Domain decomposed preconditioners with Krylov subspace methods as subdomain solvers
Pernice, M.
1994-12-31
Domain decomposed preconditioners for nonsymmetric partial differential equations typically require the solution of problems on the subdomains. Most implementations employ exact solvers to obtain these solutions. Consequently, work and storage requirements for the subdomain problems grow rapidly with the size of the subdomain problems. Subdomain solves constitute the single largest computational cost of a domain decomposed preconditioner, and improving the efficiency of this phase of the computation will have a significant impact on the performance of the overall method. The small local memory available on the nodes of most message-passing multicomputers motivates consideration of the use of an iterative method for solving subdomain problems. For large-scale systems of equations that are derived from three-dimensional problems, memory considerations alone may dictate the need for using iterative methods for the subdomain problems. In addition to reduced storage requirements, use of an iterative solver on the subdomains allows flexibility in specifying the accuracy of the subdomain solutions. Substantial savings in solution time are possible if the quality of the domain decomposed preconditioner is not degraded too much by relaxing the accuracy of the subdomain solutions. While some work in this direction has been conducted for symmetric problems, similar studies for nonsymmetric problems appear not to have been pursued. This work represents a first step in this direction, and explores the effectiveness of performing subdomain solves using several transpose-free Krylov subspace methods, GMRES, transpose-free QMR, CGS, and a smoothed version of CGS. Depending on the difficulty of the subdomain problem and the convergence tolerance used, a reduction in solution time is possible in addition to the reduced memory requirements. The domain decomposed preconditioner is a Schur complement method in which the interface operators are approximated using interface probing.
New Multigrid Solver Advances in TOPS
Falgout, R D; Brannick, J; Brezina, M; Manteuffel, T; McCormick, S
2005-06-27
In this paper, we highlight new multigrid solver advances in the Terascale Optimal PDE Simulations (TOPS) project in the Scientific Discovery Through Advanced Computing (SciDAC) program. We discuss two new algebraic multigrid (AMG) developments in TOPS: the adaptive smoothed aggregation method (αSA) and a coarse-grid selection algorithm based on compatible relaxation (CR). The αSA method is showing promising results in initial studies for Quantum Chromodynamics (QCD) applications. The CR method has the potential to greatly improve the applicability of AMG.
DPS--a computerised diagnostic problem solver.
Bartos, P; Gyárfas, F; Popper, M
1982-01-01
The paper contains a short description of the DPS system, which is a computerized diagnostic problem solver. The system is under development at the Research Institute of Medical Bionics in Bratislava, Czechoslovakia. Its underlying philosophy stems from viewing the diagnostic process as a process of cognitive problem solving. The implementation of the system is based on the methods of Artificial Intelligence; the use of production systems and frame theory should be noted in this context. Finally, a list of program modules and their characterisation is presented. PMID:6811229
Updates to the NEQAIR Radiation Solver
NASA Technical Reports Server (NTRS)
Cruden, Brett A.; Brandis, Aaron M.
2014-01-01
The NEQAIR code is one of the original heritage solvers for radiative heating prediction in aerothermal environments, and is still used today for mission design purposes. This paper discusses the implementation of the first major revision to the NEQAIR code in the last five years, NEQAIR v14.0. The most notable features of NEQAIR v14.0 are the parallelization of the radiation computation, reducing runtimes by about 30×, and the inclusion of mid-wave CO2 infrared radiation.
Input-output-controlled nonlinear equation solvers
NASA Technical Reports Server (NTRS)
Padovan, Joseph
1988-01-01
To upgrade the efficiency and stability of the successive substitution (SS) and Newton-Raphson (NR) schemes, the concept of input-output-controlled solvers (IOCS) is introduced. By employing the formal properties of the constrained version of the SS and NR schemes, the IOCS algorithm can handle indefiniteness of the system Jacobian, can maintain iterate monotonicity, and provide for separate control of load incrementation and iterate excursions, as well as having other features. To illustrate the algorithmic properties, the results for several benchmark examples are presented. These define the associated numerical efficiency and stability of the IOCS.
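For orientation, the two baseline schemes named in the abstract can be sketched for a scalar equation. This is a minimal illustration of plain successive substitution and plain Newton-Raphson only; it does not implement the constraints that define IOCS, and the function names and the test equation x = cos(x) are assumptions chosen for the example.

```python
import math


def successive_substitution(g, x0, tol=1e-12, max_iter=200):
    """Iterate x <- g(x) until successive iterates agree to tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("successive substitution did not converge")


def newton_raphson(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until the update is below tol."""
    x = x0
    for k in range(1, max_iter + 1):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x, k
    raise RuntimeError("Newton-Raphson did not converge")


# Solve x = cos(x), written as a fixed point and as a root of f(x) = x - cos(x).
x_ss, n_ss = successive_substitution(math.cos, 1.0)
x_nr, n_nr = newton_raphson(lambda x: x - math.cos(x),
                            lambda x: 1.0 + math.sin(x), 1.0)
print(x_ss, n_ss)
print(x_nr, n_nr)
```

Both iterations reach the same root, with Newton-Raphson needing far fewer iterations on this well-behaved problem. Neither plain scheme guards against a singular or indefinite Jacobian or against non-monotone iterates, which are the situations the abstract says IOCS is designed to handle.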
Comparison of electromagnetic solvers for antennas mounted on vehicles
NASA Astrophysics Data System (ADS)
Mocker, M. S. L.; Hipp, S.; Spinnler, F.; Tazi, H.; Eibert, T. F.
2015-11-01
An electromagnetic solver comparison for various use cases of antennas mounted on vehicles is presented. For this purpose, several modeling approaches, called transient, frequency and integral solver, including the features fast resonant method and autoregressive filter, offered by CST MWS, are investigated. The solvers and methods are compared for a roof antenna itself, a simplified vehicle, a roof including a panorama window and a combination of antenna and vehicle. With these examples, the influence of different materials, data formats and parameters such as size and complexity are investigated. Also, the necessary configurations for the mesh and the solvers are described.
A HLL-Rankine-Hugoniot Riemann solver for complex non-linear hyperbolic problems
NASA Astrophysics Data System (ADS)
Guy, Capdeville
2013-10-01
We present a new HLL-type approximate Riemann solver that aims at capturing any isolated discontinuity without necessitating extensive characteristic analysis of governing partial differential equations. This property is especially attractive for complex hyperbolic systems with more than two equations. Following Linde's (2002) approach [6], we introduce a generic middle wave into the classical two-state HLL solver. The property of this third wave is characterized by means of a "strength indicator" that is derived from polynomial considerations. The polynomial that constitutes the basis of the procedure is made non-oscillatory by an adapted fourth-order WENO algorithm (CWENO4). This algorithm makes it possible to derive an expression for the strength indicator. According to the size of this latter parameter, the resulting solver (HLL-RH) either computes the multi-dimensional Rankine-Hugoniot equations if an isolated discontinuity appears in the Riemann fan, or asymptotically tends towards the two-state HLL solver if the solution is locally smooth. The asymptotic version of the HLL-RH solver is demonstrated to be positively conservative and entropy satisfying in its first-order multi-dimensional form provided that a relevant and not too restrictive CFL condition is considered; specific limitations of the conservative increments of the numerical solution and a suitable entropy condition make it possible to maintain these properties in its high-order version. With a monotonicity-preserving algorithm for the time integration, the numerical method so generated is third-order accurate in time and fourth-order accurate in space for the smooth part of the solution; moreover, the scheme is stable and accurate when capturing a shock wave, whatever the complexity of the underlying differential system. Extensive numerical tests for the one- and two-dimensional Euler equations of gas dynamics and comparisons with classical Godunov-type methods help to point out the potentialities and insufficiencies
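As background for the abstract above, the classical two-state HLL flux that HLL-RH extends can be sketched for the 1D Euler equations. This is the standard HLL solver only, without the middle wave or strength indicator of the paper; the simple min/max wave-speed estimates, the ideal-gas constant `GAMMA = 1.4`, and all function names are assumptions for illustration.

```python
import math

GAMMA = 1.4  # ideal-gas ratio of specific heats (assumed)


def primitive_to_conserved(rho, u, p):
    """Return (density, momentum, total energy) for an ideal gas."""
    return (rho, rho * u, p / (GAMMA - 1.0) + 0.5 * rho * u * u)


def pressure(U):
    rho, m, E = U
    return (GAMMA - 1.0) * (E - 0.5 * m * m / rho)


def euler_flux(U):
    """Physical flux of the 1D Euler equations."""
    rho, m, E = U
    u = m / rho
    p = pressure(U)
    return (m, m * u + p, (E + p) * u)


def hll_flux(UL, UR):
    """Two-state HLL flux with simple min/max wave-speed bounds."""
    uL, uR = UL[1] / UL[0], UR[1] / UR[0]
    cL = math.sqrt(GAMMA * pressure(UL) / UL[0])
    cR = math.sqrt(GAMMA * pressure(UR) / UR[0])
    sL = min(uL - cL, uR - cR)   # fastest left-going signal
    sR = max(uL + cL, uR + cR)   # fastest right-going signal
    FL, FR = euler_flux(UL), euler_flux(UR)
    if sL >= 0.0:
        return FL                # whole fan moves to the right
    if sR <= 0.0:
        return FR                # whole fan moves to the left
    return tuple(
        (sR * fl - sL * fr + sL * sR * (ur - ul)) / (sR - sL)
        for fl, fr, ul, ur in zip(FL, FR, UL, UR)
    )


# Sod-type left and right states (density, velocity, pressure).
UL = primitive_to_conserved(1.0, 0.0, 1.0)
UR = primitive_to_conserved(0.125, 0.0, 0.1)
print(hll_flux(UL, UR))
```

Because the two-state solver averages over the entire Riemann fan, it smears isolated contact discontinuities, which is the shortcoming a middle wave is meant to address.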
Algebraic Multiscale Solver for Elastic Geomechanical Deformation
NASA Astrophysics Data System (ADS)
Castelletto, N.; Hajibeygi, H.; Tchelepi, H.
2015-12-01
Predicting the geomechanical response of geological formations to thermal, pressure, and mechanical loading is important in many engineering applications. The mathematical formulation that describes deformation of a reservoir coupled with flow and transport entails heterogeneous coefficients with a wide range of length scales. Such detailed heterogeneous descriptions of reservoir properties impose severe computational challenges for the study of realistic-scale (km) reservoirs. To deal with these challenges, we developed an Algebraic Multiscale Solver for ELastic geomechanical deformation (EL-AMS). Constructed on a finite element fine-scale system, EL-AMS imposes a coarse-scale grid, which is a non-overlapping decomposition of the domain. Then, local (coarse) basis functions for the displacement vector are introduced. These basis functions honor the elastic properties of the local domains subject to the imposed local boundary conditions. The basis functions form the Restriction and Prolongation operators. These operators allow for the construction of accurate coarse-scale systems for the displacement. While the multiscale system is efficient for resolving low-frequency errors, coupling it with a fine-scale smoother, e.g., ILU(0), leads to an efficient iterative solver. Numerical results for several test cases illustrate that EL-AMS is quite efficient and applicable to simulate elastic deformation of large-scale heterogeneous reservoirs.
Using the scalable nonlinear equations solvers package
Gropp, W.D.; McInnes, L.C.; Smith, B.F.
1995-02-01
SNES (Scalable Nonlinear Equations Solvers) is a software package for the numerical solution of large-scale systems of nonlinear equations on both uniprocessors and parallel architectures. SNES also contains a component for the solution of unconstrained minimization problems, called SUMS (Scalable Unconstrained Minimization Solvers). Newton-like methods, which are known for their efficiency and robustness, constitute the core of the package. As part of the multilevel PETSc library, SNES incorporates many features and options from other parts of PETSc. In keeping with the spirit of the PETSc library, the nonlinear solution routines are data-structure-neutral, making them flexible and easily extensible. This user's guide contains a detailed description of uniprocessor usage of SNES, with some added comments regarding multiprocessor usage. At this time the parallel version is undergoing refinement and extension, as we work toward a common interface for the uniprocessor and parallel cases. Thus, forthcoming versions of the software will contain additional features, and changes to the parallel interface may result at any time. The new parallel version will employ the MPI (Message Passing Interface) standard for interprocessor communication. Since most of these details will be hidden, users will need to perform only minimal message-passing programming.
On code verification of RANS solvers
NASA Astrophysics Data System (ADS)
Eça, L.; Klaij, C. M.; Vaz, G.; Hoekstra, M.; Pereira, F. S.
2016-04-01
This article discusses Code Verification of Reynolds-Averaged Navier-Stokes (RANS) solvers that rely on face-based finite volume discretizations for volumes of arbitrary shape. The study includes test cases with known analytical solutions (generated with the method of manufactured solutions) corresponding to laminar and turbulent flow, with the latter using eddy-viscosity turbulence models. The procedure to perform Code Verification based on grid refinement studies is discussed and the requirements for its correct application are illustrated in a simple one-dimensional problem. It is shown that geometrically similar grids are recommended for proper Code Verification, so that the data show no scatter and least-squares fits become unnecessary. Results show that it may be advantageous to determine the extrapolated error to cell size/time step zero instead of assuming that it is zero, especially when it is hard to determine the asymptotic order of grid convergence. In the RANS examples, several of the features of the ReFRESCO solver are checked, including the effects of the available turbulence models on the convergence properties of the code. It is shown that it is required to account for non-orthogonality effects in the discretization of the diffusion terms and that the turbulence quantities transport equations can deteriorate the order of grid convergence of mean flow quantities.
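The grid refinement studies mentioned above rest on estimating the observed order of convergence. A minimal sketch of the textbook three-grid estimate follows; it assumes a constant refinement ratio and data in the asymptotic range, and it is not the specific procedure of the article. The function names and the manufactured data f(h) = 1 + 0.5 h^2 are assumptions for illustration.

```python
import math


def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed order p from three grids refined by a constant ratio r > 1."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)


def richardson_extrapolate(f_medium, f_fine, r, p):
    """Estimate of the zero-cell-size value, assuming error ~ C * h**p."""
    return f_fine + (f_fine - f_medium) / (r**p - 1.0)


def f(h):
    # Manufactured data with a known limit (1.0) and known order (2).
    return 1.0 + 0.5 * h**2


h = 0.4
p = observed_order(f(h), f(h / 2), f(h / 4), 2.0)
f0 = richardson_extrapolate(f(h / 2), f(h / 4), 2.0, p)
print(p, f0)
```

With geometrically similar grids the three values lie on a single error curve, so three grids determine p without a least-squares fit; when the exact solution is known, comparing `p` with the theoretical order of the scheme is the verification check.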
Two-Dimensional Ffowcs Williams/Hawkings Equation Solver
NASA Technical Reports Server (NTRS)
Lockard, David P.
2005-01-01
FWH2D is a Fortran 90 computer program that solves a two-dimensional (2D) version of the equation, derived by J. E. Ffowcs Williams and D. L. Hawkings, for sound generated by turbulent flow. FWH2D was developed especially for estimating noise generated by airflows around such approximately 2D airframe components as slats. The user provides input data on fluctuations of pressure, density, and velocity on some surface. These data are combined with information about the geometry of the surface to calculate histories of thickness and loading terms. These histories are fast-Fourier-transformed into the frequency domain. For each frequency of interest and each observer position specified by the user, kernel functions are integrated over the surface by use of the trapezoidal rule to calculate a pressure signal. The resulting frequency-domain signals are inverse-fast-Fourier-transformed back into the time domain. The output of the code consists of the time- and frequency-domain representations of the pressure signals at the observer positions. Because of its approximate nature, FWH2D overpredicts the noise from a finite-length (3D) component. The advantage of FWH2D is that it requires a fraction of the computation time of a 3D Ffowcs Williams/Hawkings solver.
NASA Astrophysics Data System (ADS)
Vincenti, H.; Vay, J.-L.
2016-03-01
Very high order or pseudo-spectral Maxwell solvers are the method of choice to reduce discretization effects (e.g. numerical dispersion) that are inherent to low order Finite-Difference Time-Domain (FDTD) schemes. However, due to their large stencils, these solvers are often subject to truncation errors in many electromagnetic simulations. These truncation errors come from non-physical modifications of Maxwell's equations in space that may generate spurious signals affecting the overall accuracy of the simulation results. Such modifications occur, for instance, when Perfectly Matched Layers (PMLs) are used at simulation domain boundaries to simulate open media. Another example is the use of arbitrary order Maxwell solvers with a domain decomposition technique that may under some conditions involve stencil truncations at subdomain boundaries, resulting in small spurious errors that do eventually build up. In each case, a careful evaluation of the characteristics and magnitude of the errors resulting from these approximations, and their impact at any frequency and angle, requires detailed analytical and numerical studies. To this end, we present a general analytical approach that enables the evaluation of numerical errors of fully three-dimensional arbitrary order finite-difference Maxwell solvers, with arbitrary modification of the local stencil in the simulation domain. The analytical model is validated against simulations of the domain decomposition technique and PMLs, when these are used with very high-order Maxwell solvers, as well as in the infinite order limit of pseudo-spectral solvers. Results confirm that the new analytical approach enables exact predictions in each case. They also confirm that the domain decomposition technique can be used with very high-order Maxwell solvers and a reasonably low number of guard cells with negligible effects on the overall accuracy of the simulation.
A matrix-form GSM-CFD solver for incompressible fluids and its application to hemodynamics
NASA Astrophysics Data System (ADS)
Yao, Jianyao; Liu, G. R.
2014-10-01
A GSM-CFD solver for incompressible flows is developed based on the gradient smoothing method (GSM). A matrix-form algorithm and corresponding data structure for GSM are devised to efficiently approximate the spatial gradients of field variables using the gradient smoothing operation. The calculated gradient values on various test fields show that the proposed GSM is capable of exactly reproducing linear field and of second order accuracy on all kinds of meshes. It is found that the GSM is much more robust to mesh deformation and therefore more suitable for problems with complicated geometries. Integrated with the artificial compressibility approach, the GSM is extended to solve the incompressible flows. As an example, the flow simulation of carotid bifurcation is carried out to show the effectiveness of the proposed GSM-CFD solver. The blood is modeled as incompressible Newtonian fluid and the vessel is treated as rigid wall in this paper.
ShyLU Scalable Hybrid Preconditioner and Solver
Energy Science and Technology Software Center (ESTSC)
2012-09-11
ShyLU is numerical software to solve sparse linear systems of equations. ShyLU uses a hybrid direct-iterative Schur complement method, and may be used either as a preconditioner or as a solver. ShyLU is parallel and optimized for a single compute node. ShyLU will be a package in the Trilinos software framework.
Experiences with linear solvers for oil reservoir simulation problems
Joubert, W.; Janardhan, R.; Biswas, D.; Carey, G.
1996-12-31
This talk will focus on practical experiences with iterative linear solver algorithms used in conjunction with Amoco Production Company's Falcon oil reservoir simulation code. The goal of this study is to determine the best linear solver algorithms for these types of problems. The results of numerical experiments will be presented.
Shape reanalysis and sensitivities utilizing preconditioned iterative boundary solvers
NASA Technical Reports Server (NTRS)
Guru Prasad, K.; Kane, J. H.
1992-01-01
The computational advantages associated with the utilization of preconditioned iterative equation solvers are quantified for the reanalysis of perturbed shapes using continuum structural boundary element analysis (BEA). Both single- and multi-zone three-dimensional problems are examined. Significant reductions in computer time are obtained by making use of previously computed solution vectors and preconditioners in subsequent analyses. The effectiveness of this technique is demonstrated for the computation of shape response sensitivities required in shape optimization. Computer times and accuracies achieved using the preconditioned iterative solvers are compared with those obtained via direct solvers and implicit differentiation of the boundary integral equations. It is concluded that this approach employing preconditioned iterative equation solvers in reanalysis and sensitivity analysis can be competitive with, if not superior to, those involving direct solvers.
A multigrid fluid pressure solver handling separating solid boundary conditions.
Chentanez, Nuttapong; Müller-Fischer, Matthias
2012-08-01
We present a multigrid method for solving the linear complementarity problem (LCP) resulting from discretizing the Poisson equation subject to separating solid boundary conditions in an Eulerian liquid simulation's pressure projection step. The method requires only a few small changes to a multigrid solver for linear systems. Our generalized solver is fast enough to handle 3D liquid simulations with separating boundary conditions in practical domain sizes. Previous methods could only handle relatively small 2D domains in reasonable time, because they used expensive quadratic programming (QP) solvers. We demonstrate our technique in several practical scenarios, including non-axis-aligned containers and moving solids in which the omission of separating boundary conditions results in disturbing artifacts of liquid sticking to solids. Our measurements show that the convergence rate of our LCP solver is close to that of a standard multigrid solver. PMID:22411885
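The abstract does not spell out the "few small changes". As a hedged illustration only, the sketch below shows one common way a relaxation sweep for a linear system can be turned into a relaxation sweep for an LCP: each Gauss-Seidel update is followed by a clamp at zero (projected Gauss-Seidel). The function name, the 2x2 test matrix, and the choice of smoother are assumptions made for this example, not the authors' method.

```python
def projected_gauss_seidel(A, b, sweeps=50):
    """Approximately solve the LCP: x >= 0, A x - b >= 0, x . (A x - b) = 0.

    A is a dense list-of-lists matrix with positive diagonal (assumed),
    b is a list. Illustrative only; this is not a multigrid solver.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Ordinary Gauss-Seidel update for row i ...
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            xi = (b[i] - s) / A[i][i]
            # ... followed by the projection that enforces x_i >= 0.
            x[i] = max(0.0, xi)
    return x


# Small hypothetical test problem: the unconstrained solution has x[1] < 0,
# so the constraint is active in the second unknown.
A = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, -1.0]
x = projected_gauss_seidel(A, b)
w = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
print(x, w)
```

Removing the `max(0.0, ...)` line recovers plain Gauss-Seidel for the linear system, which is the sense in which the change to the linear smoother is small.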
A real-time impurity solver for DMFT
NASA Astrophysics Data System (ADS)
Kim, Hyungwon; Aron, Camille; Han, Jong E.; Kotliar, Gabriel
Dynamical mean-field theory (DMFT) offers a non-perturbative approach to problems with strongly correlated electrons. The method heavily relies on the ability to numerically solve an auxiliary Anderson-type impurity problem. While powerful Matsubara-frequency solvers have been developed over the past two decades to tackle equilibrium situations, the status of real-time impurity solvers that could compete with Matsubara-frequency solvers and be readily generalizable to non-equilibrium situations is still premature. We present a real-time solver which is based on a quantum Master equation description of the dissipative dynamics of the impurity and its exact diagonalization. As a benchmark, we illustrate the strengths of our solver in the context of the equilibrium Mott-insulator transition of the one-band Hubbard model and compare it with iterative perturbation theory (IPT) method. Finally, we discuss its direct application to a nonequilibrium situation.
Scalable Adaptive Multilevel Solvers for Multiphysics Problems
Xu, Jinchao
2014-12-01
In this project, we investigated adaptive, parallel, and multilevel methods for numerical modeling of various real-world applications, including magnetohydrodynamics (MHD), complex fluids, electromagnetism, Navier-Stokes equations, and reservoir simulation. First, we have designed improved mathematical models and numerical discretizations for viscoelastic fluids and MHD. Second, we have derived new a posteriori error estimators and extended the applicability of adaptivity to various problems. Third, we have developed multilevel solvers for solving scalar partial differential equations (PDEs) as well as coupled systems of PDEs, especially on unstructured grids. Moreover, we have integrated the study of adaptive methods and multilevel methods, and made significant efforts and advances in adaptive multilevel methods for multi-physics problems.
Optimising a parallel conjugate gradient solver
Field, M.R.
1996-12-31
This work arises from the introduction of a parallel iterative solver to a large structural analysis finite element code. The code is called FEX and it was developed at Hitachi's Mechanical Engineering Laboratory. The FEX package can deal with a large range of structural analysis problems using a large number of finite element techniques. FEX can solve either stress or thermal analysis problems of a range of different types from plane stress to a full three-dimensional model. These problems can consist of a number of different materials which can be modelled by a range of material models. The structure being modelled can have the load applied at either a point or a surface, or by a pressure, a centrifugal force or just gravity. Alternatively a thermal load can be applied with a given initial temperature. The displacement of the structure can be constrained by having a fixed boundary or by prescribing the displacement at a boundary.
General purpose nonlinear system solver based on Newton-Krylov method.
Energy Science and Technology Software Center (ESTSC)
2013-12-01
KINSOL is part of a software family called SUNDIALS: SUite of Nonlinear and Differential/Algebraic equation Solvers [1]. KINSOL is a general-purpose nonlinear system solver based on Newton-Krylov and fixed-point solver technologies [2].
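To make the term "Newton-Krylov" concrete, here is a minimal sketch of a Jacobian-free Newton iteration in plain Python. It does not use or imitate the KINSOL API. The inner linear solve is a one-vector minimal-residual iteration (restarted GMRES with subspace size 1), chosen as the simplest Krylov-type method and suitable only when the Jacobian has a positive definite symmetric part. The test system `F` and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jvp(F, u, v, eps=1e-7):
    """Finite-difference approximation of J(u) v; the Jacobian is never formed."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v == 0.0:
        return [0.0] * len(v)
    h = eps / norm_v  # scale the step so the perturbation is not lost to rounding
    Fu = F(u)
    Fp = F([ui + h * vi for ui, vi in zip(u, v)])
    return [(a - b) / h for a, b in zip(Fp, Fu)]


def min_residual(matvec, rhs, iters=200, rtol=1e-8):
    """Solve matvec(x) = rhs with GMRES(1), i.e. minimal-residual iteration."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    r0 = math.sqrt(dot(r, r))
    for _ in range(iters):
        if math.sqrt(dot(r, r)) <= rtol * r0:
            break
        Ar = matvec(r)
        alpha = dot(r, Ar) / dot(Ar, Ar)  # step length minimising the residual norm
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
    return x


def newton_krylov(F, u0, tol=1e-10, max_outer=50):
    u = list(u0)
    for _ in range(max_outer):
        Fu = F(u)
        if math.sqrt(dot(Fu, Fu)) < tol:
            break
        # Inexact Newton step: solve J(u) delta = -F(u) using only J*v products.
        delta = min_residual(lambda v: jvp(F, u, v), [-f for f in Fu])
        u = [ui + di for ui, di in zip(u, delta)]
    return u


def F(u):
    # Hypothetical test system with a root at (1, 1).
    return [3 * u[0] + u[1] ** 2 - 4, u[0] ** 2 + 3 * u[1] - 4]


root = newton_krylov(F, [0.0, 0.0])
print(root)
```

Production solvers add globalization (line search or trust region), preconditioning, and a full GMRES or similar inner solver; none of that is shown here.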
Comparison of open-source linear programming solvers.
Gearhart, Jared Lee; Adair, Kristin Lynn; Durfee, Justin D.; Jones, Katherine A.; Martin, Nathaniel; Detry, Richard Joseph
2013-10-01
When developing linear programming models, issues such as budget limitations, customer requirements, or licensing may preclude the use of commercial linear programming solvers. In such cases, one option is to use an open-source linear programming solver. A survey of linear programming tools was conducted to identify potential open-source solvers. From this survey, four open-source solvers were tested using a collection of linear programming test problems and the results were compared to IBM ILOG CPLEX Optimizer (CPLEX) [1], an industry standard. The solvers considered were: COIN-OR Linear Programming (CLP) [2], [3], GNU Linear Programming Kit (GLPK) [4], lp_solve [5] and Modular In-core Nonlinear Optimization System (MINOS) [6]. As no open-source solver outperforms CPLEX, this study demonstrates the power of commercial linear programming software. CLP was found to be the top performing open-source solver considered in terms of capability and speed. GLPK also performed well but cannot match the speed of CLP or CPLEX. lp_solve and MINOS were considerably slower and encountered issues when solving several test problems.
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
Multi-GPU kinetic solvers using MPI and CUDA
NASA Astrophysics Data System (ADS)
Zabelok, Sergey; Arslanbekov, Robert; Kolobov, Vladimir
2014-12-01
This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. The main challenge of porting UFS to graphics processing units (GPUs) comes from the dynamically adapted mesh, which causes irregular data access. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using the discrete velocity method (DVM), the DSMC module, and the Lattice Boltzmann Method (LBM) solver, all using an octree Cartesian mesh with Adaptive Mesh Refinement (AMR). Double-digit speedup on a single GPU and good scaling for multi-GPU have been demonstrated.
NASA Astrophysics Data System (ADS)
Kevlahan, N. N.; Vasilyev, O. V.; Yuen, D. A.
2003-12-01
An adaptive multilevel wavelet collocation method for solving multi-dimensional elliptic problems with localized structures is developed. The method is based on the general class of multi-dimensional second generation wavelets and is an extension of the dynamically adaptive second generation wavelet collocation method for evolution problems. Wavelet decomposition is used for grid adaptation and interpolation, while an O(N) hierarchical finite difference scheme, which takes advantage of wavelet multilevel decomposition, is used for derivative calculations. The multilevel structure of the wavelet approximation provides a natural way to obtain the solution on a near optimal grid. In order to accelerate the convergence of the iterative solver, an iterative procedure analogous to the multigrid algorithm is developed. For problems with slowly varying viscosity, simple diagonal preconditioning works. For problems with large laterally varying viscosity contrasts, either a direct solver on shared-memory machines or a multilevel iterative solver with an incomplete LU preconditioner may be used. The method is demonstrated for the solution of a number of two-dimensional elliptic test problems with both constant and spatially varying viscosity with multiscale character.
A mimetic spectral element solver for the Grad-Shafranov equation
NASA Astrophysics Data System (ADS)
Palha, A.; Koren, B.; Felici, F.
2016-07-01
In this work we present a robust and accurate arbitrary order solver for the fixed-boundary plasma equilibria in toroidally axisymmetric geometries. To achieve this we apply the mimetic spectral element formulation presented in [56] to the solution of the Grad-Shafranov equation. This approach combines a finite volume discretization with the mixed finite element method. In this way the discrete differential operators (∇, ∇×, ∇·) can be represented exactly and metric and all approximation errors are present in the constitutive relations. The result of this formulation is an arbitrary order method even on highly curved meshes. Additionally, the integral of the toroidal current J_ϕ is exactly equal to the boundary integral of the poloidal field over the plasma boundary. This property can play an important role in the coupling between equilibrium and transport solvers. The proposed solver is tested on a varied set of plasma cross sections (smooth and with an X-point) and also for a wide range of pressure and toroidal magnetic flux profiles. Equilibria accurate up to machine precision are obtained. Optimal algebraic convergence rates of order p + 1 and geometric convergence rates are shown for Soloviev solutions (including high Shafranov shifts), field-reversed configuration (FRC) solutions and spheromak analytical solutions. The robustness of the method is demonstrated for non-linear test cases, in particular on an equilibrium solution with a pressure pedestal.
Flood simulation using an open source quadtree grid shallow water flow solver
NASA Astrophysics Data System (ADS)
An, H.; Yu, S.
2012-12-01
We carry out performance testing of Gerris for flood simulation. Gerris Flow Solver is open source software and has the capability of adaptive quadtree grid generation. In particular, the shallow water flow solver within Gerris Flow Solver implements second-order accurate Godunov-type numerical schemes that preserve the balance of source and flux terms on quadtree cut cell grids. The combination of quadtree grids with the cut cell method improves the flexibility of quadtree grids for grid generation. In addition, the model has the capacity of adaptive meshing in an easy and effective way, which can improve computational efficiency in 2D modeling. Pre- and post-processors are already well equipped for users. Finally, an extension such as bed erosion or sediment transport can be added if needed. Two flood events, the Malpasset dam break in France and the Baeksan levee failure in Korea, are simulated using Gerris, with adaptively refined meshes near water fronts and the river boundary. Simulation results are compared with survey data, experimental data, and simulation results by other researchers. The simulation results demonstrate that the adaptive quadtree model can save approximately 95% of the computational cost while preserving the accuracy. Gerris is a very attractive alternative for flood managers given the favorable features demonstrated in this paper.
An implicit compact scheme solver with application to chemically reacting flows
NASA Astrophysics Data System (ADS)
Noskov, Mikhail; Smooke, Mitchell D.
2005-03-01
A novel, stable, implicit compact scheme solver that is higher order in space, suitable for modeling steady-state and time-dependent phenomena on nonuniform grids for one-dimensional configurations, is presented. Several properties of compact scheme discretizations are introduced to develop efficient algorithms for Jacobian matrix generation and Jacobian-vector multiplication using a new component form for Jacobian operations. Composite nonuniform grids are introduced that enable the implicit compact scheme solver to achieve sixth order accuracy. A robust Newton's method is employed with explicit generation of Jacobian matrices. Superior resolution characteristics of the implicit compact scheme solver are demonstrated with several steady-state and time-dependent problems for the Burgers equation. The example of the solution of stiff flame problem is given. An analysis of spectral properties of Jacobian matrices is presented, which shows that the condition number and the eigenvalue distributions behave similarly to those found in Jacobians associated with low-order discretizations. Two sparsification strategies are developed for the systematic approximation of a dense Jacobian aimed at the practical implementation of linear system preconditioning through partial Jacobians.
NASA Astrophysics Data System (ADS)
Geng, Weihua; Krasny, Robert
2013-08-01
We present a treecode-accelerated boundary integral (TABI) solver for electrostatics of solvated biomolecules described by the linear Poisson-Boltzmann equation. The method employs a well-conditioned boundary integral formulation for the electrostatic potential and its normal derivative on the molecular surface. The surface is triangulated and the integral equations are discretized by centroid collocation. The linear system is solved by GMRES iteration and the matrix-vector product is carried out by a Cartesian treecode which reduces the cost from O(N^2) to O(N log N), where N is the number of faces in the triangulation. The TABI solver is applied to compute the electrostatic solvation energy in two cases, the Kirkwood sphere and a solvated protein. We present the error, CPU time, and memory usage, and compare results for the Poisson-Boltzmann and Poisson equations. We show that the treecode approximation error can be made smaller than the discretization error, and we compare two versions of the treecode, one with uniform clusters and one with non-uniform clusters adapted to the molecular surface. For the protein test case, we compare TABI results with those obtained using the grid-based APBS code, and we also present parallel TABI simulations using up to eight processors. We find that the TABI solver exhibits good serial and parallel performance combined with relatively simple implementation, efficient memory usage, and geometric adaptability.
NASA Astrophysics Data System (ADS)
Balsara, Dinshaw S.; Vides, Jeaniffer; Gurski, Katharine; Nkonga, Boniface; Dumbser, Michael; Garain, Sudip; Audit, Edouard
2016-01-01
Just as the quality of a one-dimensional approximate Riemann solver is improved by the inclusion of internal sub-structure, the quality of a multidimensional Riemann solver is also similarly improved. Such multidimensional Riemann problems arise when multiple states come together at the vertex of a mesh. The interaction of the resulting one-dimensional Riemann problems gives rise to a strongly-interacting state. We wish to endow this strongly-interacting state with physically-motivated sub-structure. The self-similar formulation of Balsara [16] proves especially useful for this purpose. While that work is based on a Galerkin projection, in this paper we present an analogous self-similar formulation that is based on a different interpretation. In the present formulation, we interpret the shock jumps at the boundary of the strongly-interacting state quite literally. The enforcement of the shock jump conditions is done with a least squares projection (Vides, Nkonga and Audit [67]). With that interpretation, we again show that the multidimensional Riemann solver can be endowed with sub-structure. However, we find that the most efficient implementation arises when we use a flux vector splitting and a least squares projection. An alternative formulation that is based on the full characteristic matrices is also presented. The multidimensional Riemann solvers that are demonstrated here use one-dimensional HLLC Riemann solvers as building blocks. Several stringent test problems drawn from hydrodynamics and MHD are presented to show that the method works. Results from structured and unstructured meshes demonstrate the versatility of our method. The reader is also invited to watch a video introduction to multidimensional Riemann solvers on http://www.nd.edu/~dbalsara/Numerical-PDE-Course.
Elliptic Solvers for Adaptive Mesh Refinement Grids
Quinlan, D.J.; Dendy, J.E., Jr.; Shapira, Y.
1999-06-03
We are developing multigrid methods that will efficiently solve elliptic problems with anisotropic and discontinuous coefficients on adaptive grids. The final product will be a library that provides for the simplified solution of such problems. This library will directly benefit the efforts of other Laboratory groups. The focus of this work is research on serial and parallel elliptic algorithms and the inclusion of our black-box multigrid techniques into this new setting. The approach applies the Los Alamos object-oriented class libraries that greatly simplify the development of serial and parallel adaptive mesh refinement applications. In the final year of this LDRD, we focused on putting the software together; in particular we completed the final AMR++ library, we wrote tutorials and manuals, and we built example applications. We implemented the Fast Adaptive Composite Grid method as the principal elliptic solver. We presented results at the Overset Grid Conference and other more AMR specific conferences. We worked on optimization of serial and parallel performance and published several papers on the details of this work. Performance remains an important issue and is the subject of continuing research work.
Advanced Multigrid Solvers for Fluid Dynamics
NASA Technical Reports Server (NTRS)
Brandt, Achi
1999-01-01
The main objective of this project has been to support the development of multigrid techniques in computational fluid dynamics that can achieve "textbook multigrid efficiency" (TME), which is several orders of magnitude faster than current industrial CFD solvers. Toward that goal we have assembled a detailed table which lists every foreseen kind of computational difficulty for achieving it, together with the possible ways for resolving the difficulty, their current state of development, and references. We have developed several codes to test and demonstrate, in the framework of simple model problems, several approaches for overcoming the most important of the listed difficulties that had not been resolved before. In particular, TME has been demonstrated for incompressible flows on one hand, and for near-sonic flows on the other hand. General approaches were advanced for the relaxation of stagnation points and boundary conditions under various situations. Also, new algebraic multigrid techniques were formed for treating unstructured grid formulations. More details on all these are given below.
NASA Technical Reports Server (NTRS)
Raju, Manthena S.
1998-01-01
Sprays occur in a wide variety of industrial and power applications and in the processing of materials. A liquid spray is a two-phase flow with a gas as the continuous phase and a liquid as the dispersed phase (in the form of droplets or ligaments). Interactions between the two phases, which are coupled through exchanges of mass, momentum, and energy, can occur in different ways at different times and locations involving various thermal, mass, and fluid dynamic factors. An understanding of the flow, combustion, and thermal properties of a rapidly vaporizing spray requires careful modeling of the rate-controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as other phenomena. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on sprays to unstructured grids and parallel computing. LSPRAY, which was developed by M.S. Raju of Nyma, Inc., is designed to be massively parallel and could easily be coupled with any existing gas-phase flow and/or Monte Carlo probability density function (PDF) solver. The LSPRAY solver accommodates the use of an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements in the gas-phase solvers. It is used specifically for fuel sprays within gas turbine combustors, but it has many other uses. The spray model used in LSPRAY provided favorable results when applied to stratified-charge rotary combustion (Wankel) engines and several other confined and unconfined spray flames. The source code will be available with the National Combustion Code (NCC) as a complete package.
Handling Vacuum Regions in a Hybrid Plasma Solver
NASA Astrophysics Data System (ADS)
Holmström, M.
2013-04-01
In a hybrid plasma solver (particle ions, fluid mass-less electrons), regions of vacuum, or very low charge density, can cause problems since the evaluation of the electric field involves division by charge density. This causes large electric fields in low density regions that can lead to numerical instabilities. Here we propose a self-consistent handling of vacuum regions for hybrid solvers. Vacuum regions can be considered as having infinite resistivity, and in this limit Faraday's law approaches a magnetic diffusion equation. We describe an algorithm that solves such a diffusion equation in regions with charge density below a threshold value. We also present an implementation of this algorithm in a hybrid plasma solver, and an application to the interaction between the Moon and the solar wind. We also discuss the implementation of hyperresistivity for smoothing the electric field in a PIC solver.
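The idea of switching to a diffusion equation below a density threshold can be sketched in one dimension. This is an illustration under stated assumptions, not the algorithm from the paper: the grid is 1D and uniform, the time step is explicit Euler, and the diffusion coefficient `D` is a finite stand-in for the large-resistivity limit. All names (`diffuse_B_in_vacuum`, `rho_min`) are hypothetical.

```python
def diffuse_B_in_vacuum(B, rho, rho_min, D, dx, dt):
    """One explicit step of dB/dt = D * d2B/dx2 in cells where rho < rho_min.

    Cells with rho >= rho_min are left untouched here; in a real hybrid
    solver they would be advanced with the usual field update, which
    divides by the charge density.
    """
    coeff = D * dt / dx**2
    assert coeff <= 0.5, "explicit diffusion step would be unstable"
    new_B = list(B)
    for i in range(1, len(B) - 1):
        if rho[i] < rho_min:
            new_B[i] = B[i] + coeff * (B[i + 1] - 2.0 * B[i] + B[i - 1])
    return new_B


# Hypothetical example: two vacuum cells between plasma regions.
B = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
rho = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
B_next = diffuse_B_in_vacuum(B, rho, rho_min=0.1, D=1.0, dx=1.0, dt=0.25)
print(B_next)
```

The point of the branch on `rho[i] < rho_min` is that no division by the charge density is performed in the low-density cells.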
NASA Astrophysics Data System (ADS)
Alemi Ardakani, Hamid; Bridges, Thomas J.; Turner, Matthew R.
2016-06-01
A class of augmented approximate Riemann solvers due to George (2008) [12] is extended to solve the shallow-water equations in a moving vessel with variable bottom topography and variable cross-section with wetting and drying. A class of Roe-type upwind solvers for the system of balance laws is derived which respects the steady-state solutions. The numerical solutions of the new adapted augmented f-wave solvers are validated against the Roe-type solvers. The theory is extended to solve the shallow-water flows in moving vessels with arbitrary cross-section with influx-efflux boundary conditions motivated by the shallow-water sloshing in the ocean wave energy converter (WEC) proposed by Offshore Wave Energy Ltd. (OWEL) [1]. A fractional step approach is used to handle the time-dependent forcing functions. The numerical solutions are compared to an extended new Roe-type solver for the system of balance laws with a time-dependent source function. The shallow-water sloshing finite volume solver can be coupled to a Runge-Kutta integrator for the vessel motion.
Multilevel solvers of first-order system least-squares for Stokes equations
Lai, Chen-Yao G.
1996-12-31
Recently, the use of the first-order system least-squares principle for the approximate solution of Stokes problems has been extensively studied by Cai, Manteuffel, and McCormick. In this paper, we study multilevel solvers of the first-order system least-squares method for the generalized Stokes equations based on the velocity-vorticity-pressure formulation in three dimensions. The least-squares functional is defined to be the sum of the L^2-norms of the residuals, weighted appropriately by the Reynolds number. We develop convergence analysis for additive and multiplicative multilevel methods applied to the resulting discrete equations.
Benchmarking transport solvers for fracture flow problems
NASA Astrophysics Data System (ADS)
Olkiewicz, Piotr; Dabrowski, Marcin
2015-04-01
Fracture flow may dominate in rocks with low porosity and it can accompany both industrial and natural processes. Typical examples of such processes are natural flows in crystalline rocks and industrial flows in geothermal systems or hydraulic fracturing. Fracture flow provides an important mechanism for transporting mass and energy. For example, geothermal energy is primarily transported by the flow of the heated water or steam rather than by thermal diffusion. The geometry of the fracture network and the distribution of the mean apertures of individual fractures are the key parameters with regard to the fracture network transmissivity. Transport in fractures can occur through a combination of advection and diffusion processes, as in the case of dissolved chemical components. The local distribution of the fracture aperture may play an important role for both flow and transport processes. In this work, we benchmark various numerical solvers for flow and transport processes in a single fracture in 2D and 3D. Fracture aperture distributions are generated by a number of synthetic methods. We examine a single-phase flow of an incompressible viscous Newtonian fluid in the low Reynolds number limit. Periodic boundary conditions are used and a pressure difference is imposed in the background. The velocity field is primarily found using the Stokes equations. We systematically compare the obtained velocity field to the results obtained by solving the Reynolds equation. This allows us to examine the impact of the aperture distribution on the permeability of the medium and the local velocity distribution for two different mathematical descriptions of the fracture flow. Furthermore, we analyse the impact of aperture distribution on the front characteristics such as the standard deviation and the fractal dimension for systems in 2D and 3D.
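A small worked example may help show why the aperture distribution matters under the Reynolds (lubrication) description. In that approximation the local flux per unit width is q = -h^3 / (12 mu) * dp/dx. For steady flow along a 1D channel, q is constant, so the pressure drops add up and the effective hydraulic aperture satisfies h_eff^3 = 1 / mean(1 / h^3). The sketch below assumes a 1D channel sampled on a uniform grid, which is a simplification of the 2D and 3D cases in the abstract; the function name and sample profiles are hypothetical.

```python
def hydraulic_aperture_1d(h):
    """Effective aperture of a 1D channel under the Reynolds approximation.

    h: local apertures sampled on a uniform grid (all values > 0).
    """
    mean_inv_cube = sum(1.0 / hi**3 for hi in h) / len(h)
    return (1.0 / mean_inv_cube) ** (1.0 / 3.0)


uniform = [1.0] * 8
rough = [0.5, 1.5] * 4  # same arithmetic-mean aperture as `uniform`

h_uniform = hydraulic_aperture_1d(uniform)
h_rough = hydraulic_aperture_1d(rough)
print(h_uniform, h_rough)
```

Both profiles have the same mean aperture, yet the rough one has a smaller hydraulic aperture because the narrow sections dominate the pressure drop.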
A Comparative Study of Randomized Constraint Solvers for Random-Symbolic Testing
NASA Technical Reports Server (NTRS)
Takaki, Mitsuo; Cavalcanti, Diego; Gheyi, Rohit; Iyoda, Juliano; d'Amorim, Marcelo; Prudencio, Ricardo
2009-01-01
The complexity of constraints is a major obstacle for constraint-based software verification. Automatic constraint solvers are fundamentally incomplete: input constraints often build on some undecidable theory or some theory the solver does not support. This paper proposes and evaluates several randomized solvers to address this issue. We compare the effectiveness of a symbolic solver (CVC3), a random solver, three hybrid solvers (i.e., a mix of random and symbolic), and two heuristic search solvers. We evaluate the solvers on two benchmarks: one consisting of manually generated constraints and another generated with a concolic execution of 8 subjects. In addition to fully decidable constraints, the benchmarks include constraints with non-linear integer arithmetic, integer modulo and division, bitwise arithmetic, and floating-point arithmetic. As expected, symbolic solving (in particular, CVC3) subsumes the other solvers for the concolic execution of subjects that only generate decidable constraints. For the remaining subjects the solvers are complementary.
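To illustrate what a purely random solver does, the sketch below samples candidate integer assignments and returns the first one that satisfies the constraint. It is a generic illustration, not one of the solvers evaluated in the paper; the function name, sampling range, and example constraint (which mixes non-linear arithmetic and integer modulo) are assumptions.

```python
import random


def random_solver(constraint, n_vars, lo=-100, hi=100, max_tries=10000, seed=0):
    """Sample integer assignments uniformly until one satisfies `constraint`."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = tuple(rng.randint(lo, hi) for _ in range(n_vars))
        if constraint(*candidate):
            return candidate
    # No witness found: this means "unknown", not "unsatisfiable".
    return None


def constraint(x, y):
    # Hypothetical path constraint with a product and a modulo operation.
    return (x * y) % 7 == 3 and x + y > 10


solution = random_solver(constraint, 2)
print(solution)
```

A random solver can only ever report "satisfiable" with a witness or "unknown", which is one reason it complements rather than replaces a symbolic solver.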
Solving Upwind-Biased Discretizations. 2; Multigrid Solver Using Semicoarsening
NASA Technical Reports Server (NTRS)
Diskin, Boris
1999-01-01
This paper studies a novel multigrid approach to the solution for a second order upwind biased discretization of the convection equation in two dimensions. This approach is based on semi-coarsening and well balanced explicit correction terms added to coarse-grid operators to maintain on the coarse grid the same cross-characteristic interaction as on the target (fine) grid. Colored relaxation schemes are used on all the levels, allowing a very efficient parallel implementation. The results of the numerical tests can be summarized as follows: 1) The residual asymptotic convergence rate of the proposed V(0, 2) multigrid cycle is about 3 per cycle. This convergence rate far surpasses the theoretical limit (4/3) predicted for standard multigrid algorithms using full coarsening. The reported efficiency does not deteriorate with increasing the cycle depth (number of levels) and/or refining the target-grid mesh spacing. 2) The full multigrid algorithm (FMG) with two V(0, 2) cycles on the target grid and just one V(0, 2) cycle on all the coarse grids always provides an approximate solution with the algebraic error less than the discretization error. Estimates of the total work in the FMG algorithm range between 18 and 30 minimal work units (depending on the target discretization). Thus, the overall efficiency of the FMG solver closely approaches (if it does not achieve) the goal of the textbook multigrid efficiency. 3) A novel approach to deriving a discrete solution approximating the true continuous solution with a relative accuracy given in advance is developed. An adaptive multigrid algorithm (AMA) using comparison of the solutions on two successive target grids to estimate the accuracy of the current target-grid solution is defined. A desired relative accuracy is accepted as an input parameter. The final target grid on which this accuracy can be achieved is chosen automatically in the solution process. The actual relative accuracy of the discrete solution approximation
Quantitative analysis of numerical solvers for oscillatory biomolecular system models
Quo, Chang F; Wang, May D
2008-01-01
Background: This article provides guidelines for selecting optimal numerical solvers for biomolecular system models. Because various parameters of the same system could have drastically different ranges from 10^-15 to 10^10, the ODEs can be stiff and ill-conditioned, resulting in non-unique, non-existing, or non-reproducible modeling solutions. Previous studies have not examined in depth how to best select numerical solvers for biomolecular system models, which makes it difficult to experimentally validate the modeling results. To address this problem, we have chosen one of the well-known stiff initial value problems with limit cycle behavior as a test-bed system model. Solving this model, we have illustrated that different answers may result from different numerical solvers. We use MATLAB numerical solvers because they are optimized and widely used by the modeling community. We have also conducted a systematic study of numerical solver performances by using qualitative and quantitative measures such as convergence, accuracy, and computational cost (i.e. in terms of function evaluation, partial derivative, LU decomposition, and "take-off" points). The results show that the modeling solutions can be drastically different using different numerical solvers. Thus, it is important to intelligently select numerical solvers when solving biomolecular system models. Results: The classic Belousov-Zhabotinskii (BZ) reaction is described by the Oregonator model and is used as a case study. We report two guidelines in selecting optimal numerical solver(s) for stiff, complex oscillatory systems: (i) for problems with unknown parameters, ode45 is the optimal choice regardless of the relative error tolerance; (ii) for known stiff problems, both ode113 and ode15s are good choices under strict relative tolerance conditions. Conclusions: For any given biomolecular model, by building a library of numerical solvers with quantitative performance assessment metric, we show that it is possible
Euler/Navier-Stokes Solvers Applied to Ducted Fan Configurations
NASA Technical Reports Server (NTRS)
Keith, Theo G., Jr.; Srivastava, Rakesh
1997-01-01
Due to noise considerations, ultra high bypass ducted fans have become a more viable design. These ducted fans typically consist of a rotor stage containing a wide chord fan and a stator stage. One of the concerns for this design is the classical flutter that keeps occurring in various unducted fan blade designs. Such flutter is catastrophic and must be avoided in the flight envelope of the engine. Some numerical investigations by Williams, Cho and Dalton have suggested that a duct around a propeller makes it more unstable. This needs to be further investigated. In order to design an engine to safely perform a set of desired tasks, accurate information on the stresses on the blade during the entire cycle of blade motion is required. This requirement in turn demands that accurate knowledge of steady and unsteady blade loading be available. Aerodynamic solvers based on unsteady three-dimensional analysis will provide accurate and fast solutions and are best suited for aeroelastic analysis. The Euler solvers capture significant physics of the flowfield and are reasonably fast. An aerodynamic solver Ref. based on Euler equations had been developed under a separate grant from NASA Lewis in the past. Under the current grant, this solver has been modified to calculate the aeroelastic characteristics of unducted and ducted rotors. Even though the aeroelastic solver based on three-dimensional Euler equations is computationally efficient, it is still very expensive to investigate the effects of multiple stages on the aeroelastic characteristics. In order to investigate the effects of multiple stages, a two-dimensional multi-stage aeroelastic solver was also developed under this task, in collaboration with Dr. T. S. R. Reddy of the University of Toledo. Both of these solvers were applied to several test cases and validated against experimental data, where available.
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; Grama, Ananth
2011-01-01
With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated
Performance of Nonlinear Finite-Difference Poisson-Boltzmann Solvers.
Cai, Qin; Hsieh, Meng-Juei; Wang, Jun; Luo, Ray
2010-01-12
We implemented and optimized seven finite-difference solvers for the full nonlinear Poisson-Boltzmann equation in biomolecular applications, including four relaxation methods, one conjugate gradient method, and two inexact Newton methods. The performance of the seven solvers was extensively evaluated with a large number of nucleic acids and proteins. Worth noting is the inexact Newton method in our analysis. We investigated the role of linear solvers in its performance by incorporating the incomplete Cholesky conjugate gradient and the geometric multigrid into its inner linear loop. We tailored and optimized both linear solvers for faster convergence rate. In addition, we explored strategies to optimize the successive over-relaxation method to reduce its convergence failures without too much sacrifice in its convergence rate. Specifically we attempted to adaptively change the relaxation parameter and to utilize the damping strategy from the inexact Newton method to improve the successive over-relaxation method. Our analysis shows that the nonlinear methods accompanied with a functional-assisted strategy, such as the conjugate gradient method and the inexact Newton method, can guarantee convergence in the tested molecules. Especially the inexact Newton method exhibits impressive performance when it is combined with highly efficient linear solvers that are tailored for its special requirement. PMID:24723843
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present a novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in a multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation-of-concerns and single-responsibility concepts run through the implementation of the solver. As a forward modelling engine, a modern scalable solver, extrEMe, based on the contracting integral equation approach, is used. An iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for the (regularized) inverse problem solution, and the adjoint source approach is used to calculate efficiently the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize an inverse domain, the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out on different platforms ranging from modern laptops to HPC Piz Daint (6th supercomputer in the world) demonstrate practically linear scalability of the code up to thousands of nodes.
Adaptive kinetic-fluid solvers for heterogeneous computing architectures
NASA Astrophysics Data System (ADS)
Zabelok, Sergey; Arslanbekov, Robert; Kolobov, Vladimir
2015-12-01
We show the feasibility and benefits of porting an adaptive multi-scale kinetic-fluid code to CPU-GPU systems. Challenges are due to the irregular data access for adaptive Cartesian mesh, the vast difference in computational cost between kinetic and fluid cells, and the desire to evenly load all CPUs and GPUs during grid adaptation and algorithm refinement. Our Unified Flow Solver (UFS) combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. Using GPUs enables hybrid simulations of mixed rarefied-continuum flows with a million Boltzmann cells, each having a 24 × 24 × 24 velocity mesh. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using the discrete velocity method (DVM), the Direct Simulation Monte Carlo (DSMC) solver, and a mesoscopic solver based on the Lattice Boltzmann Method (LBM), all using adaptive Cartesian mesh. Double-digit speedups on a single GPU and good scaling for multi-GPUs have been demonstrated.
Continuous-time quantum Monte Carlo impurity solvers
NASA Astrophysics Data System (ADS)
Gull, Emanuel; Werner, Philipp; Fuchs, Sebastian; Surer, Brigitte; Pruschke, Thomas; Troyer, Matthias
2011-04-01
Continuous-time quantum Monte Carlo impurity solvers are algorithms that sample the partition function of an impurity model using diagrammatic Monte Carlo techniques. The present paper describes codes that implement the interaction expansion algorithm originally developed by Rubtsov, Savkin, and Lichtenstein, as well as the hybridization expansion method developed by Werner, Millis, Troyer, et al. These impurity solvers are part of the ALPS-DMFT application package and are accompanied by an implementation of dynamical mean-field self-consistency equations for (single orbital single site) dynamical mean-field problems with arbitrary densities of states. Program summary. Program title: dmft. Catalogue identifier: AEIL_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIL_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: ALPS LIBRARY LICENSE version 1.1. No. of lines in distributed program, including test data, etc.: 899 806. No. of bytes in distributed program, including test data, etc.: 32 153 916. Distribution format: tar.gz. Programming language: C++. Operating system: The ALPS libraries have been tested on the following platforms and compilers: Linux with GNU Compiler Collection (g++ version 3.1 and higher), and Intel C++ Compiler (icc version 7.0 and higher); MacOS X with GNU Compiler (g++ Apple-version 3.1, 3.3 and 4.0); IBM AIX with Visual Age C++ (xlC version 6.0) and GNU (g++ version 3.1 and higher) compilers; Compaq Tru64 UNIX with Compaq C++ Compiler (cxx); SGI IRIX with MIPSpro C++ Compiler (CC); HP-UX with HP C++ Compiler (aCC); Windows with Cygwin or coLinux platforms and GNU Compiler Collection (g++ version 3.1 and higher). RAM: 10 MB-1 GB. Classification: 7.3. External routines: ALPS [1], BLAS/LAPACK, HDF5. Nature of problem: (See [2].) Quantum impurity models describe an atom or molecule embedded in a host material with which it can exchange electrons. They are basic to nanoscience as
General Equation Set Solver for Compressible and Incompressible Turbomachinery Flows
NASA Technical Reports Server (NTRS)
Sondak, Douglas L.; Dorney, Daniel J.
2002-01-01
Turbomachines for propulsion applications operate with many different working fluids and flow conditions. The flow may be incompressible, such as in the liquid hydrogen pump in a rocket engine, or supersonic, such as in the turbine which may drive the hydrogen pump. Separate codes have traditionally been used for incompressible and compressible flow solvers. The General Equation Set (GES) method can be used to solve both incompressible and compressible flows, and it is not restricted to perfect gases, as are many compressible-flow turbomachinery solvers. An unsteady GES turbomachinery flow solver has been developed and applied to both air and water flows through turbines. It has been shown to be an excellent alternative to maintaining two separate codes.
A multiple right hand side iterative solver for history matching
Killough, J.E.; Sharma, Y.; Dupuy, A.; Bissell, R.; Wallis, J.
1995-12-31
History matching of oil and gas reservoirs can be accelerated by directly calculating the gradients of observed quantities (e.g., well pressure) with respect to the adjustable reserve parameters (e.g., permeability). This leads to a set of linear equations which add a significant overhead to the full simulation run without gradients. Direct Gauss elimination solvers can be used to address this problem by performing the factorization of the matrix only once and then reusing the factor matrix for the solution of the multiple right hand sides. This is a limited technique, however. Experience has shown that problems with more than a few thousand cells may not be practical for direct solvers because of computation time and memory limitations. This paper discusses the implementation of a multiple right hand side iterative linear equation solver (MRHS) for a system of adjoint equations to significantly enhance the performance of a gradient simulator.
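The factor-once, solve-many idea that the abstract attributes to direct solvers can be sketched as follows. This is only a minimal pure-Python illustration of LU reuse across several right hand sides, not the MRHS iterative solver of the paper; the names `lu_factor` and `lu_solve` and the small test matrix are assumptions made for the example.

```python
def lu_factor(a):
    """LU factorization with partial pivoting.

    Returns (lu, perm): lu stores L (unit diagonal, below the diagonal)
    and U (on and above the diagonal); perm[k] is the original row index
    that ended up in row k.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input intact
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using a factorization produced by lu_factor."""
    n = len(lu)
    # Forward substitution with the permuted right hand side.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with the upper triangular factor.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Hypothetical small system with two right hand sides.
matrix = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
right_hand_sides = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
lu, perm = lu_factor(matrix)  # cubic-cost step, performed once
solutions = [lu_solve(lu, perm, b) for b in right_hand_sides]  # quadratic cost each
```

The factorization cost grows cubically with the matrix size for a dense matrix, while each additional right hand side only costs two triangular solves, which is why reuse pays off until memory and factorization time become the limit.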
GPU Implementation of a Viscous Flow Solver on Unstructured Grids
NASA Astrophysics Data System (ADS)
Xu, Tianhao; Chen, Long
2016-06-01
Graphics processing units have gained popularity in scientific computing over the past several years due to their outstanding parallel computing capability. Computational fluid dynamics applications involve large amounts of calculation, so a recent GPU card, whose peak computing performance and memory bandwidth are much better than those of a contemporary high-end CPU, is preferable. We herein focus on the detailed implementation of our GPU-targeting Reynolds-averaged Navier-Stokes equations solver based on the finite-volume method. The solver employs a vertex-centered scheme on unstructured grids so that it can handle complex topologies. Multiple optimizations are carried out to improve the memory accessing performance and kernel utilization. Both steady and unsteady flow simulation cases are carried out using an explicit Runge-Kutta scheme. The GPU-accelerated solver in this paper is demonstrated to have competitive advantages over the CPU-targeting one.
Two Solvers for Tractable Temporal Constraints with Preferences
NASA Technical Reports Server (NTRS)
Rossi, F.; Khatib, L.; Morris, P.; Morris, R.; Clancy, Daniel (Technical Monitor)
2002-01-01
A number of reasoning problems involving the manipulation of temporal information can naturally be viewed as implicitly inducing an ordering of potential local decisions involving time on the basis of preferences. Soft temporal constraint problems make it possible to describe in a natural way scenarios where events happen over time and preferences are associated with event distances and durations. In general, solving soft temporal problems requires exponential time in the worst case, but there are interesting subclasses of problems which are polynomially solvable. We describe two solvers based on two different approaches for solving the same tractable subclass. For each solver we present the theoretical results it stands on, a description of the algorithm and some experimental results. The random generator used to build the problems on which tests are performed is also described. Finally, we compare the two solvers, highlighting the tradeoff between performance and representational power.
An evaluation of parallel multigrid as a solver and a preconditioner for singular perturbed problems
Oosterlee, C.W.; Washio, T.
1996-12-31
In this paper we try to achieve h-independent convergence with preconditioned GMRES and BiCGSTAB for 2D singular perturbed equations. Three recently developed multigrid methods are adopted as a preconditioner. They are also used as solution methods in order to compare the performance of the methods as solvers and as preconditioners. Two of the multigrid methods differ only in the transfer operators. One uses standard matrix-dependent prolongation operators from. The second uses "upwind" prolongation operators, developed. Both employ the Galerkin coarse grid approximation and an alternating zebra line Gauss-Seidel smoother. The third method is based on the block LU decomposition of a matrix and on an approximate Schur complement. This multigrid variant is presented in. All three multigrid algorithms are algebraic methods.
Rasin, A.
1994-04-01
We discuss the idea of approximate flavor symmetries. Relations between approximate flavor symmetries and natural flavor conservation and democracy models are explored. Implications for neutrino physics are also discussed.
Median Approximations for Genomes Modeled as Matrices.
Zanetti, Joao Paulo Pereira; Biller, Priscila; Meidanis, Joao
2016-04-01
The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a [Formula: see text]-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates. PMID:27072561
Fast Euler solver for transonic airfoils. I - Theory. II - Applications
NASA Technical Reports Server (NTRS)
Dadone, Andrea; Moretti, Gino
1988-01-01
Equations written in terms of generalized Riemann variables are presently integrated by inverting six bidiagonal matrices and two tridiagonal matrices, using an implicit Euler solver that is based on the lambda-formulation. The solution is found on a C-grid whose boundaries are very close to the airfoil. The fast solver is then applied to the computation of several flowfields on a NACA 0012 airfoil at various Mach number and alpha values, yielding results that are primarily concerned with transonic flows. The effects of grid fineness and boundary distances are analyzed; the code is found to be robust and accurate, as well as fast.
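Inverting a tridiagonal matrix, as mentioned in the abstract above, is commonly done with the Thomas algorithm. The sketch below shows that generic algorithm only; it is not taken from the paper, and the function name `solve_tridiagonal`, the storage convention, and the test system are assumptions for illustration. No pivoting is performed, so it assumes a well-conditioned (for example diagonally dominant) matrix.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    All four lists have length n. lower[i] is the entry left of the
    diagonal in row i (lower[0] is ignored); upper[i] is the entry right
    of the diagonal in row i (upper[n-1] is ignored).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Backward sweep: substitute from the last unknown upward.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Hypothetical 4x4 system with diagonal 2 and off-diagonals -1.
n = 4
solution = solve_tridiagonal([-1.0] * n, [2.0] * n, [-1.0] * n,
                             [0.0, 0.0, 0.0, 5.0])
```

The two sweeps cost a number of operations proportional to n, which is what makes implicit schemes built from bidiagonal and tridiagonal solves inexpensive per step.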
Numerical System Solver Developed for the National Cycle Program
NASA Technical Reports Server (NTRS)
Binder, Michael P.
1999-01-01
As part of the National Cycle Program (NCP), a powerful new numerical solver has been developed to support the simulation of aeropropulsion systems. This software uses a hierarchical object-oriented design. It can provide steady-state and time-dependent solutions to nonlinear and even discontinuous problems typically encountered when aircraft and spacecraft propulsion systems are simulated. It also can handle constrained solutions, in which one or more factors may limit the behavior of the engine system. Time-dependent simulation capabilities include adaptive time-stepping and synchronization with digital control elements. The NCP solver is playing an important role in making the NCP a flexible, powerful, and reliable simulation package.
Coordinate Projection-based Solver for ODE with Invariants
Serban, Radu
2008-04-08
CPODES is a general purpose (serial and parallel) solver for systems of ordinary differential equations (ODEs) with invariants. It implements a coordinate projection approach using different types of projection (orthogonal or oblique) and one of several methods for the decomposition of the Jacobian of the invariant equations.
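The coordinate projection idea can be illustrated on a toy problem. The sketch below does not use or reproduce the CPODES interface; it is a hand-written example, under the assumption of a harmonic oscillator with the energy as its single invariant, in which each explicit Euler step is followed by an orthogonal projection back onto the invariant manifold. All function names are hypothetical.

```python
def euler_step(q, p, h):
    """One explicit Euler step for q' = p, p' = -q."""
    return q + h * p, p - h * q


def energy(q, p):
    """Invariant of the exact flow: H = (q^2 + p^2) / 2."""
    return 0.5 * (q * q + p * p)


def project(q, p, energy0, tol=1e-12, max_iter=20):
    """Orthogonal projection onto g(q, p) = H(q, p) - energy0 = 0.

    Each pass moves the state along the gradient of g by the amount a
    Gauss-Newton step prescribes for a single scalar invariant.
    """
    for _ in range(max_iter):
        g = energy(q, p) - energy0
        if abs(g) < tol:
            break
        grad_q, grad_p = q, p  # gradient of g with respect to (q, p)
        scale = g / (grad_q * grad_q + grad_p * grad_p)
        q, p = q - scale * grad_q, p - scale * grad_p
    return q, p


def integrate(steps, h, use_projection):
    """Integrate from (q, p) = (1, 0), optionally projecting each step."""
    q, p = 1.0, 0.0
    energy0 = energy(q, p)
    for _ in range(steps):
        q, p = euler_step(q, p, h)
        if use_projection:
            q, p = project(q, p, energy0)
    return q, p


# Explicit Euler alone lets the energy drift upward; projection holds it fixed.
drifted = energy(*integrate(1000, 0.01, use_projection=False))
projected = energy(*integrate(1000, 0.01, use_projection=True))
```

With several invariants the scalar division becomes a small linear solve involving the Jacobian of the invariant equations, which is where the choice of decomposition method mentioned in the abstract comes in.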
Navier-Stokes Solvers and Generalizations for Reacting Flow Problems
Elman, Howard C
2013-01-27
This is an overview of our accomplishments during the final term of this grant (1 September 2008 -- 30 June 2012). These fall mainly into three categories: fast algorithms for linear eigenvalue problems; solution algorithms and modeling methods for partial differential equations with uncertain coefficients; and preconditioning methods and solvers for models of computational fluid dynamics (CFD).
Advances in computational fluid dynamics solvers for modern computing environments
NASA Astrophysics Data System (ADS)
Hertenstein, Daniel; Humphrey, John R.; Paolini, Aaron L.; Kelmelis, Eric J.
2013-05-01
EM Photonics has been investigating the application of massively multicore processors to a key problem area: Computational Fluid Dynamics (CFD). While the capabilities of CFD solvers have continually increased and improved to support features such as moving bodies and adjoint-based mesh adaptation, the software architecture has often lagged behind. This has led to poor scaling as core counts reach the tens of thousands. In the modern High Performance Computing (HPC) world, clusters with hundreds of thousands of cores are becoming the standard. In addition, accelerator devices such as NVIDIA GPUs and Intel Xeon Phi are being installed in many new systems. It is important for CFD solvers to take advantage of the new hardware as the computations involved are well suited for the massively multicore architecture. In our work, we demonstrate by example that new features in NVIDIA GPUs are able to empower existing CFD solvers, using AVUS, a CFD solver developed by the Air Force Research Laboratory (AFRL) and the Volcanic Ash Advisory Center (VAAC). The effort has resulted in increased performance and scalability without sacrificing accuracy. There are many well-known codes in the CFD space that can benefit from this work, such as FUN3D, OVERFLOW, and TetrUSS. Such codes are widely used in the commercial, government, and defense sectors.
Intellectual Abilities That Discriminate Good and Poor Problem Solvers.
ERIC Educational Resources Information Center
Meyer, Ruth Ann
1981-01-01
This study compared good and poor fourth-grade problem solvers on a battery of 19 "reference" tests for verbal, induction, numerical, word fluency, memory, perceptual speed, and simple visualization abilities. Results suggest verbal, numerical, and especially induction abilities are important to successful mathematical problem solving. (MP)
Two level scheme solvers for nuclear spectroscopy
NASA Astrophysics Data System (ADS)
Jansson, Kaj; DiJulio, Douglas; Cederkäll, Joakim
2011-10-01
A program for building level schemes from γ-spectroscopy coincidence data has been developed. The scheme builder was equipped with two different algorithms: a statistical one based on the Metropolis method and a more logical one, called REMP (REcurse, Merge and Permute), developed from scratch. These two methods are compared both on ideal cases and on experimental γ-ray data sets. The REMP algorithm is based on coincidences and transition energies. Using correct and complete coincidence data, it has solved approximately half a million schemes without failures. Also, for incomplete data and data with minor errors, the algorithm produces consistent sub-schemes when it is not possible to obtain a complete scheme from the provided data.
A strategy to suppress recurrence in grid-based Vlasov solvers
NASA Astrophysics Data System (ADS)
Einkemmer, Lukas; Ostermann, Alexander
2014-07-01
In this paper we propose a strategy to suppress the recurrence effect present in grid-based Vlasov solvers. This method is formulated by introducing a cutoff frequency in Fourier space. Since this cutoff only has to be performed after a number of time steps, the scheme can be implemented efficiently and can relatively easily be incorporated into existing Vlasov solvers. Furthermore, the scheme proposed retains the advantage of grid-based methods in that high accuracy can be achieved. This is due to the fact that in contrast to the scheme proposed by Abbasi et al. no statistical noise is introduced into the simulation. We will illustrate the utility of the method proposed by performing a number of numerical simulations, including the plasma echo phenomenon, using a discontinuous Galerkin approximation in space and a Strang splitting based time integration. Contribution to the Topical Issue "Theory and Applications of the Vlasov Equation", edited by Francesco Pegoraro, Francesco Califano, Giovanni Manfredi and Philip J. Morrison.
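The notion of a cutoff frequency in Fourier space can be sketched on a one-dimensional periodic array. This is a generic spectral filter written for illustration, not the scheme of the paper; the function names, the naive O(n^2) transform, and the test signal are all assumptions.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2), fine for small n)."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * j / n)
                for k in range(n)) / n
            for j in range(n)]


def apply_cutoff(values, k_cut):
    """Zero every Fourier mode whose wavenumber magnitude exceeds k_cut."""
    n = len(values)
    coeffs = dft(values)
    for k in range(n):
        wavenumber = min(k, n - k)  # index k and n - k carry the same |k|
        if wavenumber > k_cut:
            coeffs[k] = 0.0
    return [c.real for c in idft(coeffs)]


# Hypothetical signal: a resolved mode (k = 1) plus a fine-scale mode (k = 6).
n = 16
signal = [math.cos(2 * math.pi * j / n) + 0.5 * math.cos(2 * math.pi * 6 * j / n)
          for j in range(n)]
filtered = apply_cutoff(signal, k_cut=3)  # only the k = 1 mode survives
```

Because the abstract notes that the cutoff only has to be applied after a number of time steps, a filter of this kind would be called occasionally rather than inside every step.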
GPU accelerated flow solver for direct numerical simulation of turbulent flows
Salvadore, Francesco; Botti, Michela
2013-02-15
Graphical processing units (GPUs), characterized by significant computing performance, are nowadays very appealing for the solution of computationally demanding tasks in a wide variety of scientific applications. However, to run on GPUs, existing codes need to be ported and optimized, a procedure which is not yet standardized and may require nontrivial effort, even from high-performance computing specialists. In the present paper we accurately describe the porting to CUDA (Compute Unified Device Architecture) of a finite-difference compressible Navier–Stokes solver, suitable for direct numerical simulation (DNS) of turbulent flows. Porting and validation processes are illustrated in detail, with emphasis on computational strategies and techniques that can be applied to overcome typical bottlenecks arising from the porting of common computational fluid dynamics solvers. We demonstrate that careful optimization work is crucial to get the highest performance from GPU accelerators. The results show that the overall speedup of one NVIDIA Tesla S2070 GPU is approximately 22 compared with one AMD Opteron 2352 Barcelona chip and 11 compared with one Intel Xeon X5650 Westmere core. The potential of GPU devices in the simulation of unsteady three-dimensional turbulent flows is proved by performing a DNS of a spatially evolving compressible mixing layer.
GPU accelerated flow solver for direct numerical simulation of turbulent flows
NASA Astrophysics Data System (ADS)
Salvadore, Francesco; Bernardini, Matteo; Botti, Michela
2013-02-01
Graphical processing units (GPUs), characterized by significant computing performance, are nowadays very appealing for the solution of computationally demanding tasks in a wide variety of scientific applications. However, to run on GPUs, existing codes need to be ported and optimized, a procedure which is not yet standardized and may require nontrivial effort, even from high-performance computing specialists. In the present paper we accurately describe the porting to CUDA (Compute Unified Device Architecture) of a finite-difference compressible Navier-Stokes solver, suitable for direct numerical simulation (DNS) of turbulent flows. Porting and validation processes are illustrated in detail, with emphasis on computational strategies and techniques that can be applied to overcome typical bottlenecks arising from the porting of common computational fluid dynamics solvers. We demonstrate that careful optimization work is crucial to get the highest performance from GPU accelerators. The results show that the overall speedup of one NVIDIA Tesla S2070 GPU is approximately 22 compared with one AMD Opteron 2352 Barcelona chip and 11 compared with one Intel Xeon X5650 Westmere core. The potential of GPU devices in the simulation of unsteady three-dimensional turbulent flows is proved by performing a DNS of a spatially evolving compressible mixing layer.
Multiscale Universal Interface: A concurrent framework for coupling heterogeneous solvers
Tang, Yu-Hang; Kudo, Shuhei; Bian, Xin; Li, Zhen; Karniadakis, George Em
2015-09-15
Concurrently coupled numerical simulations using heterogeneous solvers are powerful tools for modeling multiscale phenomena. However, major modifications to existing codes are often required to enable such simulations, posing significant difficulties in practice. In this paper we present a C++ library, i.e. the Multiscale Universal Interface (MUI), which is capable of facilitating the coupling effort for a wide range of multiscale simulations. The library adopts a header-only form with minimal external dependency and hence can be easily dropped into existing codes. A data sampler concept is introduced, combined with a hybrid dynamic/static typing mechanism, to create an easily customizable framework for solver-independent data interpretation. The library integrates MPI MPMD support and an asynchronous communication protocol to handle inter-solver information exchange irrespective of the solvers' own MPI awareness. Template metaprogramming is heavily employed to simultaneously improve runtime performance and code flexibility. We validated the library by solving three different multiscale problems, which also serve to demonstrate the flexibility of the framework in handling heterogeneous models and solvers. In the first example, a Couette flow was simulated using two concurrently coupled Smoothed Particle Hydrodynamics (SPH) simulations of different spatial resolutions. In the second example, we coupled the deterministic SPH method with the stochastic Dissipative Particle Dynamics (DPD) method to study the effect of surface grafting on the hydrodynamic properties on the surface. In the third example, we consider conjugate heat transfer between a solid domain and a fluid domain by coupling the particle-based energy-conserving DPD (eDPD) method with the Finite Element Method (FEM)
Migration of vectorized iterative solvers to distributed memory architectures
Pommerell, C.; Ruehl, R.
1994-12-31
Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increase locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
Multiscale Universal Interface: A concurrent framework for coupling heterogeneous solvers
NASA Astrophysics Data System (ADS)
Tang, Yu-Hang; Kudo, Shuhei; Bian, Xin; Li, Zhen; Karniadakis, George Em
2015-09-01
Concurrently coupled numerical simulations using heterogeneous solvers are powerful tools for modeling multiscale phenomena. However, major modifications to existing codes are often required to enable such simulations, posing significant difficulties in practice. In this paper we present a C++ library, i.e. the Multiscale Universal Interface (MUI), which is capable of facilitating the coupling effort for a wide range of multiscale simulations. The library adopts a header-only form with minimal external dependency and hence can be easily dropped into existing codes. A data sampler concept is introduced, combined with a hybrid dynamic/static typing mechanism, to create an easily customizable framework for solver-independent data interpretation. The library integrates MPI MPMD support and an asynchronous communication protocol to handle inter-solver information exchange irrespective of the solvers' own MPI awareness. Template metaprogramming is heavily employed to simultaneously improve runtime performance and code flexibility. We validated the library by solving three different multiscale problems, which also serve to demonstrate the flexibility of the framework in handling heterogeneous models and solvers. In the first example, a Couette flow was simulated using two concurrently coupled Smoothed Particle Hydrodynamics (SPH) simulations of different spatial resolutions. In the second example, we coupled the deterministic SPH method with the stochastic Dissipative Particle Dynamics (DPD) method to study the effect of surface grafting on the hydrodynamics properties on the surface. In the third example, we consider conjugate heat transfer between a solid domain and a fluid domain by coupling the particle-based energy-conserving DPD (eDPD) method with the Finite Element Method (FEM).
Decision Engines for Software Analysis Using Satisfiability Modulo Theories Solvers
NASA Technical Reports Server (NTRS)
Bjorner, Nikolaj
2010-01-01
The area of software analysis, testing and verification is now undergoing a revolution thanks to the use of automated and scalable support for logical methods. A well-recognized premise is that at the core of software analysis engines is invariably a component using logical formulas for describing states and transformations between system states. The process of using this information for discovering and checking program properties (including such important properties as safety and security) amounts to automatic theorem proving. In particular, theorem provers that directly support common software constructs offer a compelling basis. Such provers are commonly called satisfiability modulo theories (SMT) solvers. Z3 is a state-of-the-art SMT solver. It is developed at Microsoft Research. It can be used to check the satisfiability of logical formulas over one or more theories such as arithmetic, bit-vectors, lists, records and arrays. The talk describes some of the technology behind modern SMT solvers, including the solver Z3. Z3 is currently mainly targeted at solving problems that arise in software analysis and verification. It has been applied to various contexts, such as systems for dynamic symbolic simulation (Pex, SAGE, Vigilante), for program verification and extended static checking (Spec#/Boogie, VCC, HAVOC), for software model checking (Yogi, SLAM), model-based design (FORMULA), security protocol code (F7), program run-time analysis and invariant generation (VS3). We will describe how it integrates support for a variety of theories that arise naturally in the context of the applications. There are several new promising avenues and the talk will touch on some of these and the challenges related to SMT solvers. Proceedings
Efficient three-dimensional Poisson solvers in open rectangular conducting pipe
NASA Astrophysics Data System (ADS)
Qiang, Ji
2016-06-01
Three-dimensional (3D) Poisson solvers play an important role in the study of space-charge effects on charged particle beam dynamics in particle accelerators. In this paper, we propose three new 3D Poisson solvers for a charged particle beam in an open rectangular conducting pipe. These three solvers include a spectral integrated Green function (IGF) solver, a 3D spectral solver, and a 3D integrated Green function solver. These solvers effectively handle the longitudinal open boundary condition using a finite computational domain that contains the beam itself. This saves the computational cost of using an extra larger longitudinal domain in order to set up an appropriate finite boundary condition. Using an integrated Green function also avoids the need to resolve rapid variation of the Green function inside the beam. The numerical operational cost of the spectral IGF solver and the 3D IGF solver scales as O(N log(N)), where N is the number of grid points. The cost of the 3D spectral solver scales as O(N_n N), where N_n is the maximum longitudinal mode number. We compare these three solvers using several numerical examples and discuss the advantageous regime of each solver in the physical application.
NASA Technical Reports Server (NTRS)
Dutta, Soumitra
1988-01-01
A model for approximate spatial reasoning using fuzzy logic to represent the uncertainty in the environment is presented. Algorithms are developed which can be used to reason about spatial information expressed in the form of approximate linguistic descriptions similar to the kind of spatial information processed by humans. Particular attention is given to static spatial reasoning.
Development of advanced Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan
1994-01-01
The objective of research was to develop and validate new computational algorithms for solving the steady and unsteady Euler and Navier-Stokes equations. The end-products are new three-dimensional Euler and Navier-Stokes codes that are faster, more reliable, more accurate, and easier to use. The three-dimensional Euler and full/thin-layer Reynolds-averaged Navier-Stokes equations for compressible/incompressible flows are solved on structured hexahedral grids. The Baldwin-Lomax algebraic turbulence model is used for closure. The space discretization is based on a cell-centered finite-volume method augmented by a variety of numerical dissipation models with optional total variation diminishing limiters. The governing equations are integrated in time by an implicit method based on lower-upper factorization and symmetric Gauss-Seidel relaxation. The algorithm is vectorized on diagonal planes of sweep using two-dimensional indices in three dimensions. Convergence rates and the robustness of the codes are enhanced by the use of an implicit full approximation storage multigrid method.
2d PDE Linear Asymmetric Matrix Solver
Energy Science and Technology Software Center (ESTSC)
1983-10-01
ILUCG2 (Incomplete LU factorized Conjugate Gradient algorithm for 2d problems) was developed to solve a linear asymmetric matrix system arising from a 9-point discretization of two-dimensional elliptic and parabolic partial differential equations found in plasma physics applications, such as plasma diffusion, equilibria, and phase space transport (Fokker-Planck equation) problems. These equations share the common feature of being stiff and requiring implicit solution techniques. When these parabolic or elliptic PDEs are discretized with finite-difference or finite-element methods, the resulting matrix system is frequently of block-tridiagonal form. To use ILUCG2, the discretization of the two-dimensional partial differential equation and its boundary conditions must result in a block-tridiagonal supermatrix composed of elementary tridiagonal matrices. A generalization of the incomplete Cholesky conjugate gradient algorithm is used to solve the matrix equation. Loops are arranged to vectorize on the Cray1 with the CFT compiler, wherever possible. Recursive loops, which cannot be vectorized, are written for optimum scalar speed. For problems having a symmetric matrix ICCG2 should be used since it runs up to four times faster and uses approximately 30% less storage. Similar methods in three dimensions are available in ICCG3 and ILUCG3. A general source, containing extensions and macros, which must be processed by a pre-compiler to obtain the standard FORTRAN source, is provided along with the standard FORTRAN source because it is believed to be more readable. The pre-compiler is not included, but pre-compilation may be performed by a text editor as described in the UCRL-88746 Preprint.
A two-dimensional fourth-order unstructured-meshed Euler solver based on the CESE method
NASA Astrophysics Data System (ADS)
Bilyeu, David L.; Yu, S.-T. John; Chen, Yung-Yu; Cambier, Jean-Luc
2014-01-01
In this paper, Chang's one-dimensional high-order CESE method [1] is extended to a two-dimensional, unstructured-triangular-meshed Euler solver. This fourth-order CESE method retains all favorable attributes of the original second-order CESE method, including: (i) flux conservation in space and time without using an approximated Riemann solver, (ii) genuine multi-dimensional algorithm without dimensional splitting, (iii) the CFL constraint for stable calculation remains ⩽1, (iv) the use of the most compact mesh stencil, involving only the immediate neighboring cells surrounding the cell where the solution at a new time step is sought, and (v) an explicit, unified space-time integration procedure without using a quadrature integration procedure. To demonstrate the new algorithm, three numerical examples are presented: (i) a moving vortex, (ii) acoustic wave interaction, and (iii) supersonic flow over a blunt body. Case 1 shows fourth-order convergence through mesh refinement. In Case 2, the nonlinear Euler solver is applied to simulate linear waves. In Case 3, the superb shock-capturing capabilities of the new fourth-order method, without the carbuncle effect, are demonstrated.
NASA Astrophysics Data System (ADS)
Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.
2011-12-01
We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, by exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce the computational complexity and storage. In particular, we are concerned with solving this equation on a large domain, for a large number of different forcing terms in the context of seismic problems in general, and modeling in particular. We resort to a parsimonious mixed grid finite differences scheme for discretizing the Helmholtz operator and Perfect Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested dissection based domain decomposition, and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently in each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between multifrontal and HSS structure. The computational complexity associated with the factorization is almost linear with the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Calculator Function Approximation.
ERIC Educational Resources Information Center
Schelin, Charles W.
1983-01-01
The general algorithm used in most hand calculators to approximate elementary functions is discussed. Comments on tabular function values and on computer function evaluation are given first; then the CORDIC (Coordinate Rotation Digital Computer) scheme is described. (MNS)
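As a rough illustration of the CORDIC idea mentioned above, the following Python sketch computes sine and cosine in rotation mode. It is a minimal floating-point sketch under stated assumptions, not the article's presentation: the function name `cordic_sin_cos`, the iteration count, and the use of floating-point multiplications by powers of two (in place of the shifts a calculator would use) are all illustrative choices.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Return (sin, cos) of theta in radians, for |theta| <= pi/2.

    Hypothetical helper for illustration: CORDIC rotation mode.
    """
    # Table of elementary angles atan(2^-i); real devices store this table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # start from the reciprocal of the accumulated gain to compensate.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        # Rotate toward zero residual angle z using only +/- 2^-i scalings.
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

s, c = cordic_sin_cos(0.5)
print(s, c)
```

In fixed-point hardware the multiplications by `2.0 ** -i` become bit shifts, which is what makes the scheme attractive for calculators.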
NASA Technical Reports Server (NTRS)
Dutta, Soumitra
1988-01-01
Much of human reasoning is approximate in nature. Formal models of reasoning traditionally try to be precise and reject the fuzziness of concepts in natural use and replace them with non-fuzzy scientific explicata by a process of precisiation. As an alternative to this approach, it has been suggested that rather than regard human reasoning processes as themselves approximating to some more refined and exact logical process that can be carried out with mathematical precision, the essence and power of human reasoning is in its capability to grasp and use inexact concepts directly. This view is supported by the widespread fuzziness of simple everyday terms (e.g., near, tall) and the complexity of ordinary tasks (e.g., cleaning a room). Spatial reasoning is an area where humans consistently reason approximately with demonstrably good results. Consider the case of crossing a traffic intersection. We have only an approximate idea of the locations and speeds of various obstacles (e.g., persons and vehicles), but we nevertheless manage to cross such traffic intersections without any harm. The details of our mental processes which enable us to carry out such intricate tasks in such an apparently simple manner are not well understood. However, it is important that we try to incorporate such approximate reasoning techniques in our computer systems. Approximate spatial reasoning is very important for intelligent mobile agents (e.g., robots), especially for those operating in uncertain or unknown or dynamic domains.
Approximate kernel competitive learning.
Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang
2015-03-01
Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large scale data processing, because (1) it has to calculate and store the full kernel matrix that is too large to be calculated and kept in the memory and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL), which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling would work for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL can perform comparably to KCL, with a large reduction in computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision against related approximate clustering approaches. PMID:25528318
Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes. Parts 1 and 2
NASA Technical Reports Server (NTRS)
Barth, Timothy J.; Kwak, Dochan (Technical Monitor)
1995-01-01
The Advisory Group for Aerospace Research and Development (AGARD) has requested my participation in the lecture series entitled Parallel Computing in Computational Fluid Dynamics to be held at the von Karman Institute in Brussels, Belgium on May 15-19, 1995. In addition, a request has been made from the US Coordinator for AGARD at the Pentagon for NASA Ames to hold a repetition of the lecture series on October 16-20, 1995. I have been asked to be a local coordinator for the Ames event. All AGARD lecture series events have attendance limited to NATO allied countries. A brief of the lecture series is provided in the attached enclosure. Specifically, I have been asked to give two lectures of approximately 75 minutes each on the subject of parallel solution techniques for the fluid flow equations on unstructured meshes. The title of my lectures is "Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes" (Parts I-II). The contents of these lectures will be largely review in nature and will draw upon previously published work in this area. Topics of my lectures will include: (1) Mesh partitioning algorithms. Recursive techniques based on coordinate bisection, Cuthill-McKee level structures, and spectral bisection. (2) Newton's method for large scale CFD problems. Size and complexity estimates for Newton's method, modifications for ensuring global convergence. (3) Techniques for constructing the Jacobian matrix. Analytic and numerical techniques for Jacobian matrix-vector products, constructing the transposed matrix, extensions to optimization and homotopy theories. (4) Iterative solution algorithms. Practical experience with GMRES and BICG-STAB matrix solvers. (5) Parallel matrix preconditioning. Incomplete Lower-Upper (ILU) factorization, domain-decomposed ILU, approximate Schur complement strategies.
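Of the partitioning techniques listed in topic (1), recursive coordinate bisection is the simplest to sketch. The Python fragment below is a minimal illustration under stated assumptions, not material from the lectures: the function name `coordinate_bisection`, the choice of the longest axis, and the median split are assumptions made for the example.

```python
def coordinate_bisection(points, levels):
    """Split points into up to 2**levels parts by recursive median cuts.

    Hypothetical helper: each cut is made along the axis with the largest
    extent, so the resulting parts stay roughly compact.
    """
    if levels == 0 or len(points) <= 1:
        return [points]
    dim = len(points[0])
    # Extent of the point cloud along each coordinate axis.
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dim)]
    axis = extents.index(max(extents))
    # Sort along the chosen axis and cut at the median for load balance.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (coordinate_bisection(ordered[:mid], levels - 1)
            + coordinate_bisection(ordered[mid:], levels - 1))

# Usage: a 4 x 4 grid of mesh points split into four parts.
grid = [(float(i), float(j)) for i in range(4) for j in range(4)]
parts = coordinate_bisection(grid, 2)
print([len(part) for part in parts])
```

The sketch balances only the number of points per part; it ignores mesh connectivity, which is the information that level-structure and spectral methods use to reduce communication between parts.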
LDRD report : parallel repartitioning for optimal solver performance.
Heaphy, Robert; Devine, Karen Dragon; Preis, Robert; Hendrickson, Bruce Alan; Heroux, Michael Allen; Boman, Erik Gunnar
2004-02-01
We have developed infrastructure, utilities and partitioning methods to improve data partitioning in linear solvers and preconditioners. Our efforts included incorporation of data repartitioning capabilities from the Zoltan toolkit into the Trilinos solver framework, (allowing dynamic repartitioning of Trilinos matrices); implementation of efficient distributed data directories and unstructured communication utilities in Zoltan and Trilinos; development of a new multi-constraint geometric partitioning algorithm (which can generate one decomposition that is good with respect to multiple criteria); and research into hypergraph partitioning algorithms (which provide up to 56% reduction of communication volume compared to graph partitioning for a number of emerging applications). This report includes descriptions of the infrastructure and algorithms developed, along with results demonstrating the effectiveness of our approaches.
Verification and Validation Studies for the LAVA CFD Solver
NASA Technical Reports Server (NTRS)
Moini-Yekta, Shayan; Barad, Michael F; Sozer, Emre; Brehm, Christoph; Housman, Jeffrey A.; Kiris, Cetin C.
2013-01-01
The verification and validation of the Launch Ascent and Vehicle Aerodynamics (LAVA) computational fluid dynamics (CFD) solver is presented. A modern strategy for verification and validation is described incorporating verification tests, validation benchmarks, continuous integration and version control methods for automated testing in a collaborative development environment. The purpose of the approach is to integrate the verification and validation process into the development of the solver and improve productivity. This paper uses the Method of Manufactured Solutions (MMS) for the verification of 2D Euler equations, 3D Navier-Stokes equations as well as turbulence models. A method for systematic refinement of unstructured grids is also presented. Verification using inviscid vortex propagation and flow over a flat plate is highlighted. Simulation results using laminar and turbulent flow past a NACA 0012 airfoil and ONERA M6 wing are validated against experimental and numerical data.
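The Method of Manufactured Solutions can be illustrated on a much smaller problem than the ones treated in the paper. The Python sketch below is an assumption-laden toy, not the LAVA procedure: it picks the manufactured solution u(x) = sin(pi x) for -u'' = f on (0, 1), derives the source term f analytically, solves with second-order central differences, and checks the observed order of accuracy. The names `solve_poisson_1d` and `max_error` are hypothetical.

```python
import math

def solve_poisson_1d(n, source):
    """Solve -u'' = source on (0, 1) with u(0) = u(1) = 0 on n interior points."""
    h = 1.0 / (n + 1)
    # Central differences give the tridiagonal system
    # -u[i-1] + 2 u[i] - u[i+1] = h^2 f(x_i), solved by the Thomas algorithm.
    diag = [2.0] * n
    rhs = [h * h * source((i + 1) * h) for i in range(n)]
    for i in range(1, n):
        w = -1.0 / diag[i - 1]      # sub-diagonal entry / previous pivot
        diag[i] += w                # diag[i] - w * (super-diagonal = -1)
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u

def max_error(n):
    """Max-norm error against the manufactured solution u(x) = sin(pi x)."""
    exact = lambda x: math.sin(math.pi * x)
    # Source term obtained by inserting the manufactured solution into -u''.
    source = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    u = solve_poisson_1d(n, source)
    h = 1.0 / (n + 1)
    return max(abs(u[i] - exact((i + 1) * h)) for i in range(n))

# Halving h (n = 15 -> n = 31) should reduce the error about fourfold.
order = math.log(max_error(15) / max_error(31), 2)
print(order)
```

An observed order close to the formal order of the scheme (here 2) is the evidence a manufactured-solution test looks for; a lower value would point to a coding or discretization error.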
A functional implementation of the Jacobi eigen-solver
Boehm, A.P.W.; Hiromoto, R.E.
1993-02-01
In this paper, we describe the systematic development of two implementations of the Jacobi eigen-solver and give performance results for the MIT/Motorola Monsoon dataflow machine. Our study is carried out using MINT, the MIT Monsoon simulator. The design of these implementations follows from the mathematics of the Jacobi method, and not from a translation of an existing sequential code. The functional semantics with respect to array updates, which cause excessive array copying, has led us to a new implementation of a parallel "group-rotations" algorithm first described by Sameh. Our version of this algorithm requires O(n^3) operations, whereas Sameh's original version requires O(n^4) operations. The implementations are programmed in the language Id, and although Id has non-functional features, we have restricted the development of our eigen-solvers to the functional sub-set of the language.
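For readers unfamiliar with the underlying method, the following Python sketch shows the classical sequential cyclic Jacobi iteration for a real symmetric matrix. It is not the parallel group-rotations algorithm or the Id implementation described in the abstract; the function name `jacobi_eigenvalues`, the sweep limit, and the tolerance are assumptions made for illustration.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the smaller rotation angle that zeroes a[p][q].
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                # Apply the similarity transform A <- J^T A J:
                # first the columns p and q, then the rows p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))
```

Each rotation touches only rows and columns p and q, so rotations on disjoint index pairs are independent; that independence is what parallel variants of the method exploit.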
A functional implementation of the Jacobi eigen-solver
Boehm, A.P.W.; Hiromoto, R.E.
1993-01-01
In this paper, we describe the systematic development of two implementations of the Jacobi eigen-solver and give performance results for the MIT/Motorola Monsoon dataflow machine. Our study is carried out using MINT, the MIT Monsoon simulator. The design of these implementations follows from the mathematics of the Jacobi method, and not from a translation of an existing sequential code. The functional semantics with respect to array updates, which cause excessive array copying, has led us to a new implementation of a parallel "group-rotations" algorithm first described by Sameh. Our version of this algorithm requires O(n^3) operations, whereas Sameh's original version requires O(n^4) operations. The implementations are programmed in the language Id, and although Id has non-functional features, we have restricted the development of our eigen-solvers to the functional sub-set of the language.
An Upwind Solver for the National Combustion Code
NASA Technical Reports Server (NTRS)
Sockol, Peter M.
2011-01-01
An upwind solver is presented for the unstructured grid National Combustion Code (NCC). The compressible Navier-Stokes equations with time-derivative preconditioning and preconditioned flux-difference splitting of the inviscid terms are used. First order derivatives are computed on cell faces and used to evaluate the shear stresses and heat fluxes. A new flux limiter uses these same first order derivatives in the evaluation of left and right states used in the flux-difference splitting. The k-epsilon turbulence equations are solved with the same second-order method. The new solver has been installed in a recent version of NCC and the resulting code has been tested successfully in 2D on two laminar cases with known solutions and one turbulent case with experimental data.
CASTRO: A NEW COMPRESSIBLE ASTROPHYSICAL SOLVER. II. GRAY RADIATION HYDRODYNAMICS
Zhang, W.; Almgren, A.; Bell, J.; Howell, L.; Burrows, A.
2011-10-01
We describe the development of a flux-limited gray radiation solver for the compressible astrophysics code, CASTRO. CASTRO uses an Eulerian grid with block-structured adaptive mesh refinement based on a nested hierarchy of logically rectangular variable-sized grids with simultaneous refinement in both space and time. The gray radiation solver is based on a mixed-frame formulation of radiation hydrodynamics. In our approach, the system is split into two parts, one part that couples the radiation and fluid in a hyperbolic subsystem, and another parabolic part that evolves radiation diffusion and source-sink terms. The hyperbolic subsystem is solved explicitly with a high-order Godunov scheme, whereas the parabolic part is solved implicitly with a first-order backward Euler method.
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems
Kolev, Tzanio V.; Vassilevski, Panayot S.
2012-12-18
We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart--Thomas finite elements. Our approach belongs to the class of "auxiliary space"-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair--Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483--2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604--623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
Scalable Out-of-Core Solvers on Xeon Phi Cluster
D'Azevedo, Ed F; Chan, Ki Shing; Su, Shiquan; Wong, Kwai
2015-01-01
This paper documents the implementation of a distributive out-of-core (OOC) solver for performing LU and Cholesky factorizations of a large dense matrix on clusters of many-core programmable co-processors. The out-of-core algorithm combines both the left-looking and right-looking schemes, aiming to minimize the movement of data between the CPU host and the co-processor and to optimize data locality as well as computing throughput. The OOC solver is built to align with the format of the ScaLAPACK software library, making it readily portable to any existing codes using ScaLAPACK. A runtime analysis conducted on Beacon (an Intel Xeon plus Intel Xeon Phi cluster composed of 48 nodes of multi-core CPU and MIC) at the National Institute for Computational Sciences is presented. A comparison of the performance on the Intel Xeon Phi and GPU clusters is also provided.
On improving linear solver performance: a block variant of GMRES
Baker, A H; Dennis, J M; Jessup, E R
2004-05-10
The increasing gap between processor performance and memory access time warrants the re-examination of data movement in iterative linear solver algorithms. For this reason, we explore and establish the feasibility of modifying a standard iterative linear solver algorithm in a manner that reduces the movement of data through memory. In particular, we present an alternative to the restarted GMRES algorithm for solving a single right-hand side linear system Ax = b based on solving the block linear system AX = B. Algorithm performance, i.e. time to solution, is improved by using the matrix A in operations on groups of vectors. Experimental results demonstrate the importance of implementation choices on data movement as well as the effectiveness of the new method on a variety of problems from different application areas.
A Nonlinear Modal Aeroelastic Solver for FUN3D
NASA Technical Reports Server (NTRS)
Goldman, Benjamin D.; Bartels, Robert E.; Biedron, Robert T.; Scott, Robert C.
2016-01-01
A nonlinear structural solver has been implemented internally within the NASA FUN3D computational fluid dynamics code, allowing for some new aeroelastic capabilities. Using a modal representation of the structure, a set of differential or differential-algebraic equations are derived for general thin structures with geometric nonlinearities. ODEPACK and LAPACK routines are linked with FUN3D, and the nonlinear equations are solved at each CFD time step. The existing predictor-corrector method is retained, whereby the structural solution is updated after mesh deformation. The nonlinear solver is validated using a test case for a flexible aeroshell at transonic, supersonic, and hypersonic flow conditions. Agreement with linear theory is seen for the static aeroelastic solutions at relatively low dynamic pressures, but structural nonlinearities limit deformation amplitudes at high dynamic pressures. No flutter was found at any of the tested trajectory points, though LCO may be possible in the transonic regime.
Brittle Solvers: Lessons and insights into effective solvers for visco-plasticity in geodynamics
NASA Astrophysics Data System (ADS)
Spiegelman, M. W.; May, D.; Wilson, C. R.
2014-12-01
Plasticity/Fracture and rock failure are essential ingredients in geodynamic models as terrestrial rocks do not possess an infinite yield strength. Numerous physical mechanisms have been proposed to limit the strength of rocks, including low temperature plasticity and brittle fracture. While ductile and creep behavior of rocks at depth is largely accepted, the constitutive relations associated with brittle failure, or shear localisation, are more controversial. Nevertheless, there are really only a few macroscopic constitutive laws for visco-plasticity that are regularly used in geodynamics models. Independent of derivation, all of these can be cast as simple effective viscosities which act as stress limiters with different choices for yield surfaces; the most common being a von Mises (constant yield stress) or Drucker-Prager (pressure dependent yield-stress) criterion. The choice of plasticity model, however, can have significant consequences for the degree of non-linearity in a problem and the choice and efficiency of non-linear solvers. Here we describe a series of simplified 2 and 3-D model problems to elucidate several issues associated with obtaining accurate description and solution of visco-plastic problems. We demonstrate that: 1) Picard/Successive substitution schemes for solution of the non-linear problems can often stall at large values of the non-linear residual, thus producing spurious solutions. 2) Combined Picard/Newton schemes can be effective for a range of plasticity models, however, they can produce serious convergence problems for strongly pressure dependent plasticity models such as Drucker-Prager. 3) Nevertheless, full Drucker-Prager may not be the plasticity model of choice for strong materials as the dynamic pressures produced in these layers can develop pathological behavior with Drucker-Prager, leading to stress strengthening rather than stress weakening behavior. 4) In general, for any incompressible Stokes problem, it is highly advisable to
A Discontinuous Galerkin Chimera Overset Solver
NASA Astrophysics Data System (ADS)
Galbraith, Marshall Christopher
geometries. The large stencil associated with these high-order schemes can significantly complicate the inter-grid communication and hole cutting processes. Unlike these high-order schemes, the DG method always retains a small stencil regardless of the order of approximation. The small stencil of the DG method simplifies the inter-grid communication scheme as well as hole cutting procedures. The DG-Chimera scheme does not require a separate interpolation method because the DG scheme represents the solution as cell local polynomials. Hence, the DG-Chimera method does not require fringe points to maintain the interior stencil across inter-grid boundaries. Thus, inter-grid communication can be established as long as the receiving boundary is enclosed by or abuts the donor mesh. This makes the inter-grid communication procedure applicable to both Chimera and zonal meshes. The small stencil implies hole cutting can be performed without regard to maintaining a minimum stencil and thereby greatly simplifies hole cutting. Hence, the DG-Chimera scheme has the potential to greatly simplify the overset grid generation process. Furthermore, the DG-Chimera scheme is capable of using curved cells to represent geometric features. The curved cells resolve issues associated with linear Chimera viscous meshes used for finite volume and finite difference schemes. Finally, the convergence rate of the Chimera schemes is dramatically increased by linearization of the inter-grid communication.
Menu-Driven Solver Of Linear-Programming Problems
NASA Technical Reports Server (NTRS)
Viterna, L. A.; Ferencz, D.
1992-01-01
Program assists inexperienced user in formulating linear-programming problems. A Linear Program Solver (ALPS) computer program is full-featured LP analysis program. Solves plain linear-programming problems as well as more-complicated mixed-integer and pure-integer programs. Also contains efficient technique for solution of purely binary linear-programming problems. Written entirely in IBM's APL2/PC software, Version 1.01. Packed program contains licensed material, property of IBM (copyright 1988, all rights reserved).
An automatic ordering method for incomplete factorization iterative solvers
Forsyth, P.A.; Tang, W.P. (Dept. of Computer Science); D'Azevedo, E.F.D.
1991-01-01
The minimum discarded fill (MDF) ordering strategy for incomplete factorization iterative solvers is developed. MDF ordering is demonstrated for several model non-symmetric problems, as well as a waterflooding simulation which uses an unstructured grid. The model problems show a three- to five-fold decrease in the number of iterations compared to natural orderings. Greater than twofold improvement was observed for the waterflooding simulation. 26 refs., 7 figs., 3 tabs.
A chemical reaction network solver for the astrophysics code NIRVANA
NASA Astrophysics Data System (ADS)
Ziegler, U.
2016-02-01
Context. Chemistry often plays an important role in astrophysical gases. It regulates thermal properties by changing species abundances and via ionization processes. This way, time-dependent cooling mechanisms and other chemistry-related energy sources can have a profound influence on the dynamical evolution of an astrophysical system. Modeling those effects with the underlying chemical kinetics in realistic magneto-gasdynamical simulations provides the basis for a better link to observations. Aims. The present work describes the implementation of a chemical reaction network solver into the magneto-gasdynamical code NIRVANA. For this purpose a multispecies structure is installed, and a new module for evolving the rate equations of chemical kinetics is developed and coupled to the dynamical part of the code. A small chemical network for a hydrogen-helium plasma, including associated thermal processes, was constructed for use in test problems. Methods. Evolving a chemical network within time-dependent simulations requires the additional solution of a set of coupled advection-reaction equations for species and gas temperature. Second-order Strang splitting is used to separate the advection part from the reaction part. The ordinary differential equation (ODE) system representing the reaction part is solved with a fourth-order generalized Runge-Kutta method applicable for stiff systems inherent to astrochemistry. Results. A series of tests was performed in order to check the correctness of numerical and technical implementation. Tests include well-known stiff ODE problems from the mathematical literature in order to confirm accuracy properties of the solver used as well as problems combining gasdynamics and chemistry. Overall, very satisfactory results are achieved. Conclusions. The NIRVANA code is now ready to handle astrochemical processes in time-dependent simulations. An easy-to-use interface allows implementation of complex networks including thermal processes
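The operator ordering behind second-order Strang splitting can be illustrated with a minimal Python sketch. It is not NIRVANA code: the periodic grid, the advection step implemented as an exact one-cell shift (Courant number 1), and the single-species linear decay reaction are all assumptions chosen so that the result can be checked by hand.

```python
import math

def advect(u, cells=1):
    """Advection sub-step: shift a periodic profile by a whole number of cells.

    Assumption: Courant number exactly 1, where upwind transport is a pure shift.
    """
    n = len(u)
    return [u[(i - cells) % n] for i in range(n)]

def react(u, rate, dt):
    """Reaction sub-step: exact solution of du/dt = -rate * u over dt."""
    factor = math.exp(-rate * dt)
    return [factor * ui for ui in u]

def strang_step(u, rate, dt):
    """One Strang step: half reaction, full advection, half reaction."""
    u = react(u, rate, 0.5 * dt)
    u = advect(u)
    return react(u, rate, 0.5 * dt)

# Hypothetical test profile: one bump on an 8-cell periodic grid.
u0 = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
rate, dt, steps = 0.5, 0.1, 3

u = u0
for _ in range(steps):
    u = strang_step(u, rate, dt)
```

In this toy problem the two operators commute, so the split solution equals the exact one (the bump moves three cells and decays by exp(-rate * steps * dt)). For a stiff reaction network the `react` call would be replaced by a stiff ODE integrator, and the splitting error would then be second order in the time step.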
Scaling Algebraic Multigrid Solvers: On the Road to Exascale
Baker, A H; Falgout, R D; Gamblin, T; Kolev, T; Schulz, M; Yang, U M
2010-12-12
Algebraic Multigrid (AMG) solvers are an essential component of many large-scale scientific simulation codes. Their continued numerical scalability and efficient implementation is critical for preparing these codes for exascale. Our experiences on modern multi-core machines show that significant challenges must be addressed for AMG to perform well on such machines. We discuss our experiences and describe the techniques we have used to overcome scalability challenges for AMG on hybrid architectures in preparation for exascale.
Boltzmann Solver with Adaptive Mesh in Velocity Space
Kolobov, Vladimir I.; Arslanbekov, Robert R.; Frolova, Anna A.
2011-05-20
We describe the implementation of a direct Boltzmann solver with Adaptive Mesh in Velocity Space (AMVS) using a quad/octree data structure. The benefits of the AMVS technique are demonstrated for charged particle transport in weakly ionized plasmas where the collision integral is linear. We also describe the implementation of AMVS for the nonlinear Boltzmann collision integral. Test computations demonstrate both advantages and deficiencies of the current method for calculations of narrow-kernel distributions.
A contribution to the great Riemann solver debate
NASA Technical Reports Server (NTRS)
Quirk, James J.
1992-01-01
The aims of this paper are threefold: to increase the level of awareness within the shock capturing community to the fact that many Godunov-type methods contain subtle flaws that can cause spurious solutions to be computed; to identify one mechanism that might thwart attempts to produce very high resolution simulations; and to proffer a simple strategy for overcoming the specific failings of individual Riemann solvers.
Transonic Drag Prediction Using an Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Levy, David W.
2001-01-01
This paper summarizes the results obtained with the NSU-3D unstructured multigrid solver for the AIAA Drag Prediction Workshop held in Anaheim, CA, June 2001. The test case for the workshop consists of a wing-body configuration at transonic flow conditions. Flow analyses for a complete test matrix of lift coefficient values and Mach numbers at a constant Reynolds number are performed, thus producing a set of drag polars and drag rise curves which are compared with experimental data. Results were obtained independently by both authors using an identical baseline grid and different refined grids. Most cases were run in parallel on commodity cluster-type machines while the largest cases were run on an SGI Origin machine using 128 processors. The objective of this paper is to study the accuracy of the subject unstructured grid solver for predicting drag in the transonic cruise regime, to assess the efficiency of the method in terms of convergence, cpu time, and memory, and to determine the effects of grid resolution on this predictive ability and its computational efficiency. A good predictive ability is demonstrated over a wide range of conditions, although accuracy was found to degrade for cases at higher Mach numbers and lift values where increasing amounts of flow separation occur. The ability to rapidly compute large numbers of cases at varying flow conditions using an unstructured solver on inexpensive clusters of commodity computers is also demonstrated.
A Survey of Solver-Related Geometry and Meshing Issues
NASA Technical Reports Server (NTRS)
Masters, James; Daniel, Derick; Gudenkauf, Jared; Hine, David; Sideroff, Chris
2016-01-01
There is a concern in the computational fluid dynamics community that mesh generation is a significant bottleneck in the CFD workflow. This is one of several papers that will help set the stage for a moderated panel discussion addressing this issue. Although certain general "rules of thumb" and a priori mesh metrics can be used to ensure that some base level of mesh quality is achieved, inadequate consideration is often given to the type of solver or particular flow regime on which the mesh will be utilized. This paper explores how an analyst may want to think differently about a mesh based on considerations such as if a flow is compressible vs. incompressible or hypersonic vs. subsonic or if the solver is node-centered vs. cell-centered. This paper is a high-level investigation intended to provide general insight into how considering the nature of the solver or flow when performing mesh generation has the potential to increase the accuracy and/or robustness of the solution and drive the mesh generation process to a state where it is no longer a hindrance to the analysis process.
A Godunov-Type Solver for Gravitational Flows: Towards a Time-Implicit Version in the HERACLES Code
NASA Astrophysics Data System (ADS)
Vides, J.; Van Criekingen, S.; Audit, E.; Szydlarski, M.
2014-09-01
We study the Euler equations with gravitational source terms derived from a potential which satisfies Poisson's equation for gravity. An adequate treatment of the source terms is achieved by introducing their discretization into an approximate Riemann solver, relying on a relaxation strategy. The associated numerical scheme is then presented and its performance demonstrated. The new method provides a straightforward extension to multidimensions and is applied to different types of problems under gravitational influence, including a one dimensional hydrostatic atmosphere and a three-dimensional Rayleigh-Taylor instability. We show the first results for the implicit version of the scheme, essential for many applications of physical interest and implemented in the code HERACLES.
Fast approximate motif statistics.
Nicodème, P
2001-01-01
We present in this article a fast approximate method for computing the statistics of a number of non-self-overlapping matches of motifs in a random text in the nonuniform Bernoulli model. This method is well suited for protein motifs where the probability of self-overlap of motifs is small. For 96% of the PROSITE motifs, the expectations of occurrences of the motifs in a 7-million-amino-acids random database are computed by the approximate method with less than 1% error when compared with the exact method. Processing of the whole PROSITE takes about 30 seconds with the approximate method. We apply this new method to a comparison of the C. elegans and S. cerevisiae proteomes. PMID:11535175
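For orientation, the simplest statistic of this kind is the expected number of occurrences of a single fixed word in a random text under a nonuniform Bernoulli model, which follows from linearity of expectation. The Python sketch below shows only that baseline. It is not the article's approximate method, which deals with motifs and non-self-overlapping matches, and the three-letter alphabet and its frequencies are made up for illustration.

```python
from math import prod

def expected_occurrences(word, freqs, text_length):
    """Expected count of (possibly overlapping) occurrences of `word` in a
    random text whose letters are drawn independently with probabilities
    `freqs` (nonuniform Bernoulli model)."""
    if len(word) > text_length:
        return 0.0
    p_word = prod(freqs[c] for c in word)      # match probability at one position
    positions = text_length - len(word) + 1    # number of possible start positions
    return positions * p_word

# Hypothetical alphabet and letter frequencies (not PROSITE data).
freqs = {"A": 0.5, "C": 0.3, "G": 0.2}
expectation = expected_occurrences("ACG", freqs, 1000)
```

With these toy numbers the match probability per position is 0.5 * 0.3 * 0.2 = 0.03 and there are 998 start positions, so the expectation is 29.94.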
The Guiding Center Approximation
NASA Astrophysics Data System (ADS)
Pedersen, Thomas Sunn
The guiding center approximation for charged particles in strong magnetic fields is introduced here. This approximation is very useful in situations where the charged particles are very well magnetized, such that the gyration (Larmor) radius is small compared to relevant length scales of the confinement device, and the gyration is fast relative to relevant timescales in an experiment. The basics of motion in a straight, uniform, static magnetic field are reviewed, and are used as a starting point for analyzing more complicated situations where more forces are present, as well as inhomogeneities in the magnetic field -- magnetic curvature as well as gradients in the magnetic field strength. The first and second adiabatic invariants are introduced, and slowly time-varying fields are also covered. As an example of the use of the guiding center approximation, the confinement concept of the cylindrical magnetic mirror is analyzed.
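The two smallness conditions named above can be checked numerically with the standard expressions for the gyration radius, m v_perp / (|q| B), and the angular gyration frequency, |q| B / m. The Python sketch below is illustrative only: the field strength, perpendicular speed, and device scales are assumed values, not taken from the text.

```python
import math

ELECTRON_MASS = 9.1093837e-31        # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def gyro_frequency(charge, mass, b_field):
    """Angular gyration (cyclotron) frequency |q| B / m, in rad/s."""
    return abs(charge) * b_field / mass

def larmor_radius(charge, mass, b_field, v_perp):
    """Gyration (Larmor) radius m v_perp / (|q| B), in metres."""
    return mass * v_perp / (abs(charge) * b_field)

def magnetization_ratios(charge, mass, b_field, v_perp, length_scale, time_scale):
    """Return (radius / length scale, gyro-period / time scale).

    The guiding center picture applies when both ratios are much smaller than 1.
    """
    radius = larmor_radius(charge, mass, b_field, v_perp)
    period = 2.0 * math.pi / gyro_frequency(charge, mass, b_field)
    return radius / length_scale, period / time_scale

# Assumed example: an electron with v_perp = 1e6 m/s in a 1 T field,
# in a device of size 0.1 m observed on a 1 microsecond timescale.
r = larmor_radius(-ELEMENTARY_CHARGE, ELECTRON_MASS, 1.0, 1.0e6)
omega = gyro_frequency(-ELEMENTARY_CHARGE, ELECTRON_MASS, 1.0)
ratios = magnetization_ratios(-ELEMENTARY_CHARGE, ELECTRON_MASS, 1.0, 1.0e6, 0.1, 1.0e-6)
```

For these assumed values the radius is a few micrometres and the gyration period is tens of picoseconds, so both ratios are far below one and the electron is very well magnetized.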
Covariant approximation averaging
NASA Astrophysics Data System (ADS)
Shintani, Eigo; Arthur, Rudy; Blum, Thomas; Izubuchi, Taku; Jung, Chulwoo; Lehner, Christoph
2015-06-01
We present a new class of statistical error reduction techniques for Monte Carlo simulations. Using covariant symmetries, we show that correlation functions can be constructed from inexpensive approximations without introducing any systematic bias in the final result. We introduce a new class of covariant approximation averaging techniques, known as all-mode averaging (AMA), in which the approximation takes account of contributions of all eigenmodes through the inverse of the Dirac operator computed from the conjugate gradient method with a relaxed stopping condition. In this paper we compare the performance and computational cost of our new method with traditional methods using correlation functions and masses of the pion, nucleon, and vector meson in N_f = 2+1 lattice QCD using domain-wall fermions. This comparison indicates that AMA significantly reduces statistical errors in Monte Carlo calculations over conventional methods for the same cost.
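The structure of such an estimator can be sketched in a few lines of Python: average a cheap approximation over many symmetry-related measurements, then add a correction (exact minus approximate) measured on the few points where the expensive exact value is also available. This is a schematic reading of the abstract, not the authors' code. The function name and the numbers are hypothetical toy values, and no lattice quantities are computed.

```python
from statistics import mean

def improved_estimate(exact_subset, approx_subset, approx_all):
    """Mean of the cheap approximation over all measurements, plus the
    (exact - approximate) correction evaluated where both are known."""
    if len(exact_subset) != len(approx_subset):
        raise ValueError("exact and approximate subsets must pair up")
    correction = mean(e - a for e, a in zip(exact_subset, approx_subset))
    return mean(approx_all) + correction

# Toy numbers only: four cheap measurements, one of which also has an exact value.
approx_all = [1.0, 1.2, 0.8, 1.1]
estimate = improved_estimate(exact_subset=[1.05], approx_subset=[1.0],
                             approx_all=approx_all)
```

If every measurement also had an exact value, the approximate terms would cancel and the estimator would reduce to the plain mean of the exact values. The variance reduction comes from needing the expensive computation only for the correction term.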
Fisher, A. C.; Bailey, D. S.; Kaiser, T. B.; Eder, D. C.; Gunney, B. T. N.; Masters, N. D.; Koniges, A. E.; Anderson, R. W.
2015-02-01
Here, we present a novel method for the solution of the diffusion equation on a composite AMR mesh. This approach is suitable for including diffusion based physics modules to hydrocodes that support ALE and AMR capabilities. To illustrate, we proffer our implementations of diffusion based radiation transport and heat conduction in a hydrocode called ALE-AMR. Numerical experiments conducted with the diffusion solver and associated physics packages yield 2nd order convergence in the L_{2} norm.
Monotone Boolean approximation
Hulme, B.L.
1982-12-01
This report presents a theory of approximation of arbitrary Boolean functions by simpler, monotone functions. Monotone increasing functions can be expressed without the use of complements. Nonconstant monotone increasing functions are important in their own right since they model a special class of systems known as coherent systems. It is shown here that when Boolean expressions for noncoherent systems become too large to treat exactly, then monotone approximations are easily defined. The algorithms proposed here not only provide simpler formulas but also produce best possible upper and lower monotone bounds for any Boolean function. This theory has practical application for the analysis of noncoherent fault trees and event tree sequences.
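The notion of best upper and lower monotone bounds can be made concrete with a brute-force Python sketch. The smallest monotone increasing function above f takes the maximum of f over all points below x, and the largest one below f takes the minimum over all points above x. The enumeration below is exponential in the number of variables and is not the report's algorithm. It only illustrates the definitions, using XOR as an assumed example of a non-monotone function.

```python
from itertools import product

def leq(x, y):
    """Componentwise order on Boolean vectors: x <= y."""
    return all(a <= b for a, b in zip(x, y))

def monotone_bounds(f, n):
    """Return (lower, upper) as dicts mapping each n-bit point to 0 or 1.

    upper(x) = max of f over all y <= x  (least monotone function >= f)
    lower(x) = min of f over all y >= x  (greatest monotone function <= f)
    """
    points = list(product((0, 1), repeat=n))
    upper = {x: max(f(y) for y in points if leq(y, x)) for x in points}
    lower = {x: min(f(y) for y in points if leq(x, y)) for x in points}
    return lower, upper

def xor(x):
    """Non-monotone example: true when exactly one input is true."""
    return x[0] ^ x[1]

lower, upper = monotone_bounds(xor, 2)
```

For XOR the upper bound is OR, which needs no complements, and the lower bound is the constant 0.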
FIESTA 2: Parallelizeable multiloop numerical calculations
NASA Astrophysics Data System (ADS)
Smirnov, A. V.; Smirnov, V. A.; Tentyukov, M.
2011-03-01
The program FIESTA has been completely rewritten. Now it can be used not only as a tool to evaluate Feynman integrals numerically, but also to expand Feynman integrals automatically in limits of momenta and masses with the use of sector decompositions and Mellin-Barnes representations. Other important improvements to the code are complete parallelization (even to multiple computers), high-precision arithmetics (allowing to calculate integrals which were undoable before), new integrators, Speer sectors as a strategy, the possibility to evaluate more general parametric integrals. Program summary. Program title: FIESTA 2. Catalogue identifier: AECP_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECP_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU GPL version 2. No. of lines in distributed program, including test data, etc.: 39 783. No. of bytes in distributed program, including test data, etc.: 6 154 515. Distribution format: tar.gz. Programming language: Wolfram Mathematica 6.0 (or higher) and C. Computer: From a desktop PC to a supercomputer. Operating system: Unix, Linux, Windows, Mac OS X. Has the code been vectorised or parallelized?: Yes, the code has been parallelized for use on multi-kernel computers as well as clusters via Mathlink over the TCP/IP protocol. The program can work successfully with a single processor, however, it is ready to work in a parallel environment and the use of multi-kernel processor and multi-processor computers significantly speeds up the calculation; on clusters the calculation speed can be improved even further. RAM: Depends on the complexity of the problem. Classification: 4.4, 4.12, 5, 6.5. Catalogue identifier of previous version: AECP_v1_0. Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 735. External routines: QLink [1], Cuba library [2], MPFR [3]. Does the new version supersede the previous version?: Yes. Nature of problem: The sector decomposition approach to evaluating Feynman integrals falls apart into the sector decomposition itself, where one has to minimize the number of sectors; the pole resolution and epsilon expansion; and the numerical integration of the resulting expression. Solution method: The sector decomposition is based on a new strategy as well as on classical strategies such as Speer sectors. The sector decomposition, pole resolution and epsilon-expansion are performed in Wolfram Mathematica 6.0 or, preferably, 7.0 (enabling parallelization) [4]. The data is stored on hard disk via a special program, QLink [1]. The expression for integration is passed to the C-part of the code, that parses the string and performs the integration by one of the algorithms in the Cuba library package [2]. This part of the evaluation is perfectly parallelized on multi-kernel computers.
A High-Order Accurate Parallel Solver for Maxwell's Equations on Overlapping Grids
Henshaw, W D
2005-09-23
A scheme for the solution of the time dependent Maxwell's equations on composite overlapping grids is described. The method uses high-order accurate approximations in space and time for Maxwell's equations written as a second-order vector wave equation. High-order accurate symmetric difference approximations to the generalized Laplace operator are constructed for curvilinear component grids. The modified equation approach is used to develop high-order accurate approximations that only use three time levels and have the same time-stepping restriction as the second-order scheme. Discrete boundary conditions for perfect electrical conductors and for material interfaces are developed and analyzed. The implementation is optimized for component grids that are Cartesian, resulting in a fast and efficient method. The solver runs on parallel machines with each component grid distributed across one or more processors. Numerical results in two- and three-dimensions are presented for the fourth-order accurate version of the method. These results demonstrate the accuracy and efficiency of the approach.
Multicriteria approximation through decomposition
Burch, C.; Krumke, S.; Marathe, M.; Phillips, C.; Sundberg, E.
1998-06-01
The authors propose a general technique called solution decomposition to devise approximation algorithms with provable performance guarantees. The technique is applicable to a large class of combinatorial optimization problems that can be formulated as integer linear programs. Two key ingredients of their technique involve finding a decomposition of a fractional solution into a convex combination of feasible integral solutions and devising generic approximation algorithms based on calls to such decompositions as oracles. The technique is closely related to randomized rounding. Their method yields as corollaries unified solutions to a number of well studied problems and it provides the first approximation algorithms with provable guarantees for a number of new problems. The particular results obtained in this paper include the following: (1) the authors demonstrate how the technique can be used to provide more understanding of previous results and new algorithms for classical problems such as Multicriteria Spanning Trees, and Suitcase Packing; (2) they also show how the ideas can be extended to apply to multicriteria optimization problems, in which they wish to minimize a certain objective function subject to one or more budget constraints. As corollaries they obtain first non-trivial multicriteria approximation algorithms for problems including the k-Hurdle and the Network Inhibition problems.
Approximating Integrals Using Probability
ERIC Educational Resources Information Center
Maruszewski, Richard F., Jr.; Caudle, Kyle A.
2005-01-01
A discussion of Monte Carlo methods outlines how to use probability expectations to approximate the value of a definite integral. The purpose of this paper is to elaborate on this technique and then to show several examples using Visual Basic as a programming tool. It is an interesting method because it combines two branches of…
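The technique can be sketched briefly: if X is uniform on [a, b], the integral of f over [a, b] equals (b - a) times the expectation of f(X), so a sample mean gives an estimate. The example below is written in Python rather than the Visual Basic used in the paper. The integrand, sample count, and seed are assumptions chosen for illustration.

```python
import random

def mc_integrate(f, a, b, samples, seed=0):
    """Estimate the integral of f over [a, b] as (b - a) * mean(f(X)),
    with X drawn uniformly from [a, b]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(f(rng.uniform(a, b)) for _ in range(samples))
    return (b - a) * total / samples

# Assumed example: the integral of x^2 over [0, 1], whose exact value is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000)
```

The statistical error shrinks like one over the square root of the number of samples, so 100 000 samples give roughly three correct digits here.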
Multicriteria approximation through decomposition
Burch, C.; Krumke, S.; Marathe, M.; Phillips, C.; Sundberg, E.
1997-12-01
The authors propose a general technique called solution decomposition to devise approximation algorithms with provable performance guarantees. The technique is applicable to a large class of combinatorial optimization problems that can be formulated as integer linear programs. Two key ingredients of the technique involve finding a decomposition of a fractional solution into a convex combination of feasible integral solutions and devising generic approximation algorithms based on calls to such decompositions as oracles. The technique is closely related to randomized rounding. The method yields as corollaries unified solutions to a number of well studied problems and it provides the first approximation algorithms with provable guarantees for a number of new problems. The particular results obtained in this paper include the following: (1) The authors demonstrate how the technique can be used to provide more understanding of previous results and new algorithms for classical problems such as Multicriteria Spanning Trees, and Suitcase Packing. (2) They show how the ideas can be extended to apply to multicriteria optimization problems, in which they wish to minimize a certain objective function subject to one or more budget constraints. As corollaries they obtain first non-trivial multicriteria approximation algorithms for problems including the k-Hurdle and the Network Inhibition problems.
NASA Astrophysics Data System (ADS)
Müller, Lucas O.; Blanco, Pablo J.
2015-11-01
We present a methodology for the high-order approximation of hyperbolic conservation laws in networks by using the Dumbser-Enaux-Toro solver and exact solvers for the classical Riemann problem at junctions. The proposed strategy can be applied to any hyperbolic system, conservative or non-conservative, and possibly with flux functions containing discontinuous parameters, as long as an exact or approximate Riemann problem solver is available. The methodology is implemented for a one-dimensional blood flow model that considers discontinuous variations of mechanical and geometrical properties of vessels. The achievement of formal order of accuracy, as well as the robustness of the resulting numerical scheme, is verified through the simulation of both academic tests and physiological flows.
Optimizing the Zeldovich approximation
NASA Technical Reports Server (NTRS)
Melott, Adrian L.; Pellman, Todd F.; Shandarin, Sergei F.
1994-01-01
We have recently learned that the Zeldovich approximation can be successfully used for a far wider range of gravitational instability scenarios than formerly proposed; we study here how to extend this range. In previous work (Coles, Melott and Shandarin 1993, hereafter CMS) we studied the accuracy of several analytic approximations to gravitational clustering in the mildly nonlinear regime. We found that what we called the 'truncated Zeldovich approximation' (TZA) was better than any other (except in one case the ordinary Zeldovich approximation) over a wide range from linear to mildly nonlinear (sigma approximately 3) regimes. TZA was specified by setting Fourier amplitudes equal to zero for all wavenumbers greater than k_nl, where k_nl marks the transition to the nonlinear regime. Here, we study the cross-correlation of generalized TZA with a group of n-body simulations for three shapes of window function: sharp k-truncation (as in CMS), a tophat in coordinate space, or a Gaussian. We also study the variation in the cross-correlation as a function of initial truncation scale within each type. We find that k-truncation, which was so much better than other things tried in CMS, is the worst of these three window shapes. We find that a Gaussian window exp(-k^2 / (2 k_G^2)) applied to the initial Fourier amplitudes is the best choice. It produces a greatly improved cross-correlation in those cases which most needed improvement, e.g. those with more small-scale power in the initial conditions. The optimum choice of k_G for the Gaussian window is (a somewhat spectrum-dependent) 1 to 1.5 times k_nl. Although all three windows produce similar power spectra and density distribution functions after application of the Zeldovich approximation, the agreement of the phases of the Fourier components with the n-body simulation is better for the Gaussian window. We therefore ascribe the success of the best-choice Gaussian window to its superior treatment
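Two of the three window shapes are simple multiplicative filters in Fourier space and can be written down directly. The Python sketch below applies a sharp k-truncation and a Gaussian window to a list of initial Fourier amplitudes. The wavenumbers, amplitudes, and cutoff values are toy assumptions, and the coordinate-space tophat is omitted because its Fourier-space form depends on the dimension.

```python
import math

def sharp_k_window(k, k_cut):
    """Sharp truncation: keep modes with k <= k_cut, zero the rest."""
    return 1.0 if k <= k_cut else 0.0

def gaussian_window(k, k_g):
    """Gaussian window exp(-k^2 / (2 k_G^2))."""
    return math.exp(-k * k / (2.0 * k_g * k_g))

def filter_amplitudes(amplitudes, wavenumbers, window, scale):
    """Multiply each initial Fourier amplitude by the window at its wavenumber."""
    return [a * window(k, scale) for a, k in zip(amplitudes, wavenumbers)]

# Toy spectrum (assumed values): unit amplitudes at four wavenumbers.
wavenumbers = [0.5, 1.0, 2.0, 4.0]
amplitudes = [1.0, 1.0, 1.0, 1.0]
k_nl = 1.0  # assumed nonlinear wavenumber

truncated = filter_amplitudes(amplitudes, wavenumbers, sharp_k_window, k_nl)
smoothed = filter_amplitudes(amplitudes, wavenumbers, gaussian_window, 1.5 * k_nl)
```

The sharp cut removes all power above k_nl abruptly, whereas the Gaussian window with k_G between 1 and 1.5 times k_nl suppresses small-scale modes gradually.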
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report
Saad, Yousef
2014-01-16
The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithm development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project began with an investigation of how to practically update a preconditioner obtained from an ILU-type factorization when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners for solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version of our parallel solver, called pARMS (the new version is version 3). As part of this we have tested the code in complex settings, including the solution of Maxwell and Helmholtz equations and a problem of crystal growth. 10. As an application of polynomial preconditioning we considered the
A Fast and Robust Poisson-Boltzmann Solver Based on Adaptive Cartesian Grids.
Boschitsch, Alexander H; Fenley, Marcia O
2011-05-10
An adaptive Cartesian grid (ACG) concept is presented for the fast and robust numerical solution of the 3D Poisson-Boltzmann Equation (PBE) governing the electrostatic interactions of large-scale biomolecules and highly charged multi-biomolecular assemblies such as ribosomes and viruses. The ACG offers numerous advantages over competing grid topologies such as regular 3D lattices and unstructured grids. For very large biological molecules and multi-biomolecule assemblies, the total number of grid-points is several orders of magnitude less than that required in a conventional lattice grid used in the current PBE solvers thus allowing the end user to obtain accurate and stable nonlinear PBE solutions on a desktop computer. Compared to tetrahedral-based unstructured grids, ACG offers a simpler hierarchical grid structure, which is naturally suited to multigrid, relieves indirect addressing requirements and uses fewer neighboring nodes in the finite difference stencils. Construction of the ACG and determination of the dielectric/ionic maps are straightforward, fast and require minimal user intervention. Charge singularities are eliminated by reformulating the problem to produce the reaction field potential in the molecular interior and the total electrostatic potential in the exterior ionic solvent region. This approach minimizes grid-dependency and alleviates the need for fine grid spacing near atomic charge sites. The technical portion of this paper contains three parts. First, the ACG and its construction for general biomolecular geometries are described. Next, a discrete approximation to the PBE upon this mesh is derived. Finally, the overall solution procedure and multigrid implementation are summarized. Results obtained with the ACG-based PBE solver are presented for: (i) a low dielectric spherical cavity, containing interior point charges, embedded in a high dielectric ionic solvent - analytical solutions are available for this case, thus allowing rigorous
Large-scale linear system solver using secondary storage: Self-energy in hybrid nanostructures
NASA Astrophysics Data System (ADS)
Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.
2011-02-01
-energy potential in dielectrically mismatched semiconductor quantum dots. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. The self-energy solver relies on an induced charge computation method. The differential equation is discretized to yield linear systems of equations, which we then solve by calling the HDSS library. Restrictions: Simple precision. For the self-energy solver, axially symmetric systems must be considered. Running time: About 32 minutes to solve a system with approximately 100 000 equations and more than 6000 right-hand side vectors using a four-node commodity cluster with a total of 32 Intel cores.
User documentation for KINSOL, a nonlinear solver for sequential and parallel computers
Taylor, A. G., LLNL
1998-07-01
KINSOL is a general purpose nonlinear system solver callable from either C or Fortran programs. It is based on NKSOL [3], but is written in ANSI-standard C rather than Fortran77. Its most notable feature is that it uses Krylov Inexact Newton techniques in the system's approximate solution, thus sharing significant modules previously written within CASC at LLNL to support CVODE [6, 7]/PVODE [9, 5]. It also requires almost no matrix storage for solving the Newton equations as compared to direct methods. The name KINSOL is derived from those techniques: Krylov Inexact Newton SOLver. The package was arranged so that selecting one of two forms of a single module in the compilation process will allow the entire package to be created in either sequential (serial) or parallel form. The parallel version of KINSOL uses MPI (Message-Passing Interface) [8] and an appropriately revised version of the vector module NVECTOR, as mentioned above, to achieve parallelism and portability. KINSOL in parallel form is intended for the SPMD (Single Program Multiple Data) model with distributed memory, in which all vectors are identically distributed across processors. In particular, the vector module NVECTOR is designed to help the user assign a contiguous segment of a given vector to each of the processors for parallel computation. Several primitives were added to NVECTOR as originally written for PVODE to implement KINSOL. KINSOL has been run on a Cray-T3D, an eight-processor DEC ALPHA and a cluster of workstations. It is currently being used in a simulation of tokamak edge plasmas and in groundwater two-phase flow studies at LLNL. The remainder of this paper is organized as follows. Section 2 sets the mathematical notation and summarizes the basic methods. Section 3 summarizes the organization of the KINSOL solver, while Section 4 summarizes its usage. Section 5 describes a preconditioner module, Section 6 describes a set of Fortran/C interfaces, Section 7 describes an example problem, and Section 8
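One common reason Newton-Krylov methods need almost no matrix storage is that a Krylov solver only requires Jacobian-vector products, and these can be approximated by a difference quotient of the residual function. The Python sketch below illustrates that general idea under stated assumptions. It does not call or reproduce KINSOL, and the function names, the fixed step `eps`, and the two-equation toy residual are all hypothetical.

```python
def jacobian_vector_product(residual, u, v, eps=1e-7):
    """Approximate J(u) v by (F(u + eps v) - F(u)) / eps.

    Only two residual evaluations are needed; the Jacobian matrix is never formed.
    Assumption: a fixed step eps, where a production code would scale it.
    """
    base = residual(u)
    shifted = residual([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]

def toy_residual(u):
    """Hypothetical nonlinear system F(u) = 0 with root u = (1, 2)."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] * u[1] - 2.0]

# The exact Jacobian at (1, 2) is [[2, 1], [2, 1]], so J v = (2, 2) for v = (1, 0).
jv = jacobian_vector_product(toy_residual, [1.0, 2.0], [1.0, 0.0])
```

An inexact Newton iteration would pass such a product routine to a Krylov linear solver and stop the inner solve early, once the linear residual is small relative to the nonlinear one.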
Application of Aeroelastic Solvers Based on Navier Stokes Equations
NASA Technical Reports Server (NTRS)
Keith, Theo G., Jr.; Srivastava, Rakesh
2001-01-01
The propulsion element of the NASA Advanced Subsonic Technology (AST) initiative is directed towards increasing the overall efficiency of current aircraft engines. This effort requires an increase in the efficiency of various components, such as fans, compressors, turbines, etc. Improvement in engine efficiency can be accomplished through the use of lighter materials, larger diameter fans and/or higher-pressure ratio compressors. However, each of these has the potential to result in aeroelastic problems such as flutter or forced response. To address the aeroelastic problems, the Structural Dynamics Branch of NASA Glenn has been involved in the development of numerical capabilities for analyzing the aeroelastic stability characteristics and forced response of wide chord fans, multi-stage compressors and turbines. In order to design an engine to safely perform a set of desired tasks, accurate information of the stresses on the blade during the entire cycle of blade motion is required. This requirement in turn demands that accurate knowledge of steady and unsteady blade loading is available. To obtain the steady and unsteady aerodynamic forces for the complex flows around the engine components, for the flow regimes encountered by the rotor, an advanced compressible Navier-Stokes solver is required. A finite volume based Navier-Stokes solver has been developed at Mississippi State University (MSU) for solving the flow field around multistage rotors. The focus of the current research effort, under NASA Cooperative Agreement NCC3-596, was on developing an aeroelastic analysis code (entitled TURBO-AE) based on the Navier-Stokes solver developed by MSU. The TURBO-AE code has been developed for flutter analysis of turbomachine components and delivered to NASA and its industry partners. The code has been verified, validated and is being applied by NASA Glenn and by aircraft engine manufacturers to analyze the aeroelastic stability characteristics of modern fans, compressors
Preconditioned CG-solvers and finite element grids
Bauer, R.; Selberherr, S.
1994-12-31
To extract parasitic capacitances in wiring structures of integrated circuits, the authors developed the two- and three-dimensional finite element program SCAP (Smart Capacitance Analysis Program). The program computes the task of the electrostatic field from a solution of Poisson's equation via finite elements and calculates the energies from which the capacitance matrix is extracted. The unknown potential vector, which has 5000-50000 unknowns for three-dimensional applications, is computed by an ICCG solver. Currently three- and six-node triangular and four- and ten-node tetrahedral elements are supported.
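As a rough illustration of the class of solver named in this abstract, the sketch below shows a preconditioned conjugate gradient iteration in Python. It is not taken from SCAP. For brevity it swaps in a Jacobi (diagonal) preconditioner for the incomplete Cholesky factorization that ICCG uses, and it stores the matrix as a dense list of lists. The names `pcg` and `poisson_1d` and the small 1D Poisson test matrix are assumptions made for this example only.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite matrix A.

    Assumption of this sketch: a Jacobi (diagonal) preconditioner is used
    as a simple stand-in for the incomplete Cholesky factor of ICCG.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:     # relative residual test
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix of a 1D Poisson problem (test data only)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


A = poisson_1d(8)
b = [1.0] * 8
x = pcg(A, b)
residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(8)))
               for i in range(8))
```

For systems of the size quoted in the abstract, a sparse matrix format and a stronger preconditioner would be needed; the loop structure stays the same.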
Reformulation of the Fourier-Bessel steady state mode solver
NASA Astrophysics Data System (ADS)
Gauthier, Robert C.
2016-09-01
The Fourier-Bessel resonator state mode solver is reformulated using Maxwell's field coupled curl equations. The matrix-generating expressions are greatly simplified and the number of pre-computed tables is reduced, making the technique simpler to implement on a desktop computer. The reformulation maintains the theoretical equivalence of the permittivity and permeability, and as such structures containing both electric and magnetic properties can be examined. Computation examples are presented for a surface nanoscale axial photonic resonator and a hybrid {ε, μ} quasi-crystal resonator.
Algorithms for parallel flow solvers on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1995-01-01
The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those
A Simple Quantum Integro-Differential Solver (SQuIDS)
NASA Astrophysics Data System (ADS)
Argüelles Delgado, Carlos A.; Salvado, Jordi; Weaver, Christopher N.
2015-11-01
Simple Quantum Integro-Differential Solver (SQuIDS) is a C++ code designed to solve semi-analytically the evolution of a set of density matrices and scalar functions. This is done efficiently by expressing all operators in an SU(N) basis. SQuIDS provides a base class from which users can derive new classes to include new non-trivial terms from the right hand sides of density matrix equations. The code was designed in the context of solving neutrino oscillation problems, but can be applied to any problem that involves solving the quantum evolution of a collection of particles with Hilbert space of dimension up to six.
Object-Oriented Design for Sparse Direct Solvers
NASA Technical Reports Server (NTRS)
Dobrian, Florin; Kumfert, Gary; Pothen, Alex
1999-01-01
We discuss the object-oriented design of a software package for solving sparse, symmetric systems of equations (positive definite and indefinite) by direct methods. At the highest layers, we decouple data structure classes from algorithmic classes for flexibility. We describe the important structural and algorithmic classes in our design, and discuss the trade-offs we made for high performance. The kernels at the lower layers were optimized by hand. Our results show no performance loss from our object-oriented design, while providing flexibility, ease of use, and extensibility over solvers using procedural design.
FDIPS: Finite Difference Iterative Potential-field Solver
NASA Astrophysics Data System (ADS)
Toth, Gabor; van der Holst, Bartholomeus; Huang, Zhenguang
2016-06-01
FDIPS is a finite difference iterative potential-field solver that can generate the 3D potential magnetic field solution based on a magnetogram. It is offered as an alternative to the spherical harmonics approach, as when the number of spherical harmonics is increased, using the raw magnetogram data given on a grid that is uniform in the sine of the latitude coordinate can result in inaccurate and unreliable results, especially in the polar regions close to the Sun. FDIPS is written in Fortran 90 and uses the MPI library for parallel execution.
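A potential magnetic field is the gradient of a scalar potential that satisfies Laplace's equation, so the core idea of a finite difference iterative solver can be sketched on a toy problem. The Python example below is not FDIPS: it assumes a small 2D Cartesian grid with fixed boundary values and uses plain Gauss-Seidel sweeps, whereas the abstract describes a 3D solver driven by a magnetogram. The function name `solve_laplace` and the linear boundary data are hypothetical.

```python
def solve_laplace(boundary, n, tol=1e-10, max_sweeps=10000):
    """Gauss-Seidel sweeps for the 5-point Laplace stencil on an n x n grid.

    boundary(i, j) supplies the fixed potential on the edge of the grid;
    interior points start at zero and are relaxed until they stop changing.
    """
    phi = [[boundary(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
            for j in range(n)] for i in range(n)]
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each interior value becomes the mean of its four neighbours.
                new = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                              + phi[i][j - 1] + phi[i][j + 1])
                max_change = max(max_change, abs(new - phi[i][j]))
                phi[i][j] = new
        if max_change <= tol:
            return phi, sweep
    raise RuntimeError("Gauss-Seidel iteration did not converge")


def linear_boundary(i, j):
    """Test data: a linear potential, which the stencil reproduces exactly."""
    return i + 2.0 * j


n = 10
phi, sweeps = solve_laplace(linear_boundary, n)
max_error = max(abs(phi[i][j] - linear_boundary(i, j))
                for i in range(n) for j in range(n))
```

Because a linear function satisfies the discrete Laplace equation exactly, `max_error` measures only the iteration error, which makes it a convenient check on the sweep loop.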
Performance issues for iterative solvers in device simulation
NASA Technical Reports Server (NTRS)
Fan, Qing; Forsyth, P. A.; Mcmacken, J. R. F.; Tang, Wei-Pai
1994-01-01
Due to memory limitations, iterative methods have become the method of choice for large scale semiconductor device simulation. However, it is well known that these methods still suffer from reliability problems. The linear systems which appear in numerical simulation of semiconductor devices are notoriously ill-conditioned. In order to produce robust algorithms for practical problems, careful attention must be given to many implementation issues. This paper concentrates on strategies for developing robust preconditioners. In addition, effective data structures and convergence check issues are also discussed. These algorithms are compared with a standard direct sparse matrix solver on a variety of problems.
High Energy Boundary Conditions for a Cartesian Mesh Euler Solver
NASA Technical Reports Server (NTRS)
Pandya, Shishir A.; Murman, Scott M.; Aftosmis, Michael J.
2004-01-01
Inlets and exhaust nozzles are often omitted or faired over in aerodynamic simulations of aircraft due to the complexities involved in modeling engine details such as complex geometry and flow physics. However, the assumption is often improper as inlet or plume flows have a substantial effect on vehicle aerodynamics. A tool for specifying inlet and exhaust plume conditions through the use of high-energy boundary conditions in an established inviscid flow solver is presented. The effects of the plume on the flow fields near the inlet and plume are discussed.
A fast solver for the Ornstein-Zernike equations
NASA Astrophysics Data System (ADS)
Kelley, C. T.; Pettitt, B. Montgomery
2004-07-01
In this paper, we report on the design and analysis of a multilevel method for the solution of the Ornstein-Zernike Equations and related systems of integro-algebraic equations. Our approach is based on an extension of the Atkinson-Brakhage method, with Newton-GMRES used as the coarse mesh solver. We report on several numerical experiments to illustrate the effectiveness of the method. The problems chosen are related to simple short ranged fluids with continuous potentials. Speedups over traditional methods for a given accuracy are reported. The new multilevel method is roughly six times faster than Newton-GMRES and 40 times faster than Picard.
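The gap between Picard and Newton-type iterations mentioned in this abstract can be seen on a toy fixed-point problem. The Python sketch below is an illustration only and does not reproduce the paper's multilevel method or its integral equations: it solves the scalar equation x = cos(x), uses an exact derivative in place of the GMRES solve that Newton-GMRES applies to large systems, and the function names `picard` and `newton` are assumptions of this example.

```python
import math


def picard(g, x0, tol=1e-12, max_iter=500):
    """Picard (successive substitution) iteration x <- g(x)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) <= tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Picard iteration did not converge")


def newton(g, dg, x0, tol=1e-12, max_iter=50):
    """Newton's method on F(x) = x - g(x), with F'(x) = 1 - g'(x)."""
    x = x0
    for k in range(1, max_iter + 1):
        step = (x - g(x)) / (1.0 - dg(x))
        x -= step
        if abs(step) <= tol:
            return x, k
    raise RuntimeError("Newton iteration did not converge")


def neg_sin(x):
    """Derivative of cos, needed by the Newton step."""
    return -math.sin(x)


x_picard, n_picard = picard(math.cos, 1.0)
x_newton, n_newton = newton(math.cos, neg_sin, 1.0)
```

Both calls reach the same fixed point, but `n_newton` is much smaller than `n_picard` because Picard converges linearly while Newton converges quadratically near the solution. Iteration counts alone do not determine run time, since each Newton step costs more than a Picard step.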