Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Halem, Milton (Technical Monitor)
2000-01-01
We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
An Artificial Neural Networks Method for Solving Partial Differential Equations
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2010-09-01
While there already exists many analytical and numerical techniques for solving PDEs, this paper introduces an approach using artificial neural networks. The approach consists of a technique developed by combining the standard numerical method, finite-difference, with the Hopfield neural network. The method is denoted Hopfield-finite-difference (HFD). The architecture of the nets, energy function, updating equations, and algorithms are developed for the method. The HFD method has been used successfully to approximate the solution of classical PDEs, such as the Wave, Heat, Poisson and the Diffusion equations, and on a system of PDEs. The software Matlab is used to obtain the results in both tabular and graphical form. The results are similar in terms of accuracy to those obtained by standard numerical methods. In terms of speed, the parallel nature of the Hopfield nets methods makes them easier to implement on fast parallel computers while some numerical methods need extra effort for parallelization.
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
NASA Astrophysics Data System (ADS)
Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji
2013-03-01
This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 40963 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations
NASA Astrophysics Data System (ADS)
Pan, Yan; Dai, Xiaoying; de Gironcoli, Stefano; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui
2017-11-01
Motivated by the recently proposed parallel orbital-updating approach in real space method [1], we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr
2014-12-15
In this paper we present a time-parallel algorithm for the 3D neutrons calculation of a transient model in a nuclear reactor core. The neutrons calculation consists in numerically solving the time dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with finite elements method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity ofmore » the control rods. The parallelism across the time is achieved by an adequate use of the parareal in time algorithm to the handled problem. This parallel method is a predictor corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with large time step and fixed position control rods model, while the fine propagator is assumed to be a high order numerical approximation of the full model. The parallel implementation of our method provides a good scalability of the algorithm. Numerical results show the efficiency of the parareal method on large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.« less
NASA Technical Reports Server (NTRS)
Sharma, Naveen
1992-01-01
In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
NASA Astrophysics Data System (ADS)
Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
NASA Astrophysics Data System (ADS)
Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu
2015-07-01
The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
A 3D staggered-grid finite difference scheme for poroelastic wave equation
NASA Astrophysics Data System (ADS)
Zhang, Yijie; Gao, Jinghuai
2014-10-01
Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation
NASA Astrophysics Data System (ADS)
Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab
2015-05-01
3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.
NASA Technical Reports Server (NTRS)
Wang, P.; Li, P.
1998-01-01
A high-resolution numerical study on parallel systems is reported on three-dimensional, time-dependent, thermal convective flows. A parallel implentation on the finite volume method with a multigrid scheme is discussed, and a parallel visualization systemm is developed on distributed systems for visualizing the flow.
Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems
NASA Astrophysics Data System (ADS)
Endo, Eishin; Toga, Yuta; Sasaki, Munetaka
2015-07-01
We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.
Summary of research in applied mathematics, numerical analysis, and computer sciences
NASA Technical Reports Server (NTRS)
1986-01-01
The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
An asymptotic induced numerical method for the convection-diffusion-reaction equation
NASA Technical Reports Server (NTRS)
Scroggs, Jeffrey S.; Sorensen, Danny C.
1988-01-01
A parallel algorithm for the efficient solution of a time dependent reaction convection diffusion equation with small parameter on the diffusion term is presented. The method is based on a domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. Parallelism is evident at two levels. Domain decomposition provides parallelism at the highest level, and within each domain there is ample opportunity to exploit parallelism. Run time results demonstrate the viability of the method.
Numerical techniques in radiative heat transfer for general, scattering, plane-parallel media
NASA Technical Reports Server (NTRS)
Sharma, A.; Cogley, A. C.
1982-01-01
The study of radiative heat transfer with scattering usually leads to the solution of singular Fredholm integral equations. The present paper presents an accurate and efficient numerical method to solve certain integral equations that govern radiative equilibrium problems in plane-parallel geometry for both grey and nongrey, anisotropically scattering media. In particular, the nongrey problem is represented by a spectral integral of a system of nonlinear integral equations in space, which has not been solved previously. The numerical technique is constructed to handle this unique nongrey governing equation as well as the difficulties caused by singular kernels. Example problems are solved and the method's accuracy and computational speed are analyzed.
NASA Astrophysics Data System (ADS)
Kwon, Deuk-Chul; Shin, Sung-Sik; Yu, Dong-Hun
2017-10-01
In order to reduce the computing time in simulation of radio frequency (rf) plasma sources, various numerical schemes were developed. It is well known that the upwind, exponential, and power-law schemes can efficiently overcome the limitation on the grid size for fluid transport simulations of high density plasma discharges. Also, the semi-implicit method is a well-known numerical scheme to overcome on the simulation time step. However, despite remarkable advances in numerical techniques and computing power over the last few decades, efficient multi-dimensional modeling of low temperature plasma discharges has remained a considerable challenge. In particular, there was a difficulty on parallelization in time for the time periodic steady state problems such as capacitively coupled plasma discharges and rf sheath dynamics because values of plasma parameters in previous time step are used to calculate new values each time step. Therefore, we present a parallelization method for the time periodic steady state problems by using period-slices. In order to evaluate the efficiency of the developed method, one-dimensional fluid simulations are conducted for describing rf sheath dynamics. The result shows that speedup can be achieved by using a multithreading method.
Parallel pivoting combined with parallel reduction
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1987-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
A method for data handling numerical results in parallel OpenFOAM simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anton, Alin; Muntean, Sebastian
Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit{sup ®}[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
Numerical computation of solar neutrino flux attenuated by the MSW mechanism
NASA Astrophysics Data System (ADS)
Kim, Jai Sam; Chae, Yoon Sang; Kim, Jung Dae
1999-07-01
We compute the survival probability of an electron neutrino in its flight through the solar core experiencing the Mikheyev-Smirnov-Wolfenstein effect with all three neutrino species considered. We adopted a hybrid method that uses an accurate approximation formula in the non-resonance region and numerical integration in the non-adiabatic resonance region. The key of our algorithm is to use the importance sampling method for sampling the neutrino creation energy and position and to find the optimum radii to start and stop numerical integration. We further developed a parallel algorithm for a message passing parallel computer. By using an idea of job token, we have developed a dynamical load balancing mechanism which is effective under any irregular load distributions
A parallel time integrator for noisy nonlinear oscillatory systems
NASA Astrophysics Data System (ADS)
Subber, Waad; Sarkar, Abhijit
2018-06-01
In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy non-linear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample path of nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of Wiener process in SDEs causes difficulties in the direct application of any numerical integration techniques of ODEs including the parareal algorithm. The parallel implementation of the algorithm involves two SDEs solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits Massage Passing Interface (MPI).
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
RT DDA: A hybrid method for predicting the scattering properties by densely packed media
NASA Astrophysics Data System (ADS)
Ramezan Pour, B.; Mackowski, D.
2017-12-01
The most accurate approaches to predicting the scattering properties of particulate media are based on exact solutions of the Maxwell's equations (MEs), such as the T-matrix and discrete dipole methods. Applying these techniques for optically thick targets is challenging problem due to the large-scale computations and are usually substituted by phenomenological radiative transfer (RT) methods. On the other hand, the RT technique is of questionable validity in media with large particle packing densities. In recent works, we used numerically exact ME solvers to examine the effects of particle concentration on the polarized reflection properties of plane parallel random media. The simulations were performed for plane parallel layers of wavelength-sized spherical particles, and results were compared with RT predictions. We have shown that RTE results monotonically converge to the exact solution as the particle volume fraction becomes smaller and one can observe a nearly perfect fit for packing densities of 2%-5%. This study describes the hybrid technique composed of exact and numerical scalar RT methods. The exact methodology in this work is the plane parallel discrete dipole approximation whereas the numerical method is based on the adding and doubling method. This approach not only decreases the computational time owing to the RT method but also includes the interference and multiple scattering effects, so it may be applicable to large particle density conditions.
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O. (Editor); Housner, Jerrold M. (Editor)
1993-01-01
Computing speed is leaping forward by several orders of magnitude each decade. Engineers and scientists gathered at a NASA Langley symposium to discuss these exciting trends as they apply to parallel computational methods for large-scale structural analysis and design. Among the topics discussed were: large-scale static analysis; dynamic, transient, and thermal analysis; domain decomposition (substructuring); and nonlinear and numerical methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.
2014-09-01
We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while themore » second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio X ⊥ /X ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = X⊥L 2 ∥/X1L 2 ⊥ → 0 (with L∥∙ L⊥ , the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.« less
Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves
NASA Astrophysics Data System (ADS)
Qiu, Shi; Liu, Kuang; Eliasson, Veronica
2016-10-01
Geometrical shock dynamics (GSD) theory is an appealing method to predict the shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, to solve and optimize large scale configurations, the main bottleneck is the computational cost. Among the existing numerical GSD schemes, there is only one that has been implemented on parallel computers, with the purpose to analyze detonation waves. To extend the computational advantage of the GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to resolve the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
Numerical simulation of h-adaptive immersed boundary method for freely falling disks
NASA Astrophysics Data System (ADS)
Zhang, Pan; Xia, Zhenhua; Cai, Qingdong
2018-05-01
In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on a parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, very similar vortex structures observed in the experiment were also obtained.
Advancing MODFLOW Applying the Derived Vector Space Method
NASA Astrophysics Data System (ADS)
Herrera, G. S.; Herrera, I.; Lemus-García, M.; Hernandez-Garcia, G. D.
2015-12-01
The most effective domain decomposition methods (DDM) are non-overlapping DDMs. Recently a new approach, the DVS-framework, based on an innovative discretization method that uses a non-overlapping system of nodes (the derived-nodes), was introduced and developed by I. Herrera et al. [1, 2]. Using the DVS-approach a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm (i.e. the solution of global problems is obtained by resolution of local problems exclusively) has been derived. Such procedures are applicable to any boundary-value problem, or system of such equations, for which a standard discretization method is available and then software with a high degree of parallelization can be constructed. In a parallel talk, in this AGU Fall Meeting, Ismael Herrera will introduce the general DVS methodology. The application of the DVS-algorithms has been demonstrated in the solution of several boundary values problems of interest in Geophysics. Numerical examples for a single-equation, for the cases of symmetric, non-symmetric and indefinite problems were demonstrated before [1,2]. For these problems DVS-algorithms exhibited significantly improved numerical performance with respect to standard versions of DDM algorithms. In view of these results our research group is in the process of applying the DVS method to a widely used simulator for the first time, here we present the advances of the application of this method for the parallelization of MODFLOW. Efficiency results for a group of tests will be presented. References [1] I. Herrera, L.M. de la Cruz and A. Rosas-Medina. Non overlapping discretization methods for partial differential equations, Numer Meth Part D E, (2013). [2] Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for solving the particles with different calculation costs (e.g. boundary particles) as well as the heterogeneous computer architecture. We analyze the parallel efficiency and scalability on the super computer systems (K-computer, Earth simulator 3, etc.).
NASA Astrophysics Data System (ADS)
Popov, Igor; Sukov, Sergey
2018-02-01
A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on one stage time approximation and is adopted to calculation of gasdynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance and parallel efficiency compared to the implementation of the original AAV method. Computer experiments evidence the robustness and convergence of the method to difference solution.
A Parallel Stochastic Framework for Reservoir Characterization and History Matching
Thomas, Sunil G.; Klie, Hector M.; Rodriguez, Adolfo A.; ...
2011-01-01
The spatial distribution of parameters that characterize the subsurface is never known to any reasonable level of accuracy required to solve the governing PDEs of multiphase flow or species transport through porous media. This paper presents a numerically cheap, yet efficient, accurate and parallel framework to estimate reservoir parameters, for example, medium permeability, using sensor information from measurements of the solution variables such as phase pressures, phase concentrations, fluxes, and seismic and well log data. Numerical results are presented to demonstrate the method.
Semiannual report, 1 April - 30 September 1991
NASA Technical Reports Server (NTRS)
1991-01-01
The major categories of the current Institute for Computer Applications in Science and Engineering (ICASE) research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification problems, with emphasis on effective numerical methods; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software for parallel computers. Research in these areas is discussed.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, James M.; Voigt, Robert G.; Romine, Charles H.
1988-01-01
This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1987-01-01
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, James M.; Voigt, Robert G.; Romine, Charles H.
1990-01-01
This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
Acoustic simulation in architecture with parallel algorithm
NASA Astrophysics Data System (ADS)
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
In allusion to complexity of architecture environment and Real-time simulation of architecture acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in scene is solved with this method. And then the impulse response between sources and receivers at frequency segment, which are calculated with multi-process, are combined into whole frequency response. The numerical experiment shows that parallel arithmetic can improve the acoustic simulating efficiency of complex scene.
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.
New Parallel Algorithms for Landscape Evolution Model
NASA Astrophysics Data System (ADS)
Jin, Y.; Zhang, H.; Shi, Y.
2017-12-01
Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
A parallel algorithm for the eigenvalues and eigenvectors for a general complex matrix
NASA Technical Reports Server (NTRS)
Shroff, Gautam
1989-01-01
A new parallel Jacobi-like algorithm is developed for computing the eigenvalues of a general complex matrix. Most parallel methods for this parallel typically display only linear convergence. Sequential norm-reducing algorithms also exit and they display quadratic convergence in most cases. The new algorithm is a parallel form of the norm-reducing algorithm due to Eberlein. It is proven that the asymptotic convergence rate of this algorithm is quadratic. Numerical experiments are presented which demonstrate the quadratic convergence of the algorithm and certain situations where the convergence is slow are also identified. The algorithm promises to be very competitive on a variety of parallel architectures.
Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP
NASA Technical Reports Server (NTRS)
Long, Lyle N.; Brentner, Kenneth S.
2000-01-01
This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
A parallel direct-forcing fictitious domain method for simulating microswimmers
NASA Astrophysics Data System (ADS)
Gao, Tong; Lin, Zhaowu
2017-11-01
We present a 3D parallel direct-forcing fictitious domain method for simulating swimming micro-organisms at small Reynolds numbers. We treat the motile micro-swimmers as spherical rigid particles using the ``Squirmer'' model. The particle dynamics are solved on the moving Larangian meshes that overlay upon a fixed Eulerian mesh for solving the fluid motion, and the momentum exchange between the two phases is resolved by distributing pseudo body-forces over the particle interior regions which constrain the background fictitious fluids to follow the particle movement. While the solid and fluid subproblems are solved separately, no inner-iterations are required to enforce numerical convergence. We demonstrate the accuracy and robustness of the method by comparing our results with the existing analytical and numerical studies for various cases of single particle dynamics and particle-particle interactions. We also perform a series of numerical explorations to obtain statistical and rheological measurements to characterize the dynamics and structures of Squirmer suspensions. NSF DMS 1619960.
Programming Probabilistic Structural Analysis for Parallel Processing Computer
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.
1991-01-01
The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Graphics applications utilizing parallel processing
NASA Technical Reports Server (NTRS)
Rice, John R.
1990-01-01
The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
Introduction to Numerical Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonover, Joseph A.
2016-06-14
These are slides for a lecture for the Parallel Computing Summer Research Internship at the National Security Education Center. This gives an introduction to numerical methods. Repetitive algorithms are used to obtain approximate solutions to mathematical problems, using sorting, searching, root finding, optimization, interpolation, extrapolation, least squares regresion, Eigenvalue problems, ordinary differential equations, and partial differential equations. Many equations are shown. Discretizations allow us to approximate solutions to mathematical models of physical systems using a repetitive algorithm and introduce errors that can lead to numerical instabilities if we are not careful.
Three numerical algorithms were compared to provide a solution of a radiative transfer equation (RTE) for plane albedo (hemispherical reflectance) in semi-infinite one-dimensional plane-parallel layer. Algorithms were based on the invariant imbedding method and two different var...
Parallel Algorithm Solves Coupled Differential Equations
NASA Technical Reports Server (NTRS)
Hayashi, A.
1987-01-01
Numerical methods adapted to concurrent processing. Algorithm solves set of coupled partial differential equations by numerical integration. Adapted to run on hypercube computer, algorithm separates problem into smaller problems solved concurrently. Increase in computing speed with concurrent processing over that achievable with conventional sequential processing appreciable, especially for large problems.
Development and Application of a Parallel LCAO Cluster Method
NASA Astrophysics Data System (ADS)
Patton, David C.
1997-08-01
CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations is analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
Numerical Modelling of Foundation Slabs with use of Schur Complement Method
NASA Astrophysics Data System (ADS)
Koktan, Jiří; Brožovský, Jiří
2017-10-01
The paper discusses numerical modelling of foundation slabs with use of advanced numerical approaches, which are suitable for parallel processing. The solution is based on the Finite Element Method with the slab-type elements. The subsoil is modelled with use of Winklertype contact model (as an alternative a multi-parameter model can be used). The proposed modelling approach uses the Schur Complement method to speed-up the computations of the problem. The method is based on a special division of the analyzed model to several substructures. It adds some complexity to the numerical procedures, especially when subsoil models are used inside the finite element method solution. In other hand, this method makes possible a fast solution of large models but it introduces further problems to the process. Thus, the main aim of this paper is to verify that such method can be successfully used for this type of problem. The most suitable finite elements will be discussed, there will be also discussion related to finite element mesh and limitations of its construction for such problem. The core approaches of the implementation of the Schur Complement Method for this type of the problem will be also presented. The proposed approach was implemented in the form of a computer program, which will be also briefly introduced. There will be also presented results of example computations, which prove the speed-up of the solution - there will be shown important speed-up of solution even in the case of on-parallel processing and the ability of bypass size limitations of numerical models with use of the discussed approach.
Analysis of Serial and Parallel Algorithms for Use in Molecular Dynamics.. Review and Proposals
NASA Astrophysics Data System (ADS)
Mazzone, A. M.
This work analyzes the stability and accuracy of multistep methods, either for serial or parallel calculations, applied to molecular dynamics simulations. Numerical testing is made by evaluating the equilibrium configurations of mono-elemental crystalline lattices of metallic and semiconducting type (Ag and Si, respectively) and of a cubic CuY compound.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, T.
Progress of parallel/vector computers has driven us to develop suitable numerical integrators utilizing their computational power to the full extent while being independent on the size of system to be integrated. Unfortunately, the parallel version of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. While the vector-mode usage of Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead the acceleration factor of order of 1000 for smooth problems such as planetary/satellites orbit integration. The success of multiple-correction PECE mode of time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to enlighten Milankar's so-called "pipelined predictor corrector method", which is expected to lead an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Accelerated numerical processing of electronically recorded holograms with reduced speckle noise.
Trujillo, Carlos; Garcia-Sucerquia, Jorge
2013-09-01
The numerical reconstruction of digitally recorded holograms suffers from speckle noise. An accelerated method that uses general-purpose computing in graphics processing units to reduce that noise is shown. The proposed methodology utilizes parallelized algorithms to record, reconstruct, and superimpose multiple uncorrelated holograms of a static scene. For the best tradeoff between reduction of the speckle noise and processing time, the method records, reconstructs, and superimposes six holograms of 1024 × 1024 pixels in 68 ms; for this case, the methodology reduces the speckle noise by 58% compared with that exhibited by a single hologram. The fully parallelized method running on a commodity graphics processing unit is one order of magnitude faster than the same technique implemented on a regular CPU using its multithreading capabilities. Experimental results are shown to validate the proposal.
Parallel heat transport in integrable and chaotic magnetic fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del-Castillo-Negrete, Diego B; Chacon, Luis
2012-01-01
The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly chal- lenging: (i) The extreme anisotropy between the parallel (i.e., along the magnetic field), , and the perpendicular, , conductivities ( / may exceed 1010 in fusion plasmas); (ii) Magnetic field lines chaos which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) Nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green s function method to solve themore » local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geom- etry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island chain), weakly chaotic (devil s staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local closures, is non-diffusive, thus casting doubts on the appropriateness of the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.« less
Wave Number Selection for Incompressible Parallel Jet Flows Periodic in Space
NASA Technical Reports Server (NTRS)
Miles, Jeffrey Hilton
1997-01-01
The temporal instability of a spatially periodic parallel flow of an incompressible inviscid fluid for various jet velocity profiles is studied numerically using Floquet Analysis. The transition matrix at the end of a period is evaluated by direct numerical integration. For verification, a method based on approximating a continuous function by a series of step functions was used. Unstable solutions were found only over a limited range of wave numbers and have a band type structure. The results obtained are analogous to the behavior observed in systems exhibiting complexity at the edge of order and chaos.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, C.; Yu, G.; Wang, K.
The physical designs of the new concept reactors which have complex structure, various materials and neutronic energy spectrum, have greatly improved the requirements to the calculation methods and the corresponding computing hardware. Along with the widely used parallel algorithm, heterogeneous platforms architecture has been introduced into numerical computations in reactor physics. Because of the natural parallel characteristics, the CPU-FPGA architecture is often used to accelerate numerical computation. This paper studies the application and features of this kind of heterogeneous platforms used in numerical calculation of reactor physics through practical examples. After the designed neutron diffusion module based on CPU-FPGA architecturemore » achieves a 11.2 speed up factor, it is proved to be feasible to apply this kind of heterogeneous platform into reactor physics. (authors)« less
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields
del-Castillo-Negrete, Diego; Blazevski, Daniel
2016-04-01
Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter ismore » $$\\gamma=\\sqrt{\\omega/2 \\chi_\\parallel}$$ that determines the length scale, $$1/\\gamma$$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $$\\omega \\gg 1$$, or small parallel thermal conductivities, $$\\chi_\\parallel \\ll 1$$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $$\\gamma$$, parallel heat transport is largely unimpeded, global transport is observed and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay of modulated heat pulses.« less
Parallelized implicit propagators for the finite-difference Schrödinger equation
NASA Astrophysics Data System (ADS)
Parker, Jonathan; Taylor, K. T.
1995-08-01
We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
NASA Astrophysics Data System (ADS)
Blakely, Christopher D.
This dissertation thesis has three main goals: (1) To explore the anatomy of meshless collocation approximation methods that have recently gained attention in the numerical analysis community; (2) Numerically demonstrate why the meshless collocation method should clearly become an attractive alternative to standard finite-element methods due to the simplicity of its implementation and its high-order convergence properties; (3) Propose a meshless collocation method for large scale computational geophysical fluid dynamics models. We provide numerical verification and validation of the meshless collocation scheme applied to the rotational shallow-water equations on the sphere and demonstrate computationally that the proposed model can compete with existing high performance methods for approximating the shallow-water equations such as the SEAM (spectral-element atmospheric model) developed at NCAR. A detailed analysis of the parallel implementation of the model, along with the introduction of parallel algorithmic routines for the high-performance simulation of the model will be given. We analyze the programming and computational aspects of the model using Fortran 90 and the message passing interface (mpi) library along with software and hardware specifications and performance tests. Details from many aspects of the implementation in regards to performance, optimization, and stabilization will be given. In order to verify the mathematical correctness of the algorithms presented and to validate the performance of the meshless collocation shallow-water model, we conclude the thesis with numerical experiments on some standardized test cases for the shallow-water equations on the sphere using the proposed method.
Numerical study of the vortex tube reconnection using vortex particle method on many graphics cards
NASA Astrophysics Data System (ADS)
Kudela, Henryk; Kosior, Andrzej
2014-08-01
Vortex Particle Methods are one of the most convenient ways of tracking the vorticity evolution. In the article we presented numerical recreation of the real life experiment concerning head-on collision of two vortex rings. In the experiment the evolution and reconnection of the vortex structures is tracked with passive markers (paint particles) which in viscous fluid does not follow the evolution of vorticity field. In numerical computations we showed the difference between vorticity evolution and movement of passive markers. The agreement with the experiment was very good. Due to problems with very long time of computations on a single processor the Vortex-in-Cell method was implemented on the multicore architecture of the graphics cards (GPUs). Vortex Particle Methods are very well suited for parallel computations. As there are myriads of particles in the flow and for each of them the same equations of motion have to be solved the SIMD architecture used in GPUs seems to be perfect. The main disadvantage in this case is the small amount of the RAM memory. To overcome this problem we created a multiGPU implementation of the VIC method. Some remarks on parallel computing are given in the article.
Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil
2012-01-01
The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
High-Order Methods for Incompressible Fluid Flow
NASA Astrophysics Data System (ADS)
Deville, M. O.; Fischer, P. F.; Mund, E. H.
2002-08-01
High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular sttention given to enforcement of imcompressibility. Advanced discretizations. implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.
NASA Astrophysics Data System (ADS)
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Local and nonlocal parallel heat transport in general magnetic fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del-Castillo-Negrete, Diego B; Chacon, Luis
2011-01-01
A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.
Computation of free energy profiles with parallel adaptive dynamics
NASA Astrophysics Data System (ADS)
Lelièvre, Tony; Rousset, Mathias; Stoltz, Gabriel
2007-04-01
We propose a formulation of an adaptive computation of free energy differences, in the adaptive biasing force or nonequilibrium metadynamics spirit, using conditional distributions of samples of configurations which evolve in time. This allows us to present a truly unifying framework for these methods, and to prove convergence results for certain classes of algorithms. From a numerical viewpoint, a parallel implementation of these methods is very natural, the replicas interacting through the reconstructed free energy. We demonstrate how to improve this parallel implementation by resorting to some selection mechanism on the replicas. This is illustrated by computations on a model system of conformational changes.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.
Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
High order parallel numerical schemes for solving incompressible flows
NASA Technical Reports Server (NTRS)
Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.
1992-01-01
The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube like architecture having clusters of shared memory processors at each node. The approach is demonstrated using examples of simple steady state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher order, schemes for its particular subcase.
A Green's function method for local and non-local parallel transport in general magnetic fields
NASA Astrophysics Data System (ADS)
Del-Castillo-Negrete, Diego; Chacón, Luis
2009-11-01
The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion and astrophysics research. Three issues make this problem particularly challenging: (i) The extreme anisotropy between the parallel (i.e., along the magnetic field), χ, and the perpendicular, χ, conductivities (χ/χ may exceed 10^10 in fusion plasmas); (ii) Magnetic field lines chaos which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) Nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields. The numerical implementation employs a volume-preserving field-line integrator [Finn and Chac'on, Phys. Plasmas, 12 (2005)] for an accurate representation of the magnetic field lines regardless of the level of stochasticity. The general formalism and its algorithmic properties are discussed along with illustrative analytical and numerical examples. Problems of particular interest include: the departures from the Rochester--Rosenbluth diffusive scaling in the weak magnetic chaos regime, the interplay between non-locality and chaos, and the robustness of transport barriers in reverse shear configurations.
NASA Astrophysics Data System (ADS)
Chen, M.; Wei, S.
2016-12-01
The serious damage of Mexico City caused by the 1985 Michoacan earthquake 400 km away indicates that urban areas may be affected by remote earthquakes. To asses earthquake risk of urban areas imposed by distant earthquakes, we developed a hybrid Frequency Wavenumber (FK) and Finite Difference (FD) code implemented with MPI, since the computation of seismic wave propagation from a distant earthquake using a single numerical method (e.g. Finite Difference, Finite Element or Spectral Element) is very expensive. In our approach, we compute the incident wave field (ud) at the boundaries of the excitation box, which surrounding the local structure, using a paralleled FK method (Zhu and Rivera, 2002), and compute the total wave field (u) within the excitation box using a parallelled 2D FD method. We apply perfectly matched layer (PML) absorbing condition to the diffracted wave field (u-ud). Compared to previous Generalized Ray Theory and Finite Difference (Wen and Helmberger, 1998), Frequency Wavenumber and Spectral Element (Tong et al., 2014), and Direct Solution Method and Spectral Element hybrid method (Monteiller et al., 2013), our absorbing boundary condition dramatically suppress the numerical noise. The MPI implementation of our method can greatly speed up the calculation. Besides, our hybrid method also has a potential use in high resolution array imaging similar to Tong et al. (2014).
Hu, Shaoxing; Xu, Shike; Wang, Duhu; Zhang, Aiwu
2015-11-11
Aiming at addressing the problem of high computational cost of the traditional Kalman filter in SINS/GPS, a practical optimization algorithm with offline-derivation and parallel processing methods based on the numerical characteristics of the system is presented in this paper. The algorithm exploits the sparseness and/or symmetry of matrices to simplify the computational procedure. Thus plenty of invalid operations can be avoided by offline derivation using a block matrix technique. For enhanced efficiency, a new parallel computational mechanism is established by subdividing and restructuring calculation processes after analyzing the extracted "useful" data. As a result, the algorithm saves about 90% of the CPU processing time and 66% of the memory usage needed in a classical Kalman filter. Meanwhile, the method as a numerical approach needs no precise-loss transformation/approximation of system modules and the accuracy suffers little in comparison with the filter before computational optimization. Furthermore, since no complicated matrix theories are needed, the algorithm can be easily transplanted into other modified filters as a secondary optimization method to achieve further efficiency.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kusoglu Sarikaya, C.; Rafatov, I., E-mail: rafatov@metu.edu.tr; Kudryavtsev, A. A.
2016-06-15
The work deals with the Particle in Cell/Monte Carlo Collision (PIC/MCC) analysis of the problem of detection and identification of impurities in the nonlocal plasma of gas discharge using the Plasma Electron Spectroscopy (PLES) method. For this purpose, 1d3v PIC/MCC code for numerical simulation of glow discharge with nonlocal electron energy distribution function is developed. The elastic, excitation, and ionization collisions between electron-neutral pairs and isotropic scattering and charge exchange collisions between ion-neutral pairs and Penning ionizations are taken into account. Applicability of the numerical code is verified under the Radio-Frequency capacitively coupled discharge conditions. The efficiency of the codemore » is increased by its parallelization using Open Message Passing Interface. As a demonstration of the PLES method, parallel PIC/MCC code is applied to the direct current glow discharge in helium doped with a small amount of argon. Numerical results are consistent with the theoretical analysis of formation of nonlocal EEDF and existing experimental data.« less
The multigrid preconditioned conjugate gradient method
NASA Technical Reports Server (NTRS)
Tatebe, Osamu
1993-01-01
A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
Research in applied mathematics, numerical analysis, and computer science
NASA Technical Reports Server (NTRS)
1984-01-01
Research conducted at the Institute for Computer Applications in Science and Engineering (ICASE) in applied mathematics, numerical analysis, and computer science is summarized and abstracts of published reports are presented. The major categories of the ICASE research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software, especially vector and parallel computers.
Explicit finite-difference simulation of optical integrated devices on massive parallel computers.
Sterkenburgh, T; Michels, R M; Dress, P; Franke, H
1997-02-20
An explicit method for the numerical simulation of optical integrated circuits by means of the finite-difference time-domain (FDTD) method is presented. This method, based on an explicit solution of Maxwell's equations, is well established in microwave technology. Although the simulation areas are small, we verified the behavior of three interesting problems, especially nonparaxial problems, with typical aspects of integrated optical devices. Because numerical losses are within acceptable limits, we suggest the use of the FDTD method to achieve promising quantitative simulation results.
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
Statistical properties of Charney-Hasegawa-Mima zonal flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Johan, E-mail: anderson.johan@gmail.com; Botha, G. J. J.
2015-05-15
A theoretical interpretation of numerically generated probability density functions (PDFs) of intermittent plasma transport events in unforced zonal flows is provided within the Charney-Hasegawa-Mima (CHM) model. The governing equation is solved numerically with various prescribed density gradients that are designed to produce different configurations of parallel and anti-parallel streams. Long-lasting vortices form whose flow is governed by the zonal streams. It is found that the numerically generated PDFs can be matched with analytical predictions of PDFs based on the instanton method by removing the autocorrelations from the time series. In many instances, the statistics generated by the CHM dynamics relaxesmore » to Gaussian distributions for both the electrostatic and vorticity perturbations, whereas in areas with strong nonlinear interactions it is found that the PDFs are exponentially distributed.« less
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Xiang; Yang, Chao; State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190
2015-03-15
We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracymore » (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.« less
Computational Issues in Damping Identification for Large Scale Problems
NASA Technical Reports Server (NTRS)
Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.
1997-01-01
Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithm and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithm. Tests were performed using the IBM-SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can he seen for problems of 100+ degrees-of-freedom.
Zhan, X.
2005-01-01
A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.
Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions
NASA Astrophysics Data System (ADS)
Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey
2014-01-01
The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.
NASA Technical Reports Server (NTRS)
Fahrenthold, Eric P.; Shivarama, Ravishankar
2004-01-01
The hybrid particle-finite element method of Fahrenthold and Horban, developed for the simulation of hypervelocity impact problems, has been extended to include new formulations of the particle-element kinematics, additional constitutive models, and an improved numerical implementation. The extended formulation has been validated in three dimensional simulations of published impact experiments. The test cases demonstrate good agreement with experiment, good parallel speedup, and numerical convergence of the simulation results.
Efficient solution of parabolic equations by Krylov approximation methods
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1990-01-01
Numerical techniques for solving parabolic equations by the method of lines is addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation consists of applying an evolution operator of a very small dimension to a known vector which is, in turn, computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
A numeric investigation of co-flowing liquid streams using the Lattice Boltzmann Method
NASA Astrophysics Data System (ADS)
Somogyi, Andy; Tagg, Randall
2007-11-01
We present a numerical investigation of co-flowing immiscible liquid streams using the Lattice Boltzmann Method (LBM) for multi component, dissimilar viscosity, immiscible fluid flow. When a liquid is injected into another immiscible liquid, the flow will eventually transition from jetting to dripping due to interfacial tension. Our implementation of LBM models the interfacial tension through a variety of techniques. Parallelization is also straightforward for both single and multi component models as only near local interaction is required. We compare the results of our numerical investigation using LBM to several recent physical experiments.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
NASA Astrophysics Data System (ADS)
Katsaounis, T. D.
2005-02-01
The scope of this book is to present well known simple and advanced numerical methods for solving partial differential equations (PDEs) and how to implement these methods using the programming environment of the software package Diffpack. A basic background in PDEs and numerical methods is required by the potential reader. Further, a basic knowledge of the finite element method and its implementation in one and two space dimensions is required. The authors claim that no prior knowledge of the package Diffpack is required, which is true, but the reader should be at least familiar with an object oriented programming language like C++ in order to better comprehend the programming environment of Diffpack. Certainly, a prior knowledge or usage of Diffpack would be a great advantage to the reader. The book consists of 15 chapters, each one written by one or more authors. Each chapter is basically divided into two parts: the first part is about mathematical models described by PDEs and numerical methods to solve these models and the second part describes how to implement the numerical methods using the programming environment of Diffpack. Each chapter closes with a list of references on its subject. The first nine chapters cover well known numerical methods for solving the basic types of PDEs. Further, programming techniques on the serial as well as on the parallel implementation of numerical methods are also included in these chapters. The last five chapters are dedicated to applications, modelled by PDEs, in a variety of fields. The first chapter is an introduction to parallel processing. It covers fundamentals of parallel processing in a simple and concrete way and no prior knowledge of the subject is required. Examples of parallel implementation of basic linear algebra operations are presented using the Message Passing Interface (MPI) programming environment. Here, some knowledge of MPI routines is required by the reader. Examples solving in parallel simple PDEs using Diffpack and MPI are also presented. Chapter 2 presents the overlapping domain decomposition method for solving PDEs. It is well known that these methods are suitable for parallel processing. The first part of the chapter covers the mathematical formulation of the method as well as algorithmic and implementational issues. The second part presents a serial and a parallel implementational framework within the programming environment of Diffpack. The chapter closes by showing how to solve two application examples with the overlapping domain decomposition method using Diffpack. Chapter 3 is a tutorial about how to incorporate the multigrid solver in Diffpack. The method is illustrated by examples such as a Poisson solver, a general elliptic problem with various types of boundary conditions and a nonlinear Poisson type problem. In chapter 4 the mixed finite element is introduced. Technical issues concerning the practical implementation of the method are also presented. The main difficulties of the efficient implementation of the method, especially in two and three space dimensions on unstructured grids, are presented and addressed in the framework of Diffpack. The implementational process is illustrated by two examples, namely the system formulation of the Poisson problem and the Stokes problem. Chapter 5 is closely related to chapter 4 and addresses the problem of how to solve efficiently the linear systems arising by the application of the mixed finite element method. The proposed method is block preconditioning. Efficient techniques for implementing the method within Diffpack are presented. Optimal block preconditioners are used to solve the system formulation of the Poisson problem, the Stokes problem and the bidomain model for the electrical activity in the heart. The subject of chapter 6 is systems of PDEs. Linear and nonlinear systems are discussed. Fully implicit and operator splitting methods are presented. Special attention is paid to how existing solvers for scalar equations in Diffpack can be used to derive fully implicit solvers for systems. The proposed techniques are illustrated in terms of two applications, namely a system of PDEs modelling pipeflow and a two-phase porous media flow. Stochastic PDEs is the topic of chapter 7. The first part of the chapter is a simple introduction to stochastic PDEs; basic analytical properties are presented for simple models like transport phenomena and viscous drag forces. The second part considers the numerical solution of stochastic PDEs. Two basic techniques are presented, namely Monte Carlo and perturbation methods. The last part explains how to implement and incorporate these solvers into Diffpack. Chapter 8 describes how to operate Diffpack from Python scripts. The main goal here is to provide all the programming and technical details in order to glue the programming environment of Diffpack with visualization packages through Python and in general take advantage of the Python interfaces. Chapter 9 attempts to show how to use numerical experiments to measure the performance of various PDE solvers. The authors gathered a rather impressive list, a total of 14 PDE solvers. Solvers for problems like Poisson, Navier--Stokes, elasticity, two-phase flows and methods such as finite difference, finite element, multigrid, and gradient type methods are presented. The authors provide a series of numerical results combining various solvers with various methods in order to gain insight into their computational performance and efficiency. In Chapter 10 the authors consider a computationally challenging problem, namely the computation of the electrical activity of the human heart. After a brief introduction on the biology of the problem the authors present the mathematical models involved and a numerical method for solving them within the framework of Diffpack. Chapter 11 and 12 are closely related; actually they could have been combined in a single chapter. Chapter 11 introduces several mathematical models used in finance, based on the Black--Scholes equation. Chapter 12 considers several numerical methods like Monte Carlo, lattice methods, finite difference and finite element methods. Implementation of these methods within Diffpack is presented in the last part of the chapter. Chapter 13 presents how the finite element method is used for the modelling and analysis of elastic structures. The authors describe the structural elements of Diffpack which include popular elements such as beams and plates and examples are presented on how to use them to simulate elastic structures. Chapter 14 describes an application problem, namely the extrusion of aluminum. This is a rather\\endcolumn complicated process which involves non-Newtonian flow, heat transfer and elasticity. The authors describe the systems of PDEs modelling the underlying process and use a finite element method to obtain a numerical solution. The implementation of the numerical method in Diffpack is presented along with some applications. The last chapter, chapter 15, focuses on mathematical and numerical models of systems of PDEs governing geological processes in sedimentary basins. The underlying mathematical model is solved using the finite element method within a fully implicit scheme. The authors discuss the implementational issues involved within Diffpack and they present results from several examples. In summary, the book focuses on the computational and implementational issues involved in solving partial differential equations. The potential reader should have a basic knowledge of PDEs and the finite difference and finite element methods. The examples presented are solved within the programming framework of Diffpack and the reader should have prior experience with the particular software in order to take full advantage of the book. Overall the book is well written, the subject of each chapter is well presented and can serve as a reference for graduate students, researchers and engineers who are interested in the numerical solution of partial differential equations modelling various applications.
Parallel heuristics for scalable community detection
Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth
2015-08-14
Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less
NASA Astrophysics Data System (ADS)
Park, George Ilhwan; Moin, Parviz
2016-01-01
This paper focuses on numerical and practical aspects associated with a parallel implementation of a two-layer zonal wall model for large-eddy simulation (LES) of compressible wall-bounded turbulent flows on unstructured meshes. A zonal wall model based on the solution of unsteady three-dimensional Reynolds-averaged Navier-Stokes (RANS) equations on a separate near-wall grid is implemented in an unstructured, cell-centered finite-volume LES solver. The main challenge in its implementation is to couple two parallel, unstructured flow solvers for efficient boundary data communication and simultaneous time integrations. A coupling strategy with good load balancing and low processors underutilization is identified. Face mapping and interpolation procedures at the coupling interface are explained in detail. The method of manufactured solution is used for verifying the correct implementation of solver coupling, and parallel performance of the combined wall-modeled LES (WMLES) solver is investigated. The method has successfully been applied to several attached and separated flows, including a transitional flow over a flat plate and a separated flow over an airfoil at an angle of attack.
A mixed finite difference/Galerkin method for three-dimensional Rayleigh-Benard convection
NASA Technical Reports Server (NTRS)
Buell, Jeffrey C.
1988-01-01
A fast and accurate numerical method, for nonlinear conservation equation systems whose solutions are periodic in two of the three spatial dimensions, is presently implemented for the case of Rayleigh-Benard convection between two rigid parallel plates in the parameter region where steady, three-dimensional convection is known to be stable. High-order streamfunctions secure the reduction of the system of five partial differential equations to a system of only three. Numerical experiments are presented which verify both the expected convergence rates and the absolute accuracy of the method.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation
NASA Technical Reports Server (NTRS)
O'Connell, Matthew; Karman, Steve L.
2015-01-01
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
NASA Astrophysics Data System (ADS)
Lin, S. T.; Liou, T. S.
2017-12-01
Numerical simulation of groundwater flow in anisotropic aquifers usually suffers from the lack of accuracy of calculating groundwater flux across grid blocks. Conventional two-point flux approximation (TPFA) can only obtain the flux normal to the grid interface but completely neglects the one parallel to it. Furthermore, the hydraulic gradient in a grid block estimated from TPFA can only poorly represent the hydraulic condition near the intersection of grid blocks. These disadvantages are further exacerbated when the principal axes of hydraulic conductivity, global coordinate system, and grid boundary are not parallel to one another. In order to refine the estimation the in-grid hydraulic gradient, several multiple-point flux approximation (MPFA) methods have been developed for two-dimensional groundwater flow simulations. For example, the MPFA-O method uses the hydraulic head at the junction node as an auxiliary variable which is then eliminated using the head and flux continuity conditions. In this study, a three-dimensional MPFA method will be developed for numerical simulation of groundwater flow in three-dimensional and strongly anisotropic aquifers. This new MPFA method first discretizes the simulation domain into hexahedrons. Each hexahedron is further decomposed into a certain number of tetrahedrons. The 2D MPFA-O method is then extended to these tetrahedrons, using the unknown head at the intersection of hexahedrons as an auxiliary variable along with the head and flux continuity conditions to solve for the head at the center of each hexahedron. Numerical simulations using this new MPFA method have been successfully compared with those obtained from a modified version of TOUGH2.
Current distribution on a cylindrical antenna with parallel orientation in a lossy magnetoplasma
NASA Technical Reports Server (NTRS)
Klein, C. A.; Klock, P. W.; Deschamps, G. A.
1972-01-01
The current distribution and impedance of a thin cylindrical antenna with parallel orientation to the static magnetic field of a lossy magnetoplasma is calculated with the method of moments. The electric field produced by an infinitesimal current source is first derived. Results are presented for a wide range of plasma parameters. Reasonable answers are obtained for all cases except for the overdense hyperbolic case. A discussion of the numerical stability is included which not only applies to this problem but other applications of the method of moments.
Evaluation of Proteus as a Tool for the Rapid Development of Models of Hydrologic Systems
NASA Astrophysics Data System (ADS)
Weigand, T. M.; Farthing, M. W.; Kees, C. E.; Miller, C. T.
2013-12-01
Models of modern hydrologic systems can be complex and involve a variety of operators with varying character. The goal is to implement approximations of such models that are both efficient for the developer and computationally efficient, which is a set of naturally competing objectives. Proteus is a Python-based toolbox that supports prototyping of model formulations as well as a wide variety of modern numerical methods and parallel computing. We used Proteus to develop numerical approximations for three models: Richards' equation, a brine flow model derived using the Thermodynamically Constrained Averaging Theory (TCAT), and a multiphase TCAT-based tumor growth model. For Richards' equation, we investigated discontinuous Galerkin solutions with higher order time integration based on the backward difference formulas. The TCAT brine flow model was implemented using Proteus and a variety of numerical methods were compared to hand coded solutions. Finally, an existing tumor growth model was implemented in Proteus to introduce more advanced numerics and allow the code to be run in parallel. From these three example models, Proteus was found to be an attractive open-source option for rapidly developing high quality code for solving existing and evolving computational science models.
NASA Astrophysics Data System (ADS)
Abramov, G. V.; Gavrilov, A. N.
2018-03-01
The article deals with the numerical solution of the mathematical model of the particles motion and interaction in multicomponent plasma by the example of electric arc synthesis of carbon nanostructures. The high order of the particles and the number of their interactions requires a significant input of machine resources and time for calculations. Application of the large particles method makes it possible to reduce the amount of computation and the requirements for hardware resources without affecting the accuracy of numerical calculations. The use of technology of GPGPU parallel computing using the Nvidia CUDA technology allows organizing all General purpose computation on the basis of the graphical processor graphics card. The comparative analysis of different approaches to parallelization of computations to speed up calculations with the choice of the algorithm in which to calculate the accuracy of the solution shared memory is used. Numerical study of the influence of particles density in the macro particle on the motion parameters and the total number of particle collisions in the plasma for different modes of synthesis has been carried out. The rational range of the coherence coefficient of particle in the macro particle is computed.
Hybrid Particle-Element Simulation of Impact on Composite Orbital Debris Shields
NASA Technical Reports Server (NTRS)
Fahrenthold, Eric P.
2004-01-01
This report describes the development of new numerical methods and new constitutive models for the simulation of hypervelocity impact effects on spacecraft. The research has included parallel implementation of the numerical methods and material models developed under the project. Validation work has included both one dimensional simulations, for comparison with exact solutions, and three dimensional simulations of published hypervelocity impact experiments. The validated formulations have been applied to simulate impact effects in a velocity and kinetic energy regime outside the capabilities of current experimental methods. The research results presented here allow for the expanded use of numerical simulation, as a complement to experimental work, in future design of spacecraft for hypervelocity impact effects.
A Fourier collocation time domain method for numerically solving Maxwell's equations
NASA Technical Reports Server (NTRS)
Shebalin, John V.
1991-01-01
A new method for solving Maxwell's equations in the time domain for arbitrary values of permittivity, conductivity, and permeability is presented. Spatial derivatives are found by a Fourier transform method and time integration is performed using a second order, semi-implicit procedure. Electric and magnetic fields are collocated on the same grid points, rather than on interleaved points, as in the Finite Difference Time Domain (FDTD) method. Numerical results are presented for the propagation of a 2-D Transverse Electromagnetic (TEM) mode out of a parallel plate waveguide and into a dielectric and conducting medium.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, Toshio
1999-01-01
This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
Comparison of Implicit Collocation Methods for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)
2001-01-01
We combine a high-order compact finite difference scheme to approximate spatial derivatives arid collocation techniques for the time component to numerically solve the two dimensional heat equation. We use two approaches to implement the collocation methods. The first one is based on an explicit computation of the coefficients of polynomials and the second one relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Domain decomposition methods for the parallel computation of reacting flows
NASA Technical Reports Server (NTRS)
Keyes, David E.
1988-01-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
NASA Astrophysics Data System (ADS)
Lee, Ho-Young; Lee, Se-Hee
2017-08-01
Mechanical deformation, bending deformation, and distributive magnetic loads were evaluated numerically and experimentally for conducting materials excited with high current. Until now, many research works have extensively studied the area of magnetic force and mechanical deformation by using coupled approaches such as multiphysics solvers. In coupled analysis for magnetoelastic problems, some articles and commercial software have presented the resultant mechanical deformation and stress on the body. To evaluate the mechanical deformation, the Lorentz force density method (LZ) and the Maxwell stress tensor method (MX) have been widely used for conducting materials. However, it is difficult to find any experimental verification regarding mechanical deformation or bending deformation due to magnetic force density. Therefore, we compared our numerical results to those from experiments with two parallel conducting bars to verify our numerical setup for bending deformation. Before showing this, the basic and interesting coupled simulation was conducted to test the mechanical deformations by the LZ (body force density) and the MX (surface force density) methods. This resulted in MX gave the same total force as LZ, but the local force distribution in MX introduced an incorrect mechanical deformation in the simulation of a solid conductor.
Kranc: a Mathematica package to generate numerical codes for tensorial evolution equations
NASA Astrophysics Data System (ADS)
Husa, Sascha; Hinder, Ian; Lechner, Christiane
2006-06-01
We present a suite of Mathematica-based computer-algebra packages, termed "Kranc", which comprise a toolbox to convert certain (tensorial) systems of partial differential evolution equations to parallelized C or Fortran code for solving initial boundary value problems. Kranc can be used as a "rapid prototyping" system for physicists or mathematicians handling very complicated systems of partial differential equations, but through integration into the Cactus computational toolkit we can also produce efficient parallelized production codes. Our work is motivated by the field of numerical relativity, where Kranc is used as a research tool by the authors. In this paper we describe the design and implementation of both the Mathematica packages and the resulting code, we discuss some example applications, and provide results on the performance of an example numerical code for the Einstein equations. Program summaryTitle of program: Kranc Catalogue identifier: ADXS_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADXS_v1_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computer for which the program is designed and others on which it has been tested: General computers which run Mathematica (for code generation) and Cactus (for numerical simulations), tested under Linux Programming language used: Mathematica, C, Fortran 90 Memory required to execute with typical data: This depends on the number of variables and gridsize, the included ADM example requires 4308 KB Has the code been vectorized or parallelized: The code is parallelized based on the Cactus framework. Number of bytes in distributed program, including test data, etc.: 1 578 142 Number of lines in distributed program, including test data, etc.: 11 711 Nature of physical problem: Solution of partial differential equations in three space dimensions, which are formulated as an initial value problem. In particular, the program is geared towards handling very complex tensorial equations as they appear, e.g., in numerical relativity. The worked out examples comprise the Klein-Gordon equations, the Maxwell equations, and the ADM formulation of the Einstein equations. Method of solution: The method of numerical solution is finite differencing and method of lines time integration, the numerical code is generated through a high level Mathematica interface. Restrictions on the complexity of the program: Typical numerical relativity applications will contain up to several dozen evolution variables and thousands of source terms, Cactus applications have shown scaling up to several thousand processors and grid sizes exceeding 500 3. Typical running time: This depends on the number of variables and the grid size: the included ADM example takes approximately 100 seconds on a 1600 MHz Intel Pentium M processor. Unusual features of the program: based on Mathematica and Cactus
A parallel adaptive mesh refinement algorithm
NASA Technical Reports Server (NTRS)
Quirk, James J.; Hanebutte, Ulf R.
1993-01-01
Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.
Variational Algorithms for Test Particle Trajectories
NASA Astrophysics Data System (ADS)
Ellison, C. Leland; Finn, John M.; Qin, Hong; Tang, William M.
2015-11-01
The theory of variational integration provides a novel framework for constructing conservative numerical methods for magnetized test particle dynamics. The retention of conservation laws in the numerical time advance captures the correct qualitative behavior of the long time dynamics. For modeling the Lorentz force system, new variational integrators have been developed that are both symplectic and electromagnetically gauge invariant. For guiding center test particle dynamics, discretization of the phase-space action principle yields multistep variational algorithms, in general. Obtaining the desired long-term numerical fidelity requires mitigation of the multistep method's parasitic modes or applying a discretization scheme that possesses a discrete degeneracy to yield a one-step method. Dissipative effects may be modeled using Lagrange-D'Alembert variational principles. Numerical results will be presented using a new numerical platform that interfaces with popular equilibrium codes and utilizes parallel hardware to achieve reduced times to solution. This work was supported by DOE Contract DE-AC02-09CH11466.
NASA Technical Reports Server (NTRS)
Zhang, Jun; Ge, Lixin; Kouatchou, Jules
2000-01-01
A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it Only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact differenCe scheme which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.
NASA Astrophysics Data System (ADS)
Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.
2013-09-01
Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.
2016-01-01
MADNESS (multiresolution adaptive numerical environment for scientific simulation) is a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
Cumulative reports and publications through December 31, 1989
NASA Technical Reports Server (NTRS)
1990-01-01
A complete list of reports from the Institute for Computer Applications in Science and Engineering (ICASE) is presented. The major categories of the current ICASE research program are: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effectual numerical methods; computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, structural analysis, and chemistry; computer systems and software, especially vector and parallel computers, microcomputers, and data management. Since ICASE reports are intended to be preprints of articles that will appear in journals or conference proceedings, the published reference is included when it is available.
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1989-01-01
A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
A solution to neural field equations by a recurrent neural network method
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2012-09-01
Neural field equations (NFE) are used to model the activity of neurons in the brain, it is introduced from a single neuron 'integrate-and-fire model' starting point. The neural continuum is spatially discretized for numerical studies, and the governing equations are modeled as a system of ordinary differential equations. In this article the recurrent neural network approach is used to solve this system of ODEs. This consists of a technique developed by combining the standard numerical method of finite-differences with the Hopfield neural network. The architecture of the net, energy function, updating equations, and algorithms are developed for the NFE model. A Hopfield Neural Network is then designed to minimize the energy function modeling the NFE. Results obtained from the Hopfield-finite-differences net show excellent performance in terms of accuracy and speed. The parallelism nature of the Hopfield approaches may make them easier to implement on fast parallel computers and give them the speed advantage over the traditional methods.
The development of a revised version of multi-center molecular Ornstein-Zernike equation
NASA Astrophysics Data System (ADS)
Kido, Kentaro; Yokogawa, Daisuke; Sato, Hirofumi
2012-04-01
Ornstein-Zernike (OZ)-type theory is a powerful tool to obtain 3-dimensional solvent distribution around solute molecule. Recently, we proposed multi-center molecular OZ method, which is suitable for parallel computing of 3D solvation structure. The distribution function in this method consists of two components, namely reference and residue parts. Several types of the function were examined as the reference part to investigate the numerical robustness of the method. As the benchmark, the method is applied to water, benzene in aqueous solution and single-walled carbon nanotube in chloroform solution. The results indicate that fully-parallelization is achieved by utilizing the newly proposed reference functions.
Parallelization of implicit finite difference schemes in computational fluid dynamics
NASA Technical Reports Server (NTRS)
Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel
1990-01-01
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
Comparison between four dissimilar solar panel configurations
NASA Astrophysics Data System (ADS)
Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.
2017-12-01
Several studies on photovoltaic systems focused on how it operates and energy required in operating it. Little attention is paid on its configurations, modeling of mean time to system failure, availability, cost benefit and comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was made using Chapman Kolmogorov's method. The derivation for explicit expression of mean time to system failure, steady state availability and cost benefit analysis were performed, based on the comparison. Ranking method was used to determine the optimal configuration of the systems. The results of analytical and numerical solutions of system availability and mean time to system failure were determined and it was found that configuration I is the optimal configuration.
Numerical Analysis of Dusty-Gas Flows
NASA Astrophysics Data System (ADS)
Saito, T.
2002-02-01
This paper presents the development of a numerical code for simulating unsteady dusty-gas flows including shock and rarefaction waves. The numerical results obtained for a shock tube problem are used for validating the accuracy and performance of the code. The code is then extended for simulating two-dimensional problems. Since the interactions between the gas and particle phases are calculated with the operator splitting technique, we can choose numerical schemes independently for the different phases. A semi-analytical method is developed for the dust phase, while the TVD scheme of Harten and Yee is chosen for the gas phase. Throughout this study, computations are carried out on SGI Origin2000, a parallel computer with multiple of RISC based processors. The efficient use of the parallel computer system is an important issue and the code implementation on Origin2000 is also described. Flow profiles of both the gas and solid particles behind the steady shock wave are calculated by integrating the steady conservation equations. The good agreement between the pseudo-stationary solutions and those from the current numerical code validates the numerical approach and the actual coding. The pseudo-stationary shock profiles can also be used as initial conditions of unsteady multidimensional simulations.
NASA Technical Reports Server (NTRS)
Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)
2002-01-01
The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to for time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerlarian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
NASA Astrophysics Data System (ADS)
Vera, N. C.; GMMC
2013-05-01
In this paper we present the results of macrohybrid mixed Darcian flow in porous media in a general three-dimensional domain. The global problem is solved as a set of local subproblems which are posed using a domain decomposition method. Unknown fields of local problems, velocity and pressure are approximated using mixed finite elements. For this application, a general three-dimensional domain is considered which is discretized using tetrahedra. The discrete domain is decomposed into subdomains and reformulated the original problem as a set of subproblems, communicated through their interfaces. To solve this set of subproblems, we use finite element mixed and parallel computing. The parallelization of a problem using this methodology can, in principle, to fully exploit a computer equipment and also provides results in less time, two very important elements in modeling. Referencias G.Alduncin and N.Vera-Guzmán Parallel proximal-point algorithms for mixed _nite element models of _ow in the subsurface, Commun. Numer. Meth. Engng 2004; 20:83-104 (DOI: 10.1002/cnm.647) Z. Chen, G.Huan and Y. Ma Computational Methods for Multiphase Flows in Porous Media, SIAM, Society for Industrial and Applied Mathematics, Philadelphia, 2006. A. Quarteroni and A. Valli, Numerical Approximation of Partial Differential Equations, Springer-Verlag, Berlin, 1994. Brezzi F, Fortin M. Mixed and Hybrid Finite Element Methods. Springer: New York, 1991.
A highly parallel multigrid-like method for the solution of the Euler equations
NASA Technical Reports Server (NTRS)
Tuminaro, Ray S.
1989-01-01
We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straight-forward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.
Numerical solution of the Navier-Stokes equations by discontinuous Galerkin method
NASA Astrophysics Data System (ADS)
Krasnov, M. M.; Kuchugov, P. A.; E Ladonkina, M.; E Lutsky, A.; Tishkin, V. F.
2017-02-01
Detailed unstructured grids and numerical methods of high accuracy are frequently used in the numerical simulation of gasdynamic flows in areas with complex geometry. Galerkin method with discontinuous basis functions or Discontinuous Galerkin Method (DGM) works well in dealing with such problems. This approach offers a number of advantages inherent to both finite-element and finite-difference approximations. Moreover, the present paper shows that DGM schemes can be viewed as Godunov method extension to piecewise-polynomial functions. As is known, DGM involves significant computational complexity, and this brings up the question of ensuring the most effective use of all the computational capacity available. In order to speed up the calculations, operator programming method has been applied while creating the computational module. This approach makes possible compact encoding of mathematical formulas and facilitates the porting of programs to parallel architectures, such as NVidia CUDA and Intel Xeon Phi. With the software package, based on DGM, numerical simulations of supersonic flow past solid bodies has been carried out. The numerical results are in good agreement with the experimental ones.
NASA Astrophysics Data System (ADS)
Zhou, Wenhe; He, Xuan; Wu, Jianyun; Wang, Liangbi; Wang, Liangcheng
2017-07-01
The parallel plate capacitive humidity sensor based on the grid upper electrode is considered to be a promising one in some fields which require a humidity sensor with better dynamic characteristics. To strengthen the structure and balance the electric charge of the grid upper electrode, a strip is needed. However, it is the strip that keeps the dynamic characteristics of the sensor from being further improved. The numerical method is time- and cost-saving, but the numerical study on the response time of the sensor is just of bits and pieces. The numerical models presented by these studies did not consider the porosity effect of the polymer film on the dynamic characteristics. To overcome the defect of the grid upper electrode, a new structure of the upper electrode is provided by this paper first, and then a model considering the porosity effects of the polymer film on the dynamic characteristics is presented and validated. Finally, with the help of software FLUENT, parameter effects on the response time of the humidity sensor based on the microhole upper electrode are studied by the numerical method. The numerical results show that the response time of the microhole upper electrode sensor is 86% better than that of the grid upper electrode sensor, the response time of humidity sensor can be improved by reducing the hole spacing, increasing the aperture, reducing film thickness, and reasonably enlarging the porosity of the film.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion
NASA Technical Reports Server (NTRS)
Bailey, David H.; Ferguson, Helaman R. P.
1988-01-01
Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
Impact of new computing systems on computational mechanics and flight-vehicle structures technology
NASA Technical Reports Server (NTRS)
Noor, A. K.; Storaasli, O. O.; Fulton, R. E.
1984-01-01
Advances in computer technology which may have an impact on computational mechanics and flight vehicle structures technology were reviewed. The characteristics of supersystems, highly parallel systems, and small systems are summarized. The interrelations of numerical algorithms and software with parallel architectures are discussed. A scenario for future hardware/software environment and engineering analysis systems is presented. Research areas with potential for improving the effectiveness of analysis methods in the new environment are identified.
Applications of Massive Mathematical Computations
1990-04-01
particles from the first principles of QCD . This problem is under intensive numerical study 11-6 using special purpose parallel supercomputers in...several places around the world. The method used here is the Monte Carlo integration for a fixed 3-D plus time lattices . Reliable results are still years...mathematical and theoretical physics, but its most promising applications are in the numerical realization of QCD computations. Our programs for the solution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Parallel computation of three-dimensional aeroelastic fluid-structure interaction
NASA Astrophysics Data System (ADS)
Sadeghi, Mani
This dissertation presents a numerical method for the parallel computation of aeroelasticity (ParCAE). A flow solver is coupled to a structural solver by use of a fluid-structure interface method. The integration of the three-dimensional unsteady Navier-Stokes equations is performed in the time domain, simultaneously to the integration of a modal three-dimensional structural model. The flow solution is accelerated by using a multigrid method and a parallel multiblock approach. Fluid-structure coupling is achieved by subiteration. A grid-deformation algorithm is developed to interpolate the deformation of the structural boundaries onto the flow grid. The code is formulated to allow application to general, three-dimensional, complex configurations with multiple independent structures. Computational results are presented for various configurations, such as turbomachinery blade rows and aircraft wings. Investigations are performed on vortex-induced vibrations, effects of cascade mistuning on flutter, and cases of nonlinear cascade and wing flutter.
NASA Technical Reports Server (NTRS)
Logan, Terry G.
1994-01-01
The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.
Parallelization of elliptic solver for solving 1D Boussinesq model
NASA Astrophysics Data System (ADS)
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Plasma Jet Simulations Using a Generalized Ohm's Law
NASA Technical Reports Server (NTRS)
Ebersohn, Frans; Shebalin, John V.; Girimaji, Sharath S.
2012-01-01
Plasma jets are important physical phenomena in astrophysics and plasma propulsion devices. A currently proposed dual jet plasma propulsion device to be used for ISS experiments strongly resembles a coronal loop and further draws a parallel between these physical systems [1]. To study plasma jets we use numerical methods that solve the compressible MHD equations using the generalized Ohm s law [2]. Here, we will discuss the crucial underlying physics of these systems along with the numerical procedures we utilize to study them. Recent results from our numerical experiments will be presented and discussed.
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation
Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.; ...
2016-01-01
We present MADNESS (multiresolution adaptive numerical environment for scientific simulation) that is a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision that are based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results which did not include the gas storage term. It is found that the through-thickness temperature distribution is not affected much by the gas storage term. However, the through-thickness pressure and stress distributions, and the extent of chemical reactions are different from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius type rate equations and (2) pressure-dependent Arrhenius type rate equations. The numerical results are compared to experimental results and the pressure-dependent model is able to capture the trend better than the pressure-independent one. Finally, a performance study is done on the hybrid algorithm using the ablative composite plate problem. It is found that there is a good speedup of performance on the CM-5. For 32 CPU's, the speedup of performance is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and the effective parallelization of the algorithm. It also seems that there is an optimum number of CPU's to use for a given problem.
ICASE semiannual report, April 1 - September 30, 1989
NASA Technical Reports Server (NTRS)
1990-01-01
The Institute conducts unclassified basic research in applied mathematics, numerical analysis, and computer science in order to extend and improve problem-solving capabilities in science and engineering, particularly in aeronautics and space. The major categories of the current Institute for Computer Applications in Science and Engineering (ICASE) research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification problems, with emphasis on effective numerical methods; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software, especially vector and parallel computers. ICASE reports are considered to be primarily preprints of manuscripts that have been submitted to appropriate research journals or that are to appear in conference proceedings.
NASA Astrophysics Data System (ADS)
Stellmach, Stephan; Hansen, Ulrich
2008-05-01
Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.
GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening
NASA Astrophysics Data System (ADS)
Maureira-Fredes, Cristián; Amaro-Seoane, Pau
2018-01-01
A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and sources of gravitational radiation. The direct-summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. With different numerical techniques and special-purpose hardware, it can be used to speed up the calculations. But these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU), direct-summation N-body integrator written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY has high modularity, allowing users to readily introduce new physics, it exploits available computational resources and will be maintained by regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 as compared to the single-GPU version. The conception and design of this first release is aimed at users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.
Tempest - Efficient Computation of Atmospheric Flows Using High-Order Local Discretization Methods
NASA Astrophysics Data System (ADS)
Ullrich, P. A.; Guerra, J. E.
2014-12-01
The Tempest Framework composes several compact numerical methods to easily facilitate intercomparison of atmospheric flow calculations on the sphere and in rectangular domains. This framework includes the implementations of Spectral Elements, Discontinuous Galerkin, Flux Reconstruction, and Hybrid Finite Element methods with the goal of achieving optimal accuracy in the solution of atmospheric problems. Several advantages of this approach are discussed such as: improved pressure gradient calculation, numerical stability by vertical/horizontal splitting, arbitrary order of accuracy, etc. The local numerical discretization allows for high performance parallel computation and efficient inclusion of parameterizations. These techniques are used in conjunction with a non-conformal, locally refined, cubed-sphere grid for global simulations and standard Cartesian grids for simulations at the mesoscale. A complete implementation of the methods described is demonstrated in a non-hydrostatic setting.
NASA Technical Reports Server (NTRS)
Wigton, Larry
1996-01-01
Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A
2012-01-01
At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM.more » To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.« less
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castillo-Negrete, Diego del; Blazevski, Daniel
2016-04-15
Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in three-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands andmore » remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in large helical device and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude of modulated heat pulses.« less
Parallel solution of the symmetric tridiagonal eigenproblem. Research report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1989-10-01
This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel solution of the symmetric tridiagonal eigenproblem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1989-01-01
This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Parallel fast multipole boundary element method applied to computational homogenization
NASA Astrophysics Data System (ADS)
Ptaszny, Jacek
2018-01-01
In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.
OpenMP performance for benchmark 2D shallow water equations using LBM
NASA Astrophysics Data System (ADS)
Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry
2018-03-01
Shallow water equations or commonly referred as Saint-Venant equations are used to model fluid phenomena. These equations can be solved numerically using several methods, like Lattice Boltzmann method (LBM), SIMPLE-like Method, Finite Difference Method, Godunov-type Method, and Finite Volume Method. In this paper, the shallow water equation will be approximated using LBM or known as LABSWE and will be simulated in performance of parallel programming using OpenMP. To evaluate the performance between 2 and 4 threads parallel algorithm, ten various number of grids Lx and Ly are elaborated. The results show that using OpenMP platform, the computational time for solving LABSWE can be decreased. For instance using grid sizes 1000 × 500, the speedup of 2 and 4 threads is observed 93.54 s and 333.243 s respectively.
Topography Modeling in Atmospheric Flows Using the Immersed Boundary Method
NASA Technical Reports Server (NTRS)
Ackerman, A. S.; Senocak, I.; Mansour, N. N.; Stevens, D. E.
2004-01-01
Numerical simulation of flow over complex geometry needs accurate and efficient computational methods. Different techniques are available to handle complex geometry. The unstructured grid and multi-block body-fitted grid techniques have been widely adopted for complex geometry in engineering applications. In atmospheric applications, terrain fitted single grid techniques have found common use. Although these are very effective techniques, their implementation, coupling with the flow algorithm, and efficient parallelization of the complete method are more involved than a Cartesian grid method. The grid generation can be tedious and one needs to pay special attention in numerics to handle skewed cells for conservation purposes. Researchers have long sought for alternative methods to ease the effort involved in simulating flow over complex geometry.
Asymptotic-preserving Lagrangian approach for modeling anisotropic transport in magnetized plasmas
NASA Astrophysics Data System (ADS)
Chacon, Luis; Del-Castillo-Negrete, Diego
2012-03-01
Modeling electron transport in magnetized plasmas is extremely challenging due to the extreme anisotropy between parallel (to the magnetic field) and perpendicular directions (the transport-coefficient ratio χ/χ˜10^10 in fusion plasmas). Recently, a novel Lagrangian Green's function method has been proposedfootnotetextD. del-Castillo-Negrete, L. Chac'on, PRL, 106, 195004 (2011); D. del-Castillo-Negrete, L. Chac'on, Phys. Plasmas, submitted (2011) to solve the local and non-local purely parallel transport equation in general 3D magnetic fields. The approach avoids numerical pollution, is inherently positivity-preserving, and is scalable algorithmically (i.e., work per degree-of-freedom is grid-independent). In this poster, we discuss the extension of the Lagrangian Green's function approach to include perpendicular transport terms and sources. We present an asymptotic-preserving numerical formulation, which ensures a consistent numerical discretization temporally and spatially for arbitrary χ/χ ratios. We will demonstrate the potential of the approach with various challenging configurations, including the case of transport across a magnetic island in cylindrical geometry.
A software architecture for multidisciplinary applications: Integrating task and data parallelism
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans
1994-01-01
Data parallel languages such as Vienna Fortran and HPF can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are of a multidisciplinary and heterogeneous nature and thus do not fit well into the data parallel paradigm. In this paper we present new Fortran 90 language extensions to fill this gap. Tasks can be spawned as asynchronous activities in a homogeneous or heterogeneous computing environment; they interact by sharing access to Shared Data Abstractions (SDA's). SDA's are an extension of Fortran 90 modules, representing a pool of common data, together with a set of Methods for controlled access to these data and a mechanism for providing persistent storage. Our language supports the integration of data and task parallelism as well as nested task parallelism and thus can be used to express multidisciplinary applications in a natural and efficient way.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
Numerical Simulation of Transit-Time Ultrasonic Flowmeters by a Direct Approach.
Luca, Adrian; Marchiano, Regis; Chassaing, Jean-Camille
2016-06-01
This paper deals with the development of a computational code for the numerical simulation of wave propagation through domains with a complex geometry consisting in both solids and moving fluids. The emphasis is on the numerical simulation of ultrasonic flowmeters (UFMs) by modeling the wave propagation in solids with the equations of linear elasticity (ELE) and in fluids with the linearized Euler equations (LEEs). This approach requires high performance computing because of the high number of degrees of freedom and the long propagation distances. Therefore, the numerical method should be chosen with care. In order to minimize the numerical dissipation which may occur in this kind of configuration, the numerical method employed here is the nodal discontinuous Galerkin (DG) method. Also, this method is well suited for parallel computing. To speed up the code, almost all the computational stages have been implemented to run on graphical processing unit (GPU) by using the compute unified device architecture (CUDA) programming model from NVIDIA. This approach has been validated and then used for the two-dimensional simulation of gas UFMs. The large contrast of acoustic impedance characteristic to gas UFMs makes their simulation a real challenge.
NASA Astrophysics Data System (ADS)
Avitabile, Daniele; Bridges, Thomas J.
2010-06-01
Numerical integration of complex linear systems of ODEs depending analytically on an eigenvalue parameter are considered. Complex orthogonalization, which is required to stabilize the numerical integration, results in non-analytic systems. It is shown that properties of eigenvalues are still efficiently recoverable by extracting information from a non-analytic characteristic function. The orthonormal systems are constructed using the geometry of Stiefel bundles. Different forms of continuous orthogonalization in the literature are shown to correspond to different choices of connection one-form on the Stiefel bundle. For the numerical integration, Gauss-Legendre Runge-Kutta algorithms are the principal choice for preserving orthogonality, and performance results are shown for a range of GLRK methods. The theory and methods are tested by application to example boundary value problems including the Orr-Sommerfeld equation in hydrodynamic stability.
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.
1990-01-01
While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners
Li, Ruipeng; Saad, Yousef
2017-08-01
This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Ruipeng; Saad, Yousef
This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less
Methods, Software and Tools for Three Numerical Applications. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
E. R. Jessup
2000-03-01
This is a report of the results of the authors work supported by DOE contract DE-FG03-97ER25325. They proposed to study three numerical problems. They are: (1) the extension of the PMESC parallel programming library; (2) the development of algorithms and software for certain generalized eigenvalue and singular value (SVD) problems, and (3) the application of techniques of linear algebra to an information retrieval technique known as latent semantic indexing (LSI).
Analysis of Time Filters in Multistep Methods
NASA Astrophysics Data System (ADS)
Hurl, Nicholas
Geophysical ow simulations have evolved sophisticated implicit-explicit time stepping methods (based on fast-slow wave splittings) followed by time filters to control any unstable models that result. Time filters are modular and parallel. Their effect on stability of the overall process has been tested in numerous simulations, but never analyzed. Stability is proven herein for the Crank-Nicolson Leapfrog (CNLF) method with the Robert-Asselin (RA) time filter and for the Crank-Nicolson Leapfrog method with the Robert-Asselin-Williams (RAW) time filter for systems by energy methods. We derive an equivalent multistep method for CNLF+RA and CNLF+RAW and stability regions are obtained. The time step restriction for energy stability of CNLF+RA is smaller than CNLF and CNLF+RAW time step restriction is even smaller. Numerical tests find that RA and RAW add numerical dissipation. This thesis also shows that all modes of the Crank-Nicolson Leap Frog (CNLF) method are asymptotically stable under the standard timestep condition.
A Conforming Multigrid Method for the Pure Traction Problem of Linear Elasticity: Mixed Formulation
NASA Technical Reports Server (NTRS)
Lee, Chang-Ock
1996-01-01
A multigrid method using conforming P-1 finite element is developed for the two-dimensional pure traction boundary value problem of linear elasticity. The convergence is uniform even as the material becomes nearly incompressible. A heuristic argument for acceleration of the multigrid method is discussed as well. Numerical results with and without this acceleration as well as performance estimates on a parallel computer are included.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishioka, K.; Nakamura, Y.; Nishimura, S.
A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between themore » parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C{sup 6+} and D{sup +} prevents a large difference between C{sup 6+} and D{sup +} flow velocities in such plasmas. The C{sup 6+} flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C{sup 6+} impurity flow velocities do not contradict clearly with the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.« less
A conservative scheme for electromagnetic simulation of magnetized plasmas with kinetic electrons
NASA Astrophysics Data System (ADS)
Bao, J.; Lin, Z.; Lu, Z. X.
2018-02-01
A conservative scheme has been formulated and verified for gyrokinetic particle simulations of electromagnetic waves and instabilities in magnetized plasmas. An electron continuity equation derived from the drift kinetic equation is used to time advance the electron density perturbation by using the perturbed mechanical flow calculated from the parallel vector potential, and the parallel vector potential is solved by using the perturbed canonical flow from the perturbed distribution function. In gyrokinetic particle simulations using this new scheme, the shear Alfvén wave dispersion relation in the shearless slab and continuum damping in the sheared cylinder have been recovered. The new scheme overcomes the stringent requirement in the conventional perturbative simulation method that perpendicular grid size needs to be as small as electron collisionless skin depth even for the long wavelength Alfvén waves. The new scheme also avoids the problem in the conventional method that an unphysically large parallel electric field arises due to the inconsistency between electrostatic potential calculated from the perturbed density and vector potential calculated from the perturbed canonical flow. Finally, the gyrokinetic particle simulations of the Alfvén waves in sheared cylinder have superior numerical properties compared with the fluid simulations, which suffer from numerical difficulties associated with singular mode structures.
NASA Astrophysics Data System (ADS)
Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir
2016-12-01
The paper presents two different approaches to reduce the time of computer calculation of reachability sets. First of these two approaches use different data structures for storing the reachability sets in the computer memory for calculation in single-threaded mode. Second approach is based on using parallel algorithms with reference to the data structures from the first approach. Within the framework of this paper parallel algorithm of approximate reachability set calculation on computer with SMP-architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate high efficiency of parallel computing technology and also show how computing time depends on the used data structure.
Parallel gene analysis with allele-specific padlock probes and tag microarrays
Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats
2003-01-01
Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977
NASA Technical Reports Server (NTRS)
Kung, Ernest C.
1994-01-01
The contract research has been conducted in the following three major areas: analysis of numerical simulations and parallel observations of atmospheric blocking, diagnosis of the lower boundary heating and the response of the atmospheric circulation, and comprehensive assessment of long-range forecasting with numerical and regression methods. The essential scientific and developmental purpose of this contract research is to extend our capability of numerical weather forecasting by the comprehensive general circulation model. The systematic work as listed above is thus geared to developing a technological basis for future NASA long-range forecasting.
Parallel high-precision orbit propagation using the modified Picard-Chebyshev method
NASA Astrophysics Data System (ADS)
Koblick, Darin C.
2012-03-01
The modified Picard-Chebyshev method, when run in parallel, is thought to be more accurate and faster than the most efficient sequential numerical integration techniques when applied to orbit propagation problems. Previous experiments have shown that the modified Picard-Chebyshev method can have up to a one order magnitude speedup over the 12
A Discontinuous Galerkin Finite Element Method for Hamilton-Jacobi Equations
NASA Technical Reports Server (NTRS)
Hu, Changqing; Shu, Chi-Wang
1998-01-01
In this paper, we present a discontinuous Galerkin finite element method for solving the nonlinear Hamilton-Jacobi equations. This method is based on the Runge-Kutta discontinuous Galerkin finite element method for solving conservation laws. The method has the flexibility of treating complicated geometry by using arbitrary triangulation, can achieve high order accuracy with a local, compact stencil, and are suited for efficient parallel implementation. One and two dimensional numerical examples are given to illustrate the capability of the method.
NASA Astrophysics Data System (ADS)
Boyko, Oleksiy; Zheleznyak, Mark
2015-04-01
The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.
A New Numerical Scheme for Cosmic-Ray Transport
NASA Astrophysics Data System (ADS)
Jiang, Yan-Fei; Oh, S. Peng
2018-02-01
Numerical solutions of the cosmic-ray (CR) magnetohydrodynamic equations are dogged by a powerful numerical instability, which arises from the constraint that CRs can only stream down their gradient. The standard cure is to regularize by adding artificial diffusion. Besides introducing ad hoc smoothing, this has a significant negative impact on either computational cost or complexity and parallel scalings. We describe a new numerical algorithm for CR transport, with close parallels to two-moment methods for radiative transfer under the reduced speed of light approximation. It stably and robustly handles CR streaming without any artificial diffusion. It allows for both isotropic and field-aligned CR streaming and diffusion, with arbitrary streaming and diffusion coefficients. CR transport is handled explicitly, while source terms are handled implicitly. The overall time step scales linearly with resolution (even when computing CR diffusion) and has a perfect parallel scaling. It is given by the standard Courant condition with respect to a constant maximum velocity over the entire simulation domain. The computational cost is comparable to that of solving the ideal MHD equation. We demonstrate the accuracy and stability of this new scheme with a wide variety of tests, including anisotropic streaming and diffusion tests, CR-modified shocks, CR-driven blast waves, and CR transport in multiphase media. The new algorithm opens doors to much more ambitious and hitherto intractable calculations of CR physics in galaxies and galaxy clusters. It can also be applied to other physical processes with similar mathematical structure, such as saturated, anisotropic heat conduction.
A Dynamic Finite Element Method for Simulating the Physics of Faults Systems
NASA Astrophysics Data System (ADS)
Saez, E.; Mora, P.; Gross, L.; Weatherley, D.
2004-12-01
We introduce a dynamic Finite Element method using a novel high level scripting language to describe the physical equations, boundary conditions and time integration scheme. The library we use is the parallel Finley library: a finite element kernel library, designed for solving large-scale problems. It is incorporated as a differential equation solver into a more general library called escript, based on the scripting language Python. This library has been developed to facilitate the rapid development of 3D parallel codes, and is optimised for the Australian Computational Earth Systems Simulator Major National Research Facility (ACcESS MNRF) supercomputer, a 208 processor SGI Altix with a peak performance of 1.1 TFlops. Using the scripting approach we obtain a parallel FE code able to take advantage of the computational efficiency of the Altix 3700. We consider faults as material discontinuities (the displacement, velocity, and acceleration fields are discontinuous at the fault), with elastic behavior. The stress continuity at the fault is achieved naturally through the expression of the fault interactions in the weak formulation. The elasticity problem is solved explicitly in time, using the Saint Verlat scheme. Finally, we specify a suitable frictional constitutive relation and numerical scheme to simulate fault behaviour. Our model is based on previous work on modelling fault friction and multi-fault systems using lattice solid-like models. We adapt the 2D model for simulating the dynamics of parallel fault systems described to the Finite-Element method. The approach uses a frictional relation along faults that is slip and slip-rate dependent, and the numerical integration approach introduced by Mora and Place in the lattice solid model. In order to illustrate the new Finite Element model, single and multi-fault simulation examples are presented.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation
NASA Astrophysics Data System (ADS)
Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.
2011-03-01
A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 M ops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32-nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
NASA Astrophysics Data System (ADS)
Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku
2015-01-01
High performance computing of Meshless Time Domain Method (MTDM) on multi-GPU using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of the electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangle meshes, and it is difficult to adopt the problem in a complexed domain to the method. On the other hand, MTDM can be easily adept to the problem because MTDM does not requires meshes. In the present study, we implement MTDM on multi-GPU cluster to speedup the method, and numerically investigate the performance of the method on multi-GPU cluster. To reduce the computation time, the communication time between the decomposed domain is hided below the perfect matched layer (PML) calculation procedure. The results of computation show that speedup of MTDM on 128 GPUs is 173 times faster than that of single CPU calculation.
NASA Astrophysics Data System (ADS)
Wang, Ting; Plecháč, Petr
2017-12-01
Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
NASA Astrophysics Data System (ADS)
Gruber, Ralph; Periaux, Jaques; Shaw, Richard Paul
Recent advances in computational mechanics are discussed in reviews and reports. Topics addressed include spectral superpositions on finite elements for shear banding problems, strain-based finite plasticity, numerical simulation of hypersonic viscous continuum flow, constitutive laws in solid mechanics, dynamics problems, fracture mechanics and damage tolerance, composite plates and shells, contact and friction, metal forming and solidification, coupling problems, and adaptive FEMs. Consideration is given to chemical flows, convection problems, free boundaries and artificial boundary conditions, domain-decomposition and multigrid methods, combustion and thermal analysis, wave propagation, mixed and hybrid FEMs, integral-equation methods, optimization, software engineering, and vector and parallel computing.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
Variational data assimilation system "INM RAS - Black Sea"
NASA Astrophysics Data System (ADS)
Parmuzin, Eugene; Agoshkov, Valery; Assovskiy, Maksim; Giniatulin, Sergey; Zakharova, Natalia; Kuimov, Grigory; Fomin, Vladimir
2013-04-01
Development of Informational-Computational Systems (ICS) for Data Assimilation Procedures is one of multidisciplinary problems. To study and solve these problems one needs to apply modern results from different disciplines and recent developments in: mathematical modeling; theory of adjoint equations and optimal control; inverse problems; numerical methods theory; numerical algebra and scientific computing. The problems discussed above are studied in the Institute of Numerical Mathematics of the Russian Academy of Science (INM RAS) in ICS for Personal Computers (PC). Special problems and questions arise while effective ICS versions for PC are being developed. These problems and questions can be solved with applying modern methods of numerical mathematics and by solving "parallelism problem" using OpenMP technology and special linear algebra packages. In this work the results on the ICS development for PC-ICS "INM RAS - Black Sea" are presented. In the work the following problems and questions are discussed: practical problems that can be studied by ICS; parallelism problems and their solutions with applying of OpenMP technology and the linear algebra packages used in ICS "INM - Black Sea"; Interface of ICS. The results of ICS "INM RAS - Black Sea" testing are presented. Efficiency of technologies and methods applied are discussed. The work was supported by RFBR, grants No. 13-01-00753, 13-05-00715 and by The Ministry of education and science of Russian Federation, project 8291, project 11.519.11.1005 References: [1] V.I. Agoshkov, M.V. Assovskii, S.A. Lebedev, Numerical simulation of Black Sea hydrothermodynamics taking into account tide-forming forces. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 5-31 [2] E.I. Parmuzin, V.I. Agoshkov, Numerical solution of the variational assimilation problem for sea surface temperature in the model of the Black Sea dynamics. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 69-94 [3] V.B. Zalesny, N.A. Diansky, V.V. Fomin, S.N. Moshonkin, S.G. Demyshev, Numerical model of the circulation of Black Sea and Sea of Azov. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 95-111 [4] V.I. Agoshkov, S.V. Giniatulin, G.V. Kuimov. OpenMP technology and linear algebra packages in the variation data assimilation systems. - Abstracts of the 1-st China-Russia Conference on Numerical Algebra with Applications in Radiactive Hydrodynamics, Beijing, China, October 16-18, 2012. [5] Zakharova N.B., Agoshkov V.I., Parmuzin E.I., The new method of ARGO buoys system observation data interpolation. Russian Journal of Numerical Analysis and Mathematical Modelling. Vol. 28, Issue 1, 2013.
NASA Technical Reports Server (NTRS)
Oliger, Joseph
1997-01-01
Topics considered include: high-performance computing; cognitive and perceptual prostheses (computational aids designed to leverage human abilities); autonomous systems. Also included: development of a 3D unstructured grid code based on a finite volume formulation and applied to the Navier-stokes equations; Cartesian grid methods for complex geometry; multigrid methods for solving elliptic problems on unstructured grids; algebraic non-overlapping domain decomposition methods for compressible fluid flow problems on unstructured meshes; numerical methods for the compressible navier-stokes equations with application to aerodynamic flows; research in aerodynamic shape optimization; S-HARP: a parallel dynamic spectral partitioner; numerical schemes for the Hamilton-Jacobi and level set equations on triangulated domains; application of high-order shock capturing schemes to direct simulation of turbulence; multicast technology; network testbeds; supercomputer consolidation project.
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
NASA Astrophysics Data System (ADS)
Beskopylny, Alexey; Kadomtseva, Elena; Strelnikov, Grigory
2017-10-01
The stress-strain state of a rectangular slab resting on an elastic foundation is considered. The slab material is isotropic. The slab has stiffening ribs that directed parallel to both sides of the plate. Solving equations are obtained for determining the deflection for various mechanical and geometric characteristics of the stiffening ribs which are parallel to different sides of the plate, having different rigidity for bending and torsion. The calculation scheme assumes an orthotropic slab having different cylindrical stiffness in two mutually perpendicular directions parallel to the reinforcing ribs. An elastic foundation is adopted by Winkler model. To determine the deflection the Bubnov-Galerkin method is used. The deflection is taken in the form of an expansion in a series with unknown coefficients by special polynomials, which are a combination of Legendre polynomials.
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbormore » points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.« less
Lattice Boltzmann model for simulation of magnetohydrodynamics
NASA Technical Reports Server (NTRS)
Chen, Shiyi; Chen, Hudong; Martinez, Daniel; Matthaeus, William
1991-01-01
A numerical method, based on a discrete Boltzmann equation, is presented for solving the equations of magnetohydrodynamics (MHD). The algorithm provides advantages similar to the cellular automaton method in that it is local and easily adapted to parallel computing environments. Because of much lower noise levels and less stringent requirements on lattice size, the method appears to be more competitive with traditional solution methods. Examples show that the model accurately reproduces both linear and nonlinear MHD phenomena.
Klein, Max; Sharma, Rati; Bohrer, Chris H; Avelis, Cameron M; Roberts, Elijah
2017-01-15
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark CONTACT: eroberts@jhu.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.
NASA Astrophysics Data System (ADS)
Kanaun, S.; Markov, A.
2017-06-01
An efficient numerical method for solution of static problems of elasticity for an infinite homogeneous medium containing inhomogeneities (cracks and inclusions) is developed. Finite number of heterogeneous inclusions and planar parallel cracks of arbitrary shapes is considered. The problem is reduced to a system of surface integral equations for crack opening vectors and volume integral equations for stress tensors inside the inclusions. For the numerical solution of these equations, a class of Gaussian approximating functions is used. The method based on these functions is mesh free. For such functions, the elements of the matrix of the discretized system are combinations of explicit analytical functions and five standard 1D-integrals that can be tabulated. Thus, the numerical integration is excluded from the construction of the matrix of the discretized problem. For regular node grids, the matrix of the discretized system has Toeplitz's properties, and Fast Fourier Transform technique can be used for calculation matrix-vector products of such matrices.
Fast non-overlapping Schwarz domain decomposition methods for solving the neutron diffusion equation
NASA Astrophysics Data System (ADS)
Jamelot, Erell; Ciarlet, Patrick
2013-05-01
Studying numerically the steady state of a nuclear core reactor is expensive, in terms of memory storage and computational time. In order to address both requirements, one can use a domain decomposition method, implemented on a parallel computer. We present here such a method for the mixed neutron diffusion equations, discretized with Raviart-Thomas-Nédélec finite elements. This method is based on the Schwarz iterative algorithm with Robin interface conditions to handle communications. We analyse this method from the continuous point of view to the discrete point of view, and we give some numerical results in a realistic highly heterogeneous 3D configuration. Computations are carried out with the MINOS solver of the APOLLO3® neutronics code. APOLLO3 is a registered trademark in France.
Turovets, Sergei; Volkov, Vasily; Zherdetsky, Aleksej; Prakonina, Alena; Malony, Allen D
2014-01-01
The Electrical Impedance Tomography (EIT) and electroencephalography (EEG) forward problems in anisotropic inhomogeneous media like the human head belongs to the class of the three-dimensional boundary value problems for elliptic equations with mixed derivatives. We introduce and explore the performance of several new promising numerical techniques, which seem to be more suitable for solving these problems. The proposed numerical schemes combine the fictitious domain approach together with the finite-difference method and the optimally preconditioned Conjugate Gradient- (CG-) type iterative method for treatment of the discrete model. The numerical scheme includes the standard operations of summation and multiplication of sparse matrices and vector, as well as FFT, making it easy to implement and eligible for the effective parallel implementation. Some typical use cases for the EIT/EEG problems are considered demonstrating high efficiency of the proposed numerical technique.
Fast Numerical Solution of the Plasma Response Matrix for Real-time Ideal MHD Control
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glasser, Alexander; Kolemen, Egemen; Glasser, Alan H.
To help effectuate near real-time feedback control of ideal MHD instabilities in tokamak geometries, a parallelized version of A.H. Glasser’s DCON (Direct Criterion of Newcomb) code is developed. To motivate the numerical implementation, we first solve DCON’s δW formulation with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD Riccati equation. We then describe our adaptation of DCON with numerical methods natural to solutions of the Riccati equation, parallelizing it to enable its operation in near real-time. We replace DCON’s serial integration of perturbed modes—which satisfy a singular Euler- Lagrange equation—with a domain-decomposed integration of state transition matrices. Output is shown to match results from DCON with high accuracy, and with computation time < 1s. Such computational speed may enable active feedback ideal MHD stability control, especially in plasmas whose ideal MHD equilibria evolve with inductive timescalemore » $$\\tau$$ ≳ 1s—as in ITER. Further potential applications of this theory are discussed.« less
Fast Numerical Solution of the Plasma Response Matrix for Real-time Ideal MHD Control
Glasser, Alexander; Kolemen, Egemen; Glasser, Alan H.
2018-03-26
To help effectuate near real-time feedback control of ideal MHD instabilities in tokamak geometries, a parallelized version of A.H. Glasser’s DCON (Direct Criterion of Newcomb) code is developed. To motivate the numerical implementation, we first solve DCON’s δW formulation with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD Riccati equation. We then describe our adaptation of DCON with numerical methods natural to solutions of the Riccati equation, parallelizing it to enable its operation in near real-time. We replace DCON’s serial integration of perturbed modes—which satisfy a singular Euler- Lagrange equation—with a domain-decomposed integration of state transition matrices. Output is shown to match results from DCON with high accuracy, and with computation time < 1s. Such computational speed may enable active feedback ideal MHD stability control, especially in plasmas whose ideal MHD equilibria evolve with inductive timescalemore » $$\\tau$$ ≳ 1s—as in ITER. Further potential applications of this theory are discussed.« less
Separating stages of arithmetic verification: An ERP study with a novel paradigm.
Avancini, Chiara; Soltész, Fruzsina; Szűcs, Dénes
2015-08-01
In studies of arithmetic verification, participants typically encounter two operands and they carry out an operation on these (e.g. adding them). Operands are followed by a proposed answer and participants decide whether this answer is correct or incorrect. However, interpretation of results is difficult because multiple parallel, temporally overlapping numerical and non-numerical processes of the human brain may contribute to task execution. In order to overcome this problem here we used a novel paradigm specifically designed to tease apart the overlapping cognitive processes active during arithmetic verification. Specifically, we aimed to separate effects related to detection of arithmetic correctness, detection of the violation of strategic expectations, detection of physical stimulus properties mismatch and numerical magnitude comparison (numerical distance effects). Arithmetic correctness, physical stimulus properties and magnitude information were not task-relevant properties of the stimuli. We distinguished between a series of temporally highly overlapping cognitive processes which in turn elicited overlapping ERP effects with distinct scalp topographies. We suggest that arithmetic verification relies on two major temporal phases which include parallel running processes. Our paradigm offers a new method for investigating specific arithmetic verification processes in detail. Copyright © 2015 Elsevier Ltd. All rights reserved.
ColDICE: A parallel Vlasov–Poisson solver using moving adaptive simplicial tessellation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sousbie, Thierry, E-mail: tsousbie@gmail.com; Department of Physics, The University of Tokyo, Tokyo 113-0033; Research Center for the Early Universe, School of Science, The University of Tokyo, Tokyo 113-0033
2016-09-15
Resolving numerically Vlasov–Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet evolving in six-dimensional phase-space. We describe a public parallel numerical algorithm consisting in representing the phase-space sheet with a conforming, self-adaptive simplicial tessellation of which the vertices follow the Lagrangian equations of motion. The algorithm is implemented both in six- and four-dimensional phase-space. Refinement of the tessellation mesh is performed using the bisection method and a local representation of the phase-space sheet at second order relying on additional tracers created when needed at runtime. In order to preserve in the bestmore » way the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincaré invariants. Resolution of Poisson equation is performed using the fast Fourier method on a regular rectangular grid, similarly to particle in cells codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli [65–67] generalised to linear order. As preliminary tests of the code, we study in four dimensional phase-space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a “warm” dark matter simulation in six-dimensional phase-space that we use to check the parallel scaling of the code.« less
High performance computation of radiative transfer equation using the finite element method
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.
2018-05-01
This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for spatio-angular discretization of the monochromatic steady-state radiative transfer equation in an anisotropically scattering media. Two very different methods of parallelization, angular and spatial decomposition methods, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code
NASA Astrophysics Data System (ADS)
Longoni, Gianluca; Anderson, Stanwood L.
2009-08-01
The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.
The 3-D numerical simulation research of vacuum injector for linear induction accelerator
NASA Astrophysics Data System (ADS)
Liu, Dagang; Xie, Mengjun; Tang, Xinbing; Liao, Shuqing
2017-01-01
Simulation method for voltage in-feed and electron injection of vacuum injector is given, and verification of the simulated voltage and current is carried out. The numerical simulation for the magnetic field of solenoid is implemented, and a comparative analysis is conducted between the simulation results and experimental results. A semi-implicit difference algorithm is adopted to suppress the numerical noise, and a parallel acceleration algorithm is used for increasing the computation speed. The RMS emittance calculation method of the beam envelope equations is analyzed. In addition, the simulated results of RMS emittance are compared with the experimental data. Finally, influences of the ferromagnetic rings on the radial and axial magnetic fields of solenoid as well as the emittance of beam are studied.
Planet-disc interactions with Discontinuous Galerkin Methods using GPUs
NASA Astrophysics Data System (ADS)
Velasco Romero, David A.; Veiga, Maria Han; Teyssier, Romain; Masset, Frédéric S.
2018-05-01
We present a two-dimensional Cartesian code based on high order discontinuous Galerkin methods, implemented to run in parallel over multiple GPUs. A simple planet-disc setup is used to compare the behaviour of our code against the behaviour found using the FARGO3D code with a polar mesh. We make use of the time dependence of the torque exerted by the disc on the planet as a mean to quantify the numerical viscosity of the code. We find that the numerical viscosity of the Keplerian flow can be as low as a few 10-8r2Ω, r and Ω being respectively the local orbital radius and frequency, for fifth order schemes and resolution of ˜10-2r. Although for a single disc problem a solution of low numerical viscosity can be obtained at lower computational cost with FARGO3D (which is nearly an order of magnitude faster than a fifth order method), discontinuous Galerkin methods appear promising to obtain solutions of low numerical viscosity in more complex situations where the flow cannot be captured on a polar or spherical mesh concentric with the disc.
NASA Technical Reports Server (NTRS)
Menon, R. G.; Kurdila, A. J.
1992-01-01
This paper presents a concurrent methodology to simulate the dynamics of flexible multibody systems with a large number of degrees of freedom. A general class of open-loop structures is treated and a redundant coordinate formulation is adopted. A range space method is used in which the constraint forces are calculated using a preconditioned conjugate gradient method. By using a preconditioner motivated by the regular ordering of the directed graph of the structures, it is shown that the method is order N in the total number of coordinates of the system. The overall formulation has the advantage that it permits fine parallelization and does not rely on system topology to induce concurrency. It can be efficiently implemented on the present generation of parallel computers with a large number of processors. Validation of the method is presented via numerical simulations of space structures incorporating large number of flexible degrees of freedom.
NASA Astrophysics Data System (ADS)
Heister, Timo; Dannberg, Juliane; Gassmöller, Rene; Bangerth, Wolfgang
2017-08-01
Computations have helped elucidate the dynamics of Earth's mantle for several decades already. The numerical methods that underlie these simulations have greatly evolved within this time span, and today include dynamically changing and adaptively refined meshes, sophisticated and efficient solvers, and parallelization to large clusters of computers. At the same time, many of the methods - discussed in detail in a previous paper in this series - were developed and tested primarily using model problems that lack many of the complexities that are common to the realistic models our community wants to solve today. With several years of experience solving complex and realistic models, we here revisit some of the algorithm designs of the earlier paper and discuss the incorporation of more complex physics. In particular, we re-consider time stepping and mesh refinement algorithms, evaluate approaches to incorporate compressibility, and discuss dealing with strongly varying material coefficients, latent heat, and how to track chemical compositions and heterogeneities. Taken together and implemented in a high-performance, massively parallel code, the techniques discussed in this paper then allow for high resolution, 3-D, compressible, global mantle convection simulations with phase transitions, strongly temperature dependent viscosity and realistic material properties based on mineral physics data.
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
NASA Technical Reports Server (NTRS)
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAILEY, DAVID H.; BORWEIN, JONATHAN M.
A recent paper by the present authors, together with mathematical physicists David Broadhurst and M. Larry Glasser, explored Bessel moment integrals, namely definite integrals of the general form {integral}{sub 0}{sup {infinity}} t{sup m}f{sup n}(t) dt, where the function f(t) is one of the classical Bessel functions. In that paper, numerous previously unknown analytic evaluations were obtained, using a combination of analytic methods together with some fairly high-powered numerical computations, often performed on highly parallel computers. In several instances, while we were able to numerically discover what appears to be a solid analytic identity, based on extremely high-precision numerical computations, wemore » were unable to find a rigorous proof. Thus we present here a brief list of some of these unproven but numerically confirmed identities.« less
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1989-01-01
Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
NASA Astrophysics Data System (ADS)
Shahzad, M.; Rizvi, H.; Panwar, A.; Ryu, C. M.
2017-06-01
We have re-visited the existence criterion of the reverse shear Alfven eigenmodes (RSAEs) in the presence of the parallel equilibrium current by numerically solving the eigenvalue equation using a fast eigenvalue solver code KAES. The parallel equilibrium current can bring in the kink effect and is known to be strongly unfavorable for the RSAE. We have numerically estimated the critical value of the toroidicity factor Qtor in a circular tokamak plasma, above which RSAEs can exist, and compared it to the analytical one. The difference between the numerical and analytical critical values is small for low frequency RSAEs, but it increases as the frequency of the mode increases, becoming greater for higher poloidal harmonic modes.
NASA Astrophysics Data System (ADS)
Arteaga, Santiago Egido
1998-12-01
The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.
NASA Astrophysics Data System (ADS)
Mukherjee, Anamitra; Patel, Niravkumar D.; Bishop, Chris; Dagotto, Elbio
2015-06-01
Lattice spin-fermion models are important to study correlated systems where quantum dynamics allows for a separation between slow and fast degrees of freedom. The fast degrees of freedom are treated quantum mechanically while the slow variables, generically referred to as the "spins," are treated classically. At present, exact diagonalization coupled with classical Monte Carlo (ED + MC) is extensively used to solve numerically a general class of lattice spin-fermion problems. In this common setup, the classical variables (spins) are treated via the standard MC method while the fermion problem is solved by exact diagonalization. The "traveling cluster approximation" (TCA) is a real space variant of the ED + MC method that allows to solve spin-fermion problems on lattice sizes with up to 103 sites. In this publication, we present a novel reorganization of the TCA algorithm in a manner that can be efficiently parallelized. This allows us to solve generic spin-fermion models easily on 104 lattice sites and with some effort on 105 lattice sites, representing the record lattice sizes studied for this family of models.
Parallel Solver for H(div) Problems Using Hybridization and AMG
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Chak S.; Vassilevski, Panayot S.
2016-01-15
In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in a H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examinedmore » through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.« less
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram
This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances.more » Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.« less
A microkernel design for component-based parallel numerical software systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.
1999-01-13
What is the minimal software infrastructure and what type of conventions are needed to simplify development of sophisticated parallel numerical application codes using a variety of software components that are not necessarily available as source code? We propose an opaque object-based model where the objects are dynamically loadable from the file system or network. The microkernel required to manage such a system needs to include, at most: (1) a few basic services, namely--a mechanism for loading objects at run time via dynamic link libraries, and consistent schemes for error handling and memory management; and (2) selected methods that all objectsmore » share, to deal with object life (destruction, reference counting, relationships), and object observation (viewing, profiling, tracing). We are experimenting with these ideas in the context of extensible numerical software within the ALICE (Advanced Large-scale Integrated Computational Environment) project, where we are building the microkernel to manage the interoperability among various tools for large-scale scientific simulations. This paper presents some preliminary observations and conclusions from our work with microkernel design.« less
NASA Astrophysics Data System (ADS)
Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar
2017-12-01
We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown being sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domain of several millions squared kilometers like the Arctic region.
NASA Astrophysics Data System (ADS)
Russkova, Tatiana V.
2017-11-01
One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is the parallel technology. A new algorithm oriented to parallel execution on the CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both a vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions including continuous singlelayered and multilayered clouds, and selective molecular absorption are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
Self-sustained radial oscillating flows between parallel disks
NASA Astrophysics Data System (ADS)
Mochizuki, S.; Yang, W.-J.
1985-05-01
It is pointed out that radial flow between parallel circular disks is of interest in a number of physical systems such as hydrostatic air bearings, radial diffusers, and VTOL aircraft with centrally located downward-positioned jets. The present investigation is concerned with the problem of instability in radial flow between parallel disks. A time-dependent numerical study and experiments are conducted. Both approaches reveal the nucleation, growth, migration, and decay of annular separation bubbles (i.e. vortex or recirculation zones) in the laminar-flow region. A finite-difference technique is utilized to solve the full unsteady vorticity transport equation in the theoretical procedure, while the flow patterns in the experiments are visualized with the aid of dye-injection, hydrogen-bubble, and paraffin-mist methods. It is found that the separation and reattachment of shear layers in the radial flow through parallel disks are unsteady phenomena. The sequence of nucleation, growth, migration, and decay of the vortices is self-sustained.
Symplectic molecular dynamics simulations on specially designed parallel computers.
Borstnik, Urban; Janezic, Dusanka
2005-01-01
We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Domain decomposition methods in aerodynamics
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Saltz, Joel
1990-01-01
Compressible Euler equations are solved for two-dimensional problems by a preconditioned conjugate gradient-like technique. An approximate Riemann solver is used to compute the numerical fluxes to second order accuracy in space. Two ways to achieve parallelism are tested, one which makes use of parallelism inherent in triangular solves and the other which employs domain decomposition techniques. The vectorization/parallelism in triangular solves is realized by the use of a recording technique called wavefront ordering. This process involves the interpretation of the triangular matrix as a directed graph and the analysis of the data dependencies. It is noted that the factorization can also be done in parallel with the wave front ordering. The performances of two ways of partitioning the domain, strips and slabs, are compared. Results on Cray YMP are reported for an inviscid transonic test case. The performances of linear algebra kernels are also reported.
LDRD final report on massively-parallel linear programming : the parPCx system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar
2005-02-01
This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
Data assimilation using a GPU accelerated path integral Monte Carlo approach
NASA Astrophysics Data System (ADS)
Quinn, John C.; Abarbanel, Henry D. I.
2011-09-01
The answers to data assimilation questions can be expressed as path integrals over all possible state and parameter histories. We show how these path integrals can be evaluated numerically using a Markov Chain Monte Carlo method designed to run in parallel on a graphics processing unit (GPU). We demonstrate the application of the method to an example with a transmembrane voltage time series of a simulated neuron as an input, and using a Hodgkin-Huxley neuron model. By taking advantage of GPU computing, we gain a parallel speedup factor of up to about 300, compared to an equivalent serial computation on a CPU, with performance increasing as the length of the observation time used for data assimilation increases.
GPU accelerated manifold correction method for spinning compact binaries
NASA Astrophysics Data System (ADS)
Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying
2018-04-01
The graphics processing unit (GPU) acceleration of the manifold correction algorithm based on the compute unified device architecture (CUDA) technology is designed to simulate the dynamic evolution of the Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and the efficiency of parallel computation on GPU have been confirmed by various numerical experiments. The numerical comparisons show that the accuracy on GPU execution of manifold corrections method has a good agreement with the execution of codes on merely central processing unit (CPU-based) method. The acceleration ability when the codes are implemented on GPU can increase enormously through the use of shared memory and register optimization techniques without additional hardware costs, implying that the speedup is nearly 13 times as compared with the codes executed on CPU for phase space scan (including 314 × 314 orbits). In addition, GPU-accelerated manifold correction method is used to numerically study how dynamics are affected by the spin-induced quadrupole-monopole interaction for black hole binary system.
NASA Astrophysics Data System (ADS)
Luo, Li; Wang, Xiao-Ping; Cai, Xiao-Chuan
2017-11-01
We study numerically the dynamics of a three-dimensional droplet spreading on a rough solid surface using a phase-field model consisting of the coupled Cahn-Hilliard and Navier-Stokes equations with a generalized Navier boundary condition (GNBC). An efficient finite element method on unstructured meshes is introduced to cope with the complex geometry of the solid surfaces. We extend the GNBC to surfaces with complex geometry by including its weak form along different normal and tangential directions in the finite element formulation. The semi-implicit time discretization scheme results in a decoupled system for the phase function, the velocity, and the pressure. In addition, a mass compensation algorithm is introduced to preserve the mass of the droplet. To efficiently solve the decoupled systems, we present a highly parallel solution strategy based on domain decomposition techniques. We validate the newly developed solution method through extensive numerical experiments, particularly for those phenomena that can not be achieved by two-dimensional simulations. On a surface with circular posts, we study how wettability of the rough surface depends on the geometry of the posts. The contact line motion for a droplet spreading over some periodic rough surfaces are also efficiently computed. Moreover, we study the spreading process of an impacting droplet on a microstructured surface, a qualitative agreement is achieved between the numerical and experimental results. The parallel performance suggests that the proposed solution algorithm is scalable with over 4,000 processors cores with tens of millions of unknowns.
NASA Astrophysics Data System (ADS)
Naumenko, Mikhail; Samarin, Viacheslav
2018-02-01
Modern parallel computing algorithm has been applied to the solution of the few-body problem. The approach is based on Feynman's continual integrals method implemented in C++ programming language using NVIDIA CUDA technology. A wide range of 3-body and 4-body bound systems has been considered including nuclei described as consisting of protons and neutrons (e.g., 3,4He) and nuclei described as consisting of clusters and nucleons (e.g., 6He). The correctness of the results was checked by the comparison with the exactly solvable 4-body oscillatory system and experimental data.
Zephyr: Open-source Parallel Seismic Waveform Inversion in an Integrated Python-based Framework
NASA Astrophysics Data System (ADS)
Smithyman, B. R.; Pratt, R. G.; Hadden, S. M.
2015-12-01
Seismic Full-Waveform Inversion (FWI) is an advanced method to reconstruct wave properties of materials in the Earth from a series of seismic measurements. These methods have been developed by researchers since the late 1980s, and now see significant interest from the seismic exploration industry. As researchers move towards implementing advanced numerical modelling (e.g., 3D, multi-component, anisotropic and visco-elastic physics), it is desirable to make use of a modular approach, minimizing the effort developing a new set of tools for each new numerical problem. SimPEG (http://simpeg.xyz) is an open source project aimed at constructing a general framework to enable geophysical inversion in various domains. In this abstract we describe Zephyr (https://github.com/bsmithyman/zephyr), which is a coupled research project focused on parallel FWI in the seismic context. The software is built on top of Python, Numpy and IPython, which enables very flexible testing and implementation of new features. Zephyr is an open source project, and is released freely to enable reproducible research. We currently implement a parallel, distributed seismic forward modelling approach that solves the 2.5D (two-and-one-half dimensional) viscoacoustic Helmholtz equation at a range modelling frequencies, generating forward solutions for a given source behaviour, and gradient solutions for a given set of observed data. Solutions are computed in a distributed manner on a set of heterogeneous workers. The researcher's frontend computer may be separated from the worker cluster by a network link to enable full support for computation on remote clusters from individual workstations or laptops. The present codebase introduces a numerical discretization equivalent to that used by FULLWV, a well-known seismic FWI research codebase. This makes it straightforward to compare results from Zephyr directly with FULLWV. The flexibility introduced by the use of a Python programming environment makes extension of the codebase with new methods much more straightforward. This enables comparison and integration of new efforts with existing results.
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.
Numerical simulation of disperse particle flows on a graphics processing unit
NASA Astrophysics Data System (ADS)
Sierakowski, Adam J.
In both nature and technology, we commonly encounter solid particles being carried within fluid flows, from dust storms to sediment erosion and from food processing to energy generation. The motion of uncountably many particles in highly dynamic flow environments characterizes the tremendous complexity of such phenomena. While methods exist for the full-scale numerical simulation of such systems, current computational capabilities require the simplification of the numerical task with significant approximation using closure models widely recognized as insufficient. There is therefore a fundamental need for the investigation of the underlying physical processes governing these disperse particle flows. In the present work, we develop a new tool based on the Physalis method for the first-principles numerical simulation of thousands of particles (a small fraction of an entire disperse particle flow system) in order to assist in the search for new reduced-order closure models. We discuss numerous enhancements to the efficiency and stability of the Physalis method, which introduces the influence of spherical particles to a fixed-grid incompressible Navier-Stokes flow solver using a local analytic solution to the flow equations. Our first-principles investigation demands the modeling of unresolved length and time scales associated with particle collisions. We introduce a collision model alongside Physalis, incorporating lubrication effects and proposing a new nonlinearly damped Hertzian contact model. By reproducing experimental studies from the literature, we document extensive validation of the methods. We discuss the implementation of our methods for massively parallel computation using a graphics processing unit (GPU). We combine Eulerian grid-based algorithms with Lagrangian particle-based algorithms to achieve computational throughput up to 90 times faster than the legacy implementation of Physalis for a single central processing unit. By avoiding all data communication between the GPU and the host system during the simulation, we utilize with great efficacy the GPU hardware with which many high performance computing systems are currently equipped. We conclude by looking forward to the future of Physalis with multi-GPU parallelization in order to perform resolved disperse flow simulations of more than 100,000 particles and further advance the development of reduced-order closure models.
Hesford, Andrew J; Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C
2014-08-01
Accurate and efficient modeling of ultrasound propagation through realistic tissue models is important to many aspects of clinical ultrasound imaging. Simplified problems with known solutions are often used to study and validate numerical methods. Greater confidence in a time-domain k-space method and a frequency-domain fast multipole method is established in this paper by analyzing results for realistic models of the human breast. Models of breast tissue were produced by segmenting magnetic resonance images of ex vivo specimens into seven distinct tissue types. After confirming with histologic analysis by pathologists that the model structures mimicked in vivo breast, the tissue types were mapped to variations in sound speed and acoustic absorption. Calculations of acoustic scattering by the resulting model were performed on massively parallel supercomputer clusters using parallel implementations of the k-space method and the fast multipole method. The efficient use of these resources was confirmed by parallel efficiency and scalability studies using large-scale, realistic tissue models. Comparisons between the temporal and spectral results were performed in representative planes by Fourier transforming the temporal results. An RMS field error less than 3% throughout the model volume confirms the accuracy of the methods for modeling ultrasound propagation through human breast.
Numerical modeling for the retrofit of the hydraulic cooling subsystems in operating power plant
NASA Astrophysics Data System (ADS)
AlSaqoor, S.; Alahmer, A.; Al Quran, F.; Andruszkiewicz, A.; Kubas, K.; Regucki, P.; Wędrychowicz, W.
2017-08-01
This paper presents the possibility of using the numerical methods to analyze the work of hydraulic systems on the example of a cooling system of a power boiler auxiliary devices. The variety of conditions at which hydraulic system that operated in specific engineering subsystems requires an individualized approach to the model solutions that have been developed for these systems modernizing. A mathematical model of a series-parallel propagation for the cooling water was derived and iterative methods were used to solve the system of nonlinear equations. The results of numerical calculations made it possible to analyze different variants of a modernization of the studied system and to indicate its critical elements. An economic analysis of different options allows an investor to choose an optimal variant of a reconstruction of the installation.
Supercomputer modeling of flow past hypersonic flight vehicles
NASA Astrophysics Data System (ADS)
Ermakov, M. K.; Kryukov, I. A.
2017-02-01
A software platform for MPI-based parallel solution of the Navier-Stokes (Euler) equations for viscous heat-conductive compressible perfect gas on 3-D unstructured meshes is developed. The discretization and solution of the Navier-Stokes equations are constructed on generalized S.K. Godunov’s method and the second order approximation in space and time. Developed software platform allows to carry out effectively flow past hypersonic flight vehicles simulations for the Mach numbers 6 and higher, and numerical meshes with up to 1 billion numerical cells and with up to 128 processors.
Jalas, S.; Dornmair, I.; Lehe, R.; ...
2017-03-20
Particle in Cell (PIC) simulations are a widely used tool for the investigation of both laser- and beam-driven plasma acceleration. It is a known issue that the beam quality can be artificially degraded by numerical Cherenkov radiation (NCR) resulting primarily from an incorrectly modeled dispersion relation. Pseudo-spectral solvers featuring infinite order stencils can strongly reduce NCR - or even suppress it - and are therefore well suited to correctly model the beam properties. For efficient parallelization of the PIC algorithm, however, localized solvers are inevitable. Arbitrary order pseudo-spectral methods provide this needed locality. Yet, these methods can again be pronemore » to NCR. Here in this paper, we show that acceptably low solver orders are sufficient to correctly model the physics of interest, while allowing for parallel computation by domain decomposition.« less
NASA Astrophysics Data System (ADS)
Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
Sachetto Oliveira, Rafael; Martins Rocha, Bernardo; Burgarelli, Denise; Meira, Wagner; Constantinides, Christakis; Weber Dos Santos, Rodrigo
2018-02-01
The use of computer models as a tool for the study and understanding of the complex phenomena of cardiac electrophysiology has attained increased importance nowadays. At the same time, the increased complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, 2 different techniques have been traditionally exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore and graphics processing units (GPUs) and a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU, multicore and GPU and space adaptivity, multicore and GPU and space adaptivity and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs, 3D simulations on a ventricular mouse mesh, ie, complex geometry, sinus-rhythm, and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48. Nevertheless, by combining all the techniques, we obtained speedups that ranged between 165 and 498. The tested methods were able to reduce the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in a feasible time with no significant loss of accuracy. Copyright © 2017 John Wiley & Sons, Ltd.
Mathematical and Numerical Aspects of the Adaptive Fast Multipole Poisson-Boltzmann Solver
Zhang, Bo; Lu, Benzhuo; Cheng, Xiaolin; ...
2013-01-01
This paper summarizes the mathematical and numerical theories and computational elements of the adaptive fast multipole Poisson-Boltzmann (AFMPB) solver. We introduce and discuss the following components in order: the Poisson-Boltzmann model, boundary integral equation reformulation, surface mesh generation, the nodepatch discretization approach, Krylov iterative methods, the new version of fast multipole methods (FMMs), and a dynamic prioritization technique for scheduling parallel operations. For each component, we also remark on feasible approaches for further improvements in efficiency, accuracy and applicability of the AFMPB solver to large-scale long-time molecular dynamics simulations. Lastly, the potential of the solver is demonstrated with preliminary numericalmore » results.« less
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
Fast Particle Methods for Multiscale Phenomena Simulations
NASA Technical Reports Server (NTRS)
Koumoutsakos, P.; Wray, A.; Shariff, K.; Pohorille, Andrew
2000-01-01
We are developing particle methods oriented at improving computational modeling capabilities of multiscale physical phenomena in : (i) high Reynolds number unsteady vortical flows, (ii) particle laden and interfacial flows, (iii)molecular dynamics studies of nanoscale droplets and studies of the structure, functions, and evolution of the earliest living cell. The unifying computational approach involves particle methods implemented in parallel computer architectures. The inherent adaptivity, robustness and efficiency of particle methods makes them a multidisciplinary computational tool capable of bridging the gap of micro-scale and continuum flow simulations. Using efficient tree data structures, multipole expansion algorithms, and improved particle-grid interpolation, particle methods allow for simulations using millions of computational elements, making possible the resolution of a wide range of length and time scales of these important physical phenomena.The current challenges in these simulations are in : [i] the proper formulation of particle methods in the molecular and continuous level for the discretization of the governing equations [ii] the resolution of the wide range of time and length scales governing the phenomena under investigation. [iii] the minimization of numerical artifacts that may interfere with the physics of the systems under consideration. [iv] the parallelization of processes such as tree traversal and grid-particle interpolations We are conducting simulations using vortex methods, molecular dynamics and smooth particle hydrodynamics, exploiting their unifying concepts such as : the solution of the N-body problem in parallel computers, highly accurate particle-particle and grid-particle interpolations, parallel FFT's and the formulation of processes such as diffusion in the context of particle methods. This approach enables us to transcend among seemingly unrelated areas of research.
A Parallel Numerical Micromagnetic Code Using FEniCS
NASA Astrophysics Data System (ADS)
Nagy, L.; Williams, W.; Mitchell, L.
2013-12-01
Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However the computational cost rises exceedingly quickly with respect to the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that the modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently to better exploit modern computational resources our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, The Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software; in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/Parmetis, etc.) and exposing these so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users to not only run micromagnetic models in parallel, but also to perform pre/post processing of data.
NASA Technical Reports Server (NTRS)
Talbot, Bryan; Zhou, Shu-Jia; Higgins, Glenn; Zukor, Dorothy (Technical Monitor)
2002-01-01
One of the most significant challenges in large-scale climate modeling, as well as in high-performance computing in other scientific fields, is that of effectively integrating many software models from multiple contributors. A software framework facilitates the integration task, both in the development and runtime stages of the simulation. Effective software frameworks reduce the programming burden for the investigators, freeing them to focus more on the science and less on the parallel communication implementation. while maintaining high performance across numerous supercomputer and workstation architectures. This document surveys numerous software frameworks for potential use in Earth science modeling. Several frameworks are evaluated in depth, including Parallel Object-Oriented Methods and Applications (POOMA), Cactus (from (he relativistic physics community), Overture, Goddard Earth Modeling System (GEMS), the National Center for Atmospheric Research Flux Coupler, and UCLA/UCB Distributed Data Broker (DDB). Frameworks evaluated in less detail include ROOT, Parallel Application Workspace (PAWS), and Advanced Large-Scale Integrated Computational Environment (ALICE). A host of other frameworks and related tools are referenced in this context. The frameworks are evaluated individually and also compared with each other.
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron
1992-01-01
Report evaluates supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. Predicts these fields will require computer speed greater than 10(Sup 18) floating-point operations per second (FLOP's) and memory capacity greater than 10(Sup 15) words. Also, new parallel computer architectures and new structured numerical methods will make necessary speed and capacity available.
Conformal anomaly of some 2-d Z (n) models
NASA Astrophysics Data System (ADS)
William, Peter
1991-01-01
We describe a numerical calculation of the conformal anomaly in the case of some two-dimensional statistical models undergoing a second-order phase transition, utilizing a recently developed method to compute the partition function exactly. This computation is carried out on a massively parallel CM2 machine, using the finite size scaling behaviour of the free energy.
NASA Technical Reports Server (NTRS)
Lee, Jeh Won
1990-01-01
The objective is the theoretical analysis and the experimental verification of dynamics and control of a two link flexible manipulator with a flexible parallel link mechanism. Nonlinear equations of motion of the lightweight manipulator are derived by the Lagrangian method in symbolic form to better understand the structure of the dynamic model. The resulting equation of motion have a structure which is useful to reduce the number of terms calculated, to check correctness, or to extend the model to higher order. A manipulator with a flexible parallel link mechanism is a constrained dynamic system whose equations are sensitive to numerical integration error. This constrained system is solved using singular value decomposition of the constraint Jacobian matrix. Elastic motion is expressed by the assumed mode method. Mode shape functions of each link are chosen using the load interfaced component mode synthesis. The discrepancies between the analytical model and the experiment are explained using a simplified and a detailed finite element model.
Xia, Yidong; Luo, Hong; Frisbey, Megan; ...
2014-07-01
A set of implicit methods are proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature in these methods are the application of the Jacobian matrix based on the P1 element approximation, resulting in a huge reduction of memory requirement compared with DG (P2). Also, three approaches -- analytical derivation, divided differencing, and automatic differentiation (AD) are presented to construct the Jacobian matrix respectively, where the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence property of the implemented flowmore » solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.« less
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
NASA Astrophysics Data System (ADS)
Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel
2017-07-01
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) [13], we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Study on the wind field and pollutant dispersion in street canyons using a stable numerical method.
Xia, Ji-Yang; Leung, Dennis Y C
2005-01-01
A stable finite element method for the time dependent Navier-Stokes equations was used for studying the wind flow and pollutant dispersion within street canyons. A three-step fractional method was used to solve the velocity field and the pressure field separately from the governing equations. The Streamline Upwind Petrov-Galerkin (SUPG) method was used to get stable numerical results. Numerical oscillation was minimized and satisfactory results can be obtained for flows at high Reynolds numbers. Simulating the flow over a square cylinder within a wide range of Reynolds numbers validates the wind field model. The Strouhal numbers obtained from the numerical simulation had a good agreement with those obtained from experiment. The wind field model developed in the present study is applied to simulate more complex flow phenomena in street canyons with two different building configurations. The results indicated that the flow at rooftop of buildings might not be assumed parallel to the ground as some numerical modelers did. A counter-clockwise rotating vortex may be found in street canyons with an inflow from the left to right. In addition, increasing building height can increase velocity fluctuations in the street canyon under certain circumstances, which facilitate pollutant dispersion. At high Reynolds numbers, the flow regimes in street canyons do not change with inflow velocity.
NASA Technical Reports Server (NTRS)
Liou, Meng-Sing
1995-01-01
A unique formulation of describing fluid motion is presented. The method, referred to as 'extended Lagrangian method,' is interesting from both theoretical and numerical points of view. The formulation offers accuracy in numerical solution by avoiding numerical diffusion resulting from mixing of fluxes in the Eulerian description. The present method and the Arbitrary Lagrangian-Eulerian (ALE) method have a similarity in spirit-eliminating the cross-streamline numerical diffusion. For this purpose, we suggest a simple grid constraint condition and utilize an accurate discretization procedure. This grid constraint is only applied to the transverse cell face parallel to the local stream velocity, and hence our method for the steady state problems naturally reduces to the streamline-curvature method, without explicitly solving the steady stream-coordinate equations formulated a priori. Unlike the Lagrangian method proposed by Loh and Hui which is valid only for steady supersonic flows, the present method is general and capable of treating subsonic flows and supersonic flows as well as unsteady flows, simply by invoking in the same code an appropriate grid constraint suggested in this paper. The approach is found to be robust and stable. It automatically adapts to flow features without resorting to clustering, thereby maintaining rather uniform grid spacing throughout and large time step. Moreover, the method is shown to resolve multi-dimensional discontinuities with a high level of accuracy, similar to that found in one-dimensional problems.
Advances in locally constrained k-space-based parallel MRI.
Samsonov, Alexey A; Block, Walter F; Arunachalam, Arjun; Field, Aaron S
2006-02-01
In this article, several theoretical and methodological developments regarding k-space-based, locally constrained parallel MRI (pMRI) reconstruction are presented. A connection between Parallel MRI with Adaptive Radius in k-Space (PARS) and GRAPPA methods is demonstrated. The analysis provides a basis for unified treatment of both methods. Additionally, a weighted PARS reconstruction is proposed, which may absorb different weighting strategies for improved image reconstruction. Next, a fast and efficient method for pMRI reconstruction of data sampled on non-Cartesian trajectories is described. In the new technique, the computational burden associated with the numerous matrix inversions in the original PARS method is drastically reduced by limiting direct calculation of reconstruction coefficients to only a few reference points. The rest of the coefficients are found by interpolating between the reference sets, which is possible due to the similar configuration of points participating in reconstruction for highly symmetric trajectories, such as radial and spirals. As a result, the time requirements are drastically reduced, which makes it practical to use pMRI with non-Cartesian trajectories in many applications. The new technique was demonstrated with simulated and actual data sampled on radial trajectories. Copyright 2006 Wiley-Liss, Inc.
Scalable direct Vlasov solver with discontinuous Galerkin method on unstructured mesh.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, J.; Ostroumov, P. N.; Mustapha, B.
2010-12-01
This paper presents the development of parallel direct Vlasov solvers with discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both physical and velocity spaces are in two dimesions (2P2V) with unstructured mesh. Contrary to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving Vlasov-Maxwell equations, direct method has been used in this paper. There are several benefits to solving a Vlasov equation directly, such as avoiding noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challanging part of a direct Vlasov solvermore » comes from higher dimensions, as the computational cost increases as N{sup 2d}, where d is the dimension of the physical space. Recently, due to the fast development of supercomputers, the possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions before; now more interest has focused on higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, Fourier Spectral method, finite volume method, and spectral element method. This paper is based on our previous efforts to use the DG method. The DG method has been proven to be very successful in solving Maxwell equations, and this paper is our first effort in applying the DG method to Vlasov equations. DG has shown several advantages, such as local mass matrix, strong stability, and easy parallelization. These are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization; these include a highly scalable parallel two-dimensional Poisson solver. Benchmark results have been shown and simulation results will be reported.« less
Numerical and experimental studies of hydrodynamics of flapping foils
NASA Astrophysics Data System (ADS)
Zhou, Kai; Liu, Jun-kao; Chen, Wei-shan
2018-04-01
The flapping foil based on bionics is a sort of simplified models which imitate the motion of wings or fins of fish or birds. In this paper, a universal kinematic model with three degrees of freedom is adopted and the motion parallel to the flow direction is considered. The force coefficients, the torque coefficient, and the flow field characteristics are extracted and analyzed. Then the propulsive efficiency is calculated. The influence of the motion parameters on the hydrodynamic performance of the bionic foil is studied. The results show that the motion parameters play important roles in the hydrodynamic performance of the flapping foil. To validate the reliability of the numerical method used in this paper, an experiment platform is designed and verification experiments are carried out. Through the comparison, it is found that the numerical results compare well with the experimental results, to show that the adopted numerical method is reliable. The results of this paper provide a theoretical reference for the design of underwater vehicles based on the flapping propulsion.
NASA Astrophysics Data System (ADS)
Recent advances in computational fluid dynamics are discussed in reviews and reports. Topics addressed include large-scale LESs for turbulent pipe and channel flows, numerical solutions of the Euler and Navier-Stokes equations on parallel computers, multigrid methods for steady high-Reynolds-number flow past sudden expansions, finite-volume methods on unstructured grids, supersonic wake flow on a blunt body, a grid-characteristic method for multidimensional gas dynamics, and CIC numerical simulation of a wave boundary layer. Consideration is given to vortex simulations of confined two-dimensional jets, supersonic viscous shear layers, spectral methods for compressible flows, shock-wave refraction at air/water interfaces, oscillatory flow in a two-dimensional collapsible channel, the growth of randomness in a spatially developing wake, and an efficient simplex algorithm for the finite-difference and dynamic linear-programming method in optimal potential control.
Array-based Hierarchical Mesh Generation in Parallel
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2015-11-03
In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes such as multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework “Mesh Oriented dAtaBase” a.k.a MOAB. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. Inmore » conclusion, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jalas, S.; Dornmair, I.; Lehe, R.
Particle in Cell (PIC) simulations are a widely used tool for the investigation of both laser- and beam-driven plasma acceleration. It is a known issue that the beam quality can be artificially degraded by numerical Cherenkov radiation (NCR) resulting primarily from an incorrectly modeled dispersion relation. Pseudo-spectral solvers featuring infinite order stencils can strongly reduce NCR - or even suppress it - and are therefore well suited to correctly model the beam properties. For efficient parallelization of the PIC algorithm, however, localized solvers are inevitable. Arbitrary order pseudo-spectral methods provide this needed locality. Yet, these methods can again be pronemore » to NCR. Here in this paper, we show that acceptably low solver orders are sufficient to correctly model the physics of interest, while allowing for parallel computation by domain decomposition.« less
Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram
2013-04-09
A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J.Brabec, E. Aprà, H.J.J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing unit (GPU) is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithmmore » is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.« less
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
The island dynamics model on parallel quadtree grids
NASA Astrophysics Data System (ADS)
Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic
2018-05-01
We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.
NASA Astrophysics Data System (ADS)
Lezina, Natalya; Agoshkov, Valery
2017-04-01
Domain decomposition method (DDM) allows one to present a domain with complex geometry as a set of essentially simpler subdomains. This method is particularly applied for the hydrodynamics of oceans and seas. In each subdomain the system of thermo-hydrodynamic equations in the Boussinesq and hydrostatic approximations is solved. The problem of obtaining solution in the whole domain is that it is necessary to combine solutions in subdomains. For this purposes iterative algorithm is created and numerical experiments are conducted to investigate an effectiveness of developed algorithm using DDM. For symmetric operators in DDM, Poincare-Steklov's operators [1] are used, but for the problems of the hydrodynamics, it is not suitable. In this case for the problem, adjoint equation method [2] and inverse problem theory are used. In addition, it is possible to create algorithms for the parallel calculations using DDM on multiprocessor computer system. DDM for the model of the Baltic Sea dynamics is numerically studied. The results of numerical experiments using DDM are compared with the solution of the system of hydrodynamic equations in the whole domain. The work was supported by the Russian Science Foundation (project 14-11-00609, the formulation of the iterative process and numerical experiments). [1] V.I. Agoshkov, Domain Decompositions Methods in the Mathematical Physics Problem // Numerical processes and systems, No 8, Moscow, 1991 (in Russian). [2] V.I. Agoshkov, Optimal Control Approaches and Adjoint Equations in the Mathematical Physics Problem, Institute of Numerical Mathematics, RAS, Moscow, 2003 (in Russian).
NASA Astrophysics Data System (ADS)
Miloichikova, I. A.; Bespalov, V. I.; Krasnykh, A. A.; Stuchebrov, S. G.; Cherepennikov, Yu. M.; Dusaev, R. R.
2018-04-01
Simulation by the Monte Carlo method is widely used to calculate the character of ionizing radiation interaction with substance. A wide variety of programs based on the given method allows users to choose the most suitable package for solving computational problems. In turn, it is important to know exactly restrictions of numerical systems to avoid gross errors. Results of estimation of the feasibility of application of the program PCLab (Computer Laboratory, version 9.9) for numerical simulation of the electron energy distribution absorbed in beryllium, aluminum, gold, and water for industrial, research, and clinical beams are presented. The data obtained using programs ITS and Geant4 being the most popular software packages for solving the given problems and the program PCLab are presented in the graphic form. A comparison and an analysis of the results obtained demonstrate the feasibility of application of the program PCLab for simulation of the absorbed energy distribution and dose of electrons in various materials for energies in the range 1-20 MeV.
García-Calvo, Raúl; Guisado, JL; Diaz-del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco
2018-01-01
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes—master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)—is carried out for this problem. Several procedures that optimize the use of the GPU’s resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs). PMID:29662297
García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco
2018-01-01
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes-master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)-is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs).
Numerical modeling of exciton-polariton Bose-Einstein condensate in a microcavity
NASA Astrophysics Data System (ADS)
Voronych, Oksana; Buraczewski, Adam; Matuszewski, Michał; Stobińska, Magdalena
2017-06-01
A novel, optimized numerical method of modeling of an exciton-polariton superfluid in a semiconductor microcavity was proposed. Exciton-polaritons are spin-carrying quasiparticles formed from photons strongly coupled to excitons. They possess unique properties, interesting from the point of view of fundamental research as well as numerous potential applications. However, their numerical modeling is challenging due to the structure of nonlinear differential equations describing their evolution. In this paper, we propose to solve the equations with a modified Runge-Kutta method of 4th order, further optimized for efficient computations. The algorithms were implemented in form of C++ programs fitted for parallel environments and utilizing vector instructions. The programs form the EPCGP suite which has been used for theoretical investigation of exciton-polaritons. Catalogue identifier: AFBQ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: BSD-3 No. of lines in distributed program, including test data, etc.: 2157 No. of bytes in distributed program, including test data, etc.: 498994 Distribution format: tar.gz Programming language: C++ with OpenMP extensions (main numerical program), Python (helper scripts). Computer: Modern PC (tested on AMD and Intel processors), HP BL2x220. Operating system: Unix/Linux and Windows. Has the code been vectorized or parallelized?: Yes (OpenMP) RAM: 200 MB for single run Classification: 7, 7.7. Nature of problem: An exciton-polariton superfluid is a novel, interesting physical system allowing investigation of high temperature Bose-Einstein condensation of exciton-polaritons-quasiparticles carrying spin. They have brought a lot of attention due to their unique properties and potential applications in polariton-based optoelectronic integrated circuits. This is an out-of-equilibrium quantum system confined within a semiconductor microcavity. It is described by a set of nonlinear differential equations similar in spirit to the Gross-Pitaevskii (GP) equation, but their unique properties do not allow standard GP solving frameworks to be utilized. Finding an accurate and efficient numerical algorithm as well as development of optimized numerical software is necessary for effective theoretical investigation of exciton-polaritons. Solution method: A Runge-Kutta method of 4th order was employed to solve the set of differential equations describing exciton-polariton superfluids. The method was fitted for the exciton-polariton equations and further optimized. The C++ programs utilize OpenMP extensions and vector operations in order to fully utilize the computer hardware. Running time: 6h for 100 ps evolution, depending on the values of parameters
NASA Technical Reports Server (NTRS)
Venkatachari, Balaji Shankar; Streett, Craig L.; Chang, Chau-Lyan; Friedlander, David J.; Wang, Xiao-Yen; Chang, Sin-Chung
2016-01-01
Despite decades of development of unstructured mesh methods, high-fidelity time-accurate simulations are still predominantly carried out on structured, or unstructured hexahedral meshes by using high-order finite-difference, weighted essentially non-oscillatory (WENO), or hybrid schemes formed by their combinations. In this work, the space-time conservation element solution element (CESE) method is used to simulate several flow problems including supersonic jet/shock interaction and its impact on launch vehicle acoustics, and direct numerical simulations of turbulent flows using tetrahedral meshes. This paper provides a status report for the continuing development of the space-time conservation element solution element (CESE) numerical and software framework under the Revolutionary Computational Aerosciences (RCA) project. Solution accuracy and large-scale parallel performance of the numerical framework is assessed with the goal of providing a viable paradigm for future high-fidelity flow physics simulations.
Regularization with numerical extrapolation for finite and UV-divergent multi-loop integrals
NASA Astrophysics Data System (ADS)
de Doncker, E.; Yuasa, F.; Kato, K.; Ishikawa, T.; Kapenga, J.; Olagbemi, O.
2018-03-01
We give numerical integration results for Feynman loop diagrams such as those covered by Laporta (2000) and by Baikov and Chetyrkin (2010), and which may give rise to loop integrals with UV singularities. We explore automatic adaptive integration using multivariate techniques from the PARINT package for multivariate integration, as well as iterated integration with programs from the QUADPACK package, and a trapezoidal method based on a double exponential transformation. PARINT is layered over MPI (Message Passing Interface), and incorporates advanced parallel/distributed techniques including load balancing among processes that may be distributed over a cluster or a network/grid of nodes. Results are included for 2-loop vertex and box diagrams and for sets of 2-, 3- and 4-loop self-energy diagrams with or without UV terms. Numerical regularization of integrals with singular terms is achieved by linear and non-linear extrapolation methods.
Aircraft optimization by a system approach: Achievements and trends
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1992-01-01
Recently emerging methodology for optimal design of aircraft treated as a system of interacting physical phenomena and parts is examined. The methodology is found to coalesce into methods for hierarchic, non-hierarchic, and hybrid systems all dependent on sensitivity analysis. A separate category of methods has also evolved independent of sensitivity analysis, hence suitable for discrete problems. References and numerical applications are cited. Massively parallel computer processing is seen as enabling technology for practical implementation of the methodology.
Palladium-Catalyzed Nitromethylation of Aryl Halides: An Orthogonal Formylation Equivalent
Walvoord, Ryan R.; Berritt, Simon; Kozlowski, Marisa C.
2012-01-01
An efficient cross-coupling reaction of aryl halides and nitromethane was developed with the use of parallel microscale experimentation. The arylnitromethane products are precursors for numerous useful synthetic products. An efficient method for their direct conversion to the corresponding oximes and aldehydes in a one-pot operation has been discovered. The process exploits inexpensive nitromethane as a carbonyl equivalent, providing a mild and convenient formylation method that is compatible with many functional groups. PMID:22839593
Exploiting Data Sparsity in Parallel Matrix Powers Computations
2013-05-03
2013 Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour...matrices of the form A = D+USV H, where D is sparse and USV H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial di erential equation solvers, and preconditioned iterative methods. If A has this form , our algorithm enables a communication
Calculation of Moment Matrix Elements for Bilinear Quadrilaterals and Higher-Order Basis Functions
2016-01-06
methods are known as boundary integral equation (BIE) methods and the present study falls into this category. The numerical solution of the BIE is...iterated integrals. The inner integral involves the product of the free-space Green’s function for the Helmholtz equation multiplied by an appropriate...Website: http://www.wipl-d.com/ 5. Y. Zhang and T. K. Sarkar, Parallel Solution of Integral Equation -Based EM Problems in the Frequency Domain. New
Forward Monte Carlo Computations of Polarized Microwave Radiation
NASA Technical Reports Server (NTRS)
Battaglia, A.; Kummerow, C.
2000-01-01
Microwave radiative transfer computations continue to acquire greater importance as the emphasis in remote sensing shifts towards the understanding of microphysical properties of clouds and with these to better understand the non linear relation between rainfall rates and satellite-observed radiance. A first step toward realistic radiative simulations has been the introduction of techniques capable of treating 3-dimensional geometry being generated by ever more sophisticated cloud resolving models. To date, a series of numerical codes have been developed to treat spherical and randomly oriented axisymmetric particles. Backward and backward-forward Monte Carlo methods are, indeed, efficient in this field. These methods, however, cannot deal properly with oriented particles, which seem to play an important role in polarization signatures over stratiform precipitation. Moreover, beyond the polarization channel, the next generation of fully polarimetric radiometers challenges us to better understand the behavior of the last two Stokes parameters as well. In order to solve the vector radiative transfer equation, one-dimensional numerical models have been developed, These codes, unfortunately, consider the atmosphere as horizontally homogeneous with horizontally infinite plane parallel layers. The next development step for microwave radiative transfer codes must be fully polarized 3-D methods. Recently a 3-D polarized radiative transfer model based on the discrete ordinate method was presented. A forward MC code was developed that treats oriented nonspherical hydrometeors, but only for plane-parallel situations.
Phase reconstruction using compressive two-step parallel phase-shifting digital holography
NASA Astrophysics Data System (ADS)
Ramachandran, Prakash; Alex, Zachariah C.; Nelleri, Anith
2018-04-01
The linear relationship between the sample complex object wave and its approximated complex Fresnel field obtained using single shot parallel phase-shifting digital holograms (PPSDH) is used in compressive sensing framework and an accurate phase reconstruction is demonstrated. It is shown that the accuracy of phase reconstruction of this method is better than that of compressive sensing adapted single exposure inline holography (SEOL) method. It is derived that the measurement model of PPSDH method retains both the real and imaginary parts of the Fresnel field but with an approximation noise and the measurement model of SEOL retains only the real part exactly equal to the real part of the complex Fresnel field and its imaginary part is completely not available. Numerical simulation is performed for CS adapted PPSDH and CS adapted SEOL and it is demonstrated that the phase reconstruction is accurate for CS adapted PPSDH and can be used for single shot digital holographic reconstruction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Griebel, M., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de; Rüttgers, A., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de
The multiscale FENE model is applied to a 3D square-square contraction flow problem. For this purpose, the stochastic Brownian configuration field method (BCF) has been coupled with our fully parallelized three-dimensional Navier-Stokes solver NaSt3DGPF. The robustness of the BCF method enables the numerical simulation of high Deborah number flows for which most macroscopic methods suffer from stability issues. The results of our simulations are compared with that of experimental measurements from literature and show a very good agreement. In particular, flow phenomena such as a strong vortex enhancement, streamline divergence and a flow inversion for highly elastic flows are reproduced.more » Due to their computational complexity, our simulations require massively parallel computations. Using a domain decomposition approach with MPI, the implementation achieves excellent scale-up results for up to 128 processors.« less
Parallel-vector unsymmetric Eigen-Solver on high performance computers
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Jiangning, Qin
1993-01-01
The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components in the QR algorithm, it was concluded from this study, that the reduction of an unsymmetric matrix to a Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to a Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods is increased as the problem size increased.
Using SpF to Achieve Petascale for Legacy Pseudospectral Applications
NASA Technical Reports Server (NTRS)
Clune, Thomas L.; Jiang, Weiyuan
2014-01-01
Pseudospectral (PS) methods possess a number of characteristics (e.g., efficiency, accuracy, natural boundary conditions) that are extremely desirable for dynamo models. Unfortunately, dynamo models based upon PS methods face a number of daunting challenges, which include exposing additional parallelism, leveraging hardware accelerators, exploiting hybrid parallelism, and improving the scalability of global memory transposes. Although these issues are a concern for most models, solutions for PS methods tend to require far more pervasive changes to underlying data and control structures. Further, improvements in performance in one model are difficult to transfer to other models, resulting in significant duplication of effort across the research community. We have developed an extensible software framework for pseudospectral methods called SpF that is intended to enable extreme scalability and optimal performance. Highlevel abstractions provided by SpF unburden applications of the responsibility of managing domain decomposition and load balance while reducing the changes in code required to adapt to new computing architectures. The key design concept in SpF is that each phase of the numerical calculation is partitioned into disjoint numerical kernels that can be performed entirely inprocessor. The granularity of domain decomposition provided by SpF is only constrained by the datalocality requirements of these kernels. SpF builds on top of optimized vendor libraries for common numerical operations such as transforms, matrix solvers, etc., but can also be configured to use open source alternatives for portability. SpF includes several alternative schemes for global data redistribution and is expected to serve as an ideal testbed for further research into optimal approaches for different network architectures. In this presentation, we will describe our experience in porting legacy pseudospectral models, MoSST and DYNAMO, to use SpF as well as present preliminary performance results provided by the improved scalability.
The Numerical Technique for the Landslide Tsunami Simulations Based on Navier-Stokes Equations
NASA Astrophysics Data System (ADS)
Kozelkov, A. S.
2017-12-01
The paper presents an integral technique simulating all phases of a landslide-driven tsunami. The technique is based on the numerical solution of the system of Navier-Stokes equations for multiphase flows. The numerical algorithm uses a fully implicit approximation method, in which the equations of continuity and momentum conservation are coupled through implicit summands of pressure gradient and mass flow. The method we propose removes severe restrictions on the time step and allows simulation of tsunami propagation to arbitrarily large distances. The landslide origin is simulated as an individual phase being a Newtonian fluid with its own density and viscosity and separated from the water and air phases by an interface. The basic formulas of equation discretization and expressions for coefficients are presented, and the main steps of the computation procedure are described in the paper. To enable simulations of tsunami propagation across wide water areas, we propose a parallel algorithm of the technique implementation, which employs an algebraic multigrid method. The implementation of the multigrid method is based on the global level and cascade collection algorithms that impose no limitations on the paralleling scale and make this technique applicable to petascale systems. We demonstrate the possibility of simulating all phases of a landslide-driven tsunami, including its generation, propagation and uprush. The technique has been verified against the problems supported by experimental data. The paper describes the mechanism of incorporating bathymetric data to simulate tsunamis in real water areas of the world ocean. Results of comparison with the nonlinear dispersion theory, which has demonstrated good agreement, are presented for the case of a historical tsunami of volcanic origin on the Montserrat Island in the Caribbean Sea.
Parallel 3-D numerical simulation of dielectric barrier discharge plasma actuators
NASA Astrophysics Data System (ADS)
Houba, Tomas
Dielectric barrier discharge plasma actuators have shown promise in a range of applications including flow control, sterilization and ozone generation. Developing numerical models of plasma actuators is of great importance, because a high-fidelity parallel numerical model allows new design configurations to be tested rapidly. Additionally, it provides a better understanding of the plasma actuator physics which is useful for further innovation. The physics of plasma actuators is studied numerically. A loosely coupled approach is utilized for the coupling of the plasma to the neutral fluid. The state of the art in numerical plasma modeling is advanced by the development of a parallel, three-dimensional, first-principles model with detailed air chemistry. The model incorporates 7 charged species and 18 reactions, along with a solution of the electron energy equation. To the author's knowledge, a parallel three-dimensional model of a gas discharge with a detailed air chemistry model and the solution of electron energy is unique. Three representative geometries are studied using the gas discharge model. The discharge of gas between two parallel electrodes is used to validate the air chemistry model developed for the gas discharge code. The gas discharge model is then applied to the discharge produced by placing a dc powered wire and grounded plate electrodes in a channel. Finally, a three-dimensional simulation of gas discharge produced by electrodes placed inside a riblet is carried out. The body force calculated with the gas discharge model is loosely coupled with a fluid model to predict the induced flow inside the riblet.
Yang, Tzuhsiung; Berry, John F
2018-06-04
The computation of nuclear second derivatives of energy, or the nuclear Hessian, is an essential routine in quantum chemical investigations of ground and transition states, thermodynamic calculations, and molecular vibrations. Analytic nuclear Hessian computations require the resolution of costly coupled-perturbed self-consistent field (CP-SCF) equations, while numerical differentiation of analytic first derivatives has an unfavorable 6 N ( N = number of atoms) prefactor. Herein, we present a new method in which grid computing is used to accelerate and/or enable the evaluation of the nuclear Hessian via numerical differentiation: NUMFREQ@Grid. Nuclear Hessians were successfully evaluated by NUMFREQ@Grid at the DFT level as well as using RIJCOSX-ZORA-MP2 or RIJCOSX-ZORA-B2PLYP for a set of linear polyacenes with systematically increasing size. For the larger members of this group, NUMFREQ@Grid was found to outperform the wall clock time of analytic Hessian evaluation; at the MP2 or B2LYP levels, these Hessians cannot even be evaluated analytically. We also evaluated a 156-atom catalytically relevant open-shell transition metal complex and found that NUMFREQ@Grid is faster (7.7 times shorter wall clock time) and less demanding (4.4 times less memory requirement) than an analytic Hessian. Capitalizing on the capabilities of parallel grid computing, NUMFREQ@Grid can outperform analytic methods in terms of wall time, memory requirements, and treatable system size. The NUMFREQ@Grid method presented herein demonstrates how grid computing can be used to facilitate embarrassingly parallel computational procedures and is a pioneer for future implementations.
Asymptotic-preserving Lagrangian approach for modeling anisotropic transport in magnetized plasmas
NASA Astrophysics Data System (ADS)
Chacon, Luis; Del-Castillo-Negrete, Diego
2011-10-01
Modeling electron transport in magnetized plasmas is extremely challenging due to the extreme anisotropy introduced by the presence of the magnetic field (χ∥ /χ⊥ ~1010 in fusion plasmas). Recently, a novel Lagrangian method has been proposed to solve the local and non-local purely parallel transport equation in general 3D magnetic fields. The approach avoids numerical pollution (in fact, it respects transport barriers -flux surfaces- exactly by construction), is inherently positivity-preserving, and is scalable algorithmically (i.e., work per degree-of-freedom is grid-independent). In this poster, we discuss the extension of the Lagrangian approach to include perpendicular transport and sources. We present an asymptotic-preserving numerical formulation that ensures a consistent numerical discretization temporally and spatially for arbitrary χ∥ /χ⊥ ratios. This is of importance because parallel and perpendicular transport terms in the transport equation may become comparable in regions of the plasma (e.g., at incipient islands), while remaining disparate elsewhere. We will demonstrate the potential of the approach with various challenging configurations, including the case of transport across a magnetic island in cylindrical geometry. D. del-Castillo-Negrete, L. Chacón, PRL, 106, 195004 (2011); DPP11 invited talk by del-Castillo-Negrete.
Numerical Analysis of Ginzburg-Landau Models for Superconductivity.
NASA Astrophysics Data System (ADS)
Coskun, Erhan
Thin film conventional, as well as High T _{c} superconductors of various geometric shapes placed under both uniform and variable strength magnetic field are studied using the universially accepted macroscopic Ginzburg-Landau model. A series of new theoretical results concerning the properties of solution is presented using the semi -discrete time-dependent Ginzburg-Landau equations, staggered grid setup and natural boundary conditions. Efficient serial algorithms including a novel adaptive algorithm is developed and successfully implemented for solving the governing highly nonlinear parabolic system of equations. Refinement technique used in the adaptive algorithm is based on modified forward Euler method which was also developed by us to ease the restriction on time step size for stability considerations. Stability and convergence properties of forward and modified forward Euler schemes are studied. Numerical simulations of various recent physical experiments of technological importance such as vortes motion and pinning are performed. The numerical code for solving time-dependent Ginzburg-Landau equations is parallelized using BlockComm -Chameleon and PCN. The parallel code was run on the distributed memory multiprocessors intel iPSC/860, IBM-SP1 and cluster of Sun Sparc workstations, all located at Mathematics and Computer Science Division, Argonne National Laboratory.
Mukherjee, Anamitra; Patel, Niravkumar D.; Bishop, Chris; ...
2015-06-08
Lattice spin-fermion models are quite important to study correlated systems where quantum dynamics allows for a separation between slow and fast degrees of freedom. The fast degrees of freedom are treated quantum mechanically while the slow variables, generically referred to as the “spins,” are treated classically. At present, exact diagonalization coupled with classical Monte Carlo (ED + MC) is extensively used to solve numerically a general class of lattice spin-fermion problems. In this common setup, the classical variables (spins) are treated via the standard MC method while the fermion problem is solved by exact diagonalization. The “traveling cluster approximation” (TCA)more » is a real space variant of the ED + MC method that allows to solve spin-fermion problems on lattice sizes with up to 10 3 sites. In this paper, we present a novel reorganization of the TCA algorithm in a manner that can be efficiently parallelized. Finally, this allows us to solve generic spin-fermion models easily on 10 4 lattice sites and with some effort on 10 5 lattice sites, representing the record lattice sizes studied for this family of models.« less
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mukherjee, Anamitra; Patel, Niravkumar D.; Bishop, Chris
Lattice spin-fermion models are quite important to study correlated systems where quantum dynamics allows for a separation between slow and fast degrees of freedom. The fast degrees of freedom are treated quantum mechanically while the slow variables, generically referred to as the “spins,” are treated classically. At present, exact diagonalization coupled with classical Monte Carlo (ED + MC) is extensively used to solve numerically a general class of lattice spin-fermion problems. In this common setup, the classical variables (spins) are treated via the standard MC method while the fermion problem is solved by exact diagonalization. The “traveling cluster approximation” (TCA)more » is a real space variant of the ED + MC method that allows to solve spin-fermion problems on lattice sizes with up to 10 3 sites. In this paper, we present a novel reorganization of the TCA algorithm in a manner that can be efficiently parallelized. Finally, this allows us to solve generic spin-fermion models easily on 10 4 lattice sites and with some effort on 10 5 lattice sites, representing the record lattice sizes studied for this family of models.« less
Parallel momentum input by tangential neutral beam injections in stellarator and heliotron plasmas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, S., E-mail: nishimura.shin@lhd.nifs.ac.jp; Nakamura, Y.; Nishioka, K.
The configuration dependence of parallel momentum inputs to target plasma particle species by tangentially injected neutral beams is investigated in non-axisymmetric stellarator/heliotron model magnetic fields by assuming the existence of magnetic flux-surfaces. In parallel friction integrals of the full Rosenbluth-MacDonald-Judd collision operator in thermal particles' kinetic equations, numerically obtained eigenfunctions are used for excluding trapped fast ions that cannot contribute to the friction integrals. It is found that the momentum inputs to thermal ions strongly depend on magnetic field strength modulations on the flux-surfaces, while the input to electrons is insensitive to the modulation. In future plasma flow studies requiringmore » flow calculations of all particle species in more general non-symmetric toroidal configurations, the eigenfunction method investigated here will be useful.« less
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Chiou, Jin-Chern
1990-01-01
Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
NASA Technical Reports Server (NTRS)
Hussaini, M. Y. (Editor); Kumar, A. (Editor); Salas, M. D. (Editor)
1993-01-01
The purpose here is to assess the state of the art in the areas of numerical analysis that are particularly relevant to computational fluid dynamics (CFD), to identify promising new developments in various areas of numerical analysis that will impact CFD, and to establish a long-term perspective focusing on opportunities and needs. Overviews are given of discretization schemes, computational fluid dynamics, algorithmic trends in CFD for aerospace flow field calculations, simulation of compressible viscous flow, and massively parallel computation. Also discussed are accerelation methods, spectral and high-order methods, multi-resolution and subcell resolution schemes, and inherently multidimensional schemes.
NASA Astrophysics Data System (ADS)
Marx, Alain; Lütjens, Hinrich
2017-03-01
A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
NASA Astrophysics Data System (ADS)
Qiang, FangWei; Wei, PeiJun; Li, Li
2012-07-01
In the present paper, the effective propagation constants of elastic SH waves in composites with randomly distributed parallel cylindrical nanofibers are studied. The surface stress effects are considered based on the surface elasticity theory and non-classical interfacial conditions between the nanofiber and the host are derived. The scattering waves from individual nanofibers embedded in an infinite elastic host are obtained by the plane wave expansion method. The scattering waves from all fibers are summed up to obtain the multiple scattering waves. The interactions among random dispersive nanofibers are taken into account by the effective field approximation. The effective propagation constants are obtained by the configurational average of the multiple scattering waves. The effective speed and attenuation of the averaged wave and the associated dynamical effective shear modulus of composites are numerically calculated. Based on the numerical results, the size effects of the nanofibers on the effective propagation constants and the effective modulus are discussed.
A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia
NASA Astrophysics Data System (ADS)
Reis, R. F.; Loureiro, F. S.; Lobosco, M.
2014-03-01
Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of the hyperthermia is to heat a specific region like a tumor so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promise technique. In the present paper, the Pennes bioheat transfer equation is adopted to model the thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. Explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.
Lattice Boltzmann modeling of transport phenomena in fuel cells and flow batteries
NASA Astrophysics Data System (ADS)
Xu, Ao; Shyy, Wei; Zhao, Tianshou
2017-06-01
Fuel cells and flow batteries are promising technologies to address climate change and air pollution problems. An understanding of the complex multiscale and multiphysics transport phenomena occurring in these electrochemical systems requires powerful numerical tools. Over the past decades, the lattice Boltzmann (LB) method has attracted broad interest in the computational fluid dynamics and the numerical heat transfer communities, primarily due to its kinetic nature making it appropriate for modeling complex multiphase transport phenomena. More importantly, the LB method fits well with parallel computing due to its locality feature, which is required for large-scale engineering applications. In this article, we review the LB method for gas-liquid two-phase flows, coupled fluid flow and mass transport in porous media, and particulate flows. Examples of applications are provided in fuel cells and flow batteries. Further developments of the LB method are also outlined.
A method of measuring the effective thermal conductivity of thermoplastic foams
NASA Astrophysics Data System (ADS)
Asséko, André Chateau Akué; Cosson, Benoit; Chaki, Salim; Duborper, Clément; Lacrampe, Marie-France; Krawczak, Patricia
2017-10-01
An inverse method for determining the in-plane effective thermal conductivity of porous thermoplastics was implemented by coupling infrared thermography experiments and numerical solution of heat transfer in straight fins having temperature-dependent convective heat transfer coefficient. The obtained effective thermal conductivity values were compared with previous results obtained using a numerical solution based on periodic homogenization techniques (NSHT) in which the microstructure heterogeneity of extruded polymeric polyethylene (PE) foam in which pores are filled with air with different levels of open and closed porosity was taken into account and Transient Plane Source Technique (TPS) in order to verify the accuracy of the proposed method. The new method proposed in the present study is in good agreement with both NSHT and TPS. It is also applicable to structural materials such as composites, e.g. unidirectional fiber-reinforced plastics, where heat transfer is very different according to the fiber direction (parallel or transverse to the fibers).
NASA Technical Reports Server (NTRS)
Nash, Stephen G.; Polyak, R.; Sofer, Ariela
1994-01-01
When a classical barrier method is applied to the solution of a nonlinear programming problem with inequality constraints, the Hessian matrix of the barrier function becomes increasingly ill-conditioned as the solution is approached. As a result, it may be desirable to consider alternative numerical algorithms. We compare the performance of two methods motivated by barrier functions. The first is a stabilized form of the classical barrier method, where a numerically stable approximation to the Newton direction is used when the barrier parameter is small. The second is a modified barrier method where a barrier function is applied to a shifted form of the problem, and the resulting barrier terms are scaled by estimates of the optimal Lagrange multipliers. The condition number of the Hessian matrix of the resulting modified barrier function remains bounded as the solution to the constrained optimization problem is approached. Both of these techniques can be used in the context of a truncated-Newton method, and hence can be applied to large problems, as well as on parallel computers. In this paper, both techniques are applied to problems with bound constraints and we compare their practical behavior.
Detailed Aerodynamic Analysis of a Shrouded Tail Rotor Using an Unstructured Mesh Flow Solver
NASA Astrophysics Data System (ADS)
Lee, Hee Dong; Kwon, Oh Joon
The detailed aerodynamics of a shrouded tail rotor in hover has been numerically studied using a parallel inviscid flow solver on unstructured meshes. The numerical method is based on a cell-centered finite-volume discretization and an implicit Gauss-Seidel time integration. The calculation was made for a single blade by imposing a periodic boundary condition between adjacent rotor blades. The grid periodicity was also imposed at the periodic boundary planes to avoid numerical inaccuracy resulting from solution interpolation. The results were compared with available experimental data and those from a disk vortex theory for validation. It was found that realistic three-dimensional modeling is important for the prediction of detailed aerodynamics of shrouded rotors including the tip clearance gap flow.
Numerical approach of the quantum circuit theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Silva, J.J.B., E-mail: jaedsonfisica@hotmail.com; Duarte-Filho, G.C.; Almeida, F.A.G.
2017-03-15
In this paper we develop a numerical method based on the quantum circuit theory to approach the coherent electronic transport in a network of quantum dots connected with arbitrary topology. The algorithm was employed in a circuit formed by quantum dots connected each other in a shape of a linear chain (associations in series), and of a ring (associations in series, and in parallel). For both systems we compute two current observables: conductance and shot noise power. We find an excellent agreement between our numerical results and the ones found in the literature. Moreover, we analyze the algorithm efficiency formore » a chain of quantum dots, where the mean processing time exhibits a linear dependence with the number of quantum dots in the array.« less
NASA Astrophysics Data System (ADS)
Drabik, Timothy J.; Lee, Sing H.
1986-11-01
The intrinsic parallelism characteristics of easily realizable optical SIMD arrays prompt their present consideration in the implementation of highly structured algorithms for the numerical solution of multidimensional partial differential equations and the computation of fast numerical transforms. Attention is given to a system, comprising several spatial light modulators (SLMs), an optical read/write memory, and a functional block, which performs simple, space-invariant shifts on images with sufficient flexibility to implement the fastest known methods for partial differential equations as well as a wide variety of numerical transforms in two or more dimensions. Either fixed or floating-point arithmetic may be used. A performance projection of more than 1 billion floating point operations/sec using SLMs with 1000 x 1000-resolution and operating at 1-MHz frame rates is made.
NASA Astrophysics Data System (ADS)
Moreto, Jose; Liu, Xiaofeng
2017-11-01
The accuracy of the Rotating Parallel Ray omnidirectional integration for pressure reconstruction from the measured pressure gradient (Liu et al., AIAA paper 2016-1049) is evaluated against both the Circular Virtual Boundary omnidirectional integration (Liu and Katz, 2006 and 2013) and the conventional Poisson equation approach. Dirichlet condition at one boundary point and Neumann condition at all other boundary points are applied to the Poisson solver. A direct numerical simulation database of isotropic turbulence flow (JHTDB), with a homogeneously distributed random noise added to the entire field of DNS pressure gradient, is used to assess the performance of the methods. The random noise, generated by the Matlab function Rand, has a magnitude varying randomly within the range of +/-40% of the maximum DNS pressure gradient. To account for the effect of the noise distribution pattern on the reconstructed pressure accuracy, a total of 1000 different noise distributions achieved by using different random number seeds are involved in the evaluation. Final results after averaging the 1000 realizations show that the error of the reconstructed pressure normalized by the DNS pressure variation range is 0.15 +/-0.07 for the Poisson equation approach, 0.028 +/-0.003 for the Circular Virtual Boundary method and 0.027 +/-0.003 for the Rotating Parallel Ray method, indicating the robustness of the Rotating Parallel Ray method in pressure reconstruction. Sponsor: The San Diego State University UGP program.
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
On the impact of communication complexity in the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
On the impact of communication complexity on the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D. B.; Van Rosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical alorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
NASA Astrophysics Data System (ADS)
Blasevski, D.; Del-Castillo-Negrete, D.
2012-10-01
Heat transport in magnetized plasmas is a problem of fundamental interest in controlled fusion. In Ref.footnotetext D. del-Castillo-Negrete, and L. Chac'on, Phys. Rev. Lett., 106, 195004 (2011); Phys. Plasmas 19, 056112 (2012). we proposed a Lagrangian-Green's function (LG) method to study this problem in the strongly anisotropic (χ=0) regime. The LG method bypasses the need to discretize the transport operators on a grid and it is applicable to general parallel flux closures and 3-D magnetic fields. Here we apply the LG method to parallel transport (with local and nonlocal parallel flux closures) in reversed shear magnetic field configurations known to exhibit robust transport barriers in the vicinity of the extrema of the q-profile. By shearless Cantori (SC) we mean the invariant Cantor sets remaining after the destruction of toroidal flux surfaces with zero magnetic shear, q^'=0. We provide numerical evidence of the role of SC in the anomalously slow relaxation of radial temperature gradients in chaotic magnetic fields with no transport barriers. The spatio-temporal evolution of temperature pulses localized in the reversed shear region exhibits non-diffusive self-similar evolution and nonlocal effective radial transport.
A Parallel Decoding Algorithm for Short Polar Codes Based on Error Checking and Correcting
Pan, Xiaofei; Pan, Kegang; Ye, Zhan; Gong, Chao
2014-01-01
We propose a parallel decoding algorithm based on error checking and correcting to improve the performance of the short polar codes. In order to enhance the error-correcting capacity of the decoding algorithm, we first derive the error-checking equations generated on the basis of the frozen nodes, and then we introduce the method to check the errors in the input nodes of the decoder by the solutions of these equations. In order to further correct those checked errors, we adopt the method of modifying the probability messages of the error nodes with constant values according to the maximization principle. Due to the existence of multiple solutions of the error-checking equations, we formulate a CRC-aided optimization problem of finding the optimal solution with three different target functions, so as to improve the accuracy of error checking. Besides, in order to increase the throughput of decoding, we use a parallel method based on the decoding tree to calculate probability messages of all the nodes in the decoder. Numerical results show that the proposed decoding algorithm achieves better performance than that of some existing decoding algorithms with the same code length. PMID:25540813
Le Corre, Mathieu; Carey, Susan
2007-11-01
Since the publication of [Gelman, R., & Gallistel, C. R. (1978). The child's understanding of number. Cambridge, MA: Harvard University Press.] seminal work on the development of verbal counting as a representation of number, the nature of the ontogenetic sources of the verbal counting principles has been intensely debated. The present experiments explore proposals according to which the verbal counting principles are acquired by mapping numerals in the count list onto systems of numerical representation for which there is evidence in infancy, namely, analog magnitudes, parallel individuation, and set-based quantification. By asking 3- and 4-year-olds to estimate the number of elements in sets without counting, we investigate whether the numerals that are assigned cardinal meaning as part of the acquisition process display the signatures of what we call "enriched parallel individuation" (which combines properties of parallel individuation and of set-based quantification) or analog magnitudes. Two experiments demonstrate that while "one" to "four" are mapped onto core representations of small sets prior to the acquisition of the counting principles, numerals beyond "four" are only mapped onto analog magnitudes about six months after the acquisition of the counting principles. Moreover, we show that children's numerical estimates of sets from 1 to 4 elements fail to show the signature of numeral use based on analog magnitudes - namely, scalar variability. We conclude that, while representations of small sets provided by parallel individuation, enriched by the resources of set-based quantification are recruited in the acquisition process to provide the first numerical meanings for "one" to "four", analog magnitudes play no role in this process.
NASA Astrophysics Data System (ADS)
Alfonso, Lester; Zamora, Jose; Cruz, Pedro
2015-04-01
The stochastic approach to coagulation considers the coalescence process going in a system of a finite number of particles enclosed in a finite volume. Within this approach, the full description of the system can be obtained from the solution of the multivariate master equation, which models the evolution of the probability distribution of the state vector for the number of particles of a given mass. Unfortunately, due to its complexity, only limited results were obtained for certain type of kernels and monodisperse initial conditions. In this work, a novel numerical algorithm for the solution of the multivariate master equation for stochastic coalescence that works for any type of kernels and initial conditions is introduced. The performance of the method was checked by comparing the numerically calculated particle mass spectrum with analytical solutions obtained for the constant and sum kernels, with an excellent correspondence between the analytical and numerical solutions. In order to increase the speedup of the algorithm, software parallelization techniques with OpenMP standard were used, along with an implementation in order to take advantage of new accelerator technologies. Simulations results show an important speedup of the parallelized algorithms. This study was funded by a grant from Consejo Nacional de Ciencia y Tecnologia de Mexico SEP-CONACYT CB-131879. The authors also thanks LUFAC® Computacion SA de CV for CPU time and all the support provided.
Vectorized and multitasked solution of the few-group neutron diffusion equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zee, S.K.; Turinsky, P.J.; Shayer, Z.
1989-03-01
A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. Formore » the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7. 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of --61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.« less
NASA Technical Reports Server (NTRS)
Shu, Chi-Wang
1992-01-01
The nonlinear stability of compact schemes for shock calculations is investigated. In recent years compact schemes were used in various numerical simulations including direct numerical simulation of turbulence. However to apply them to problems containing shocks, one has to resolve the problem of spurious numerical oscillation and nonlinear instability. A framework to apply nonlinear limiting to a local mean is introduced. The resulting scheme can be proven total variation (1D) or maximum norm (multi D) stable and produces nice numerical results in the test cases. The result is summarized in the preprint entitled 'Nonlinearly Stable Compact Schemes for Shock Calculations', which was submitted to SIAM Journal on Numerical Analysis. Research was continued on issues related to two and three dimensional essentially non-oscillatory (ENO) schemes. The main research topics include: parallel implementation of ENO schemes on Connection Machines; boundary conditions; shock interaction with hydrogen bubbles, a preparation for the full combustion simulation; and direct numerical simulation of compressible sheared turbulence.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schaerer, Roman Pascal, E-mail: schaerer@mathcces.rwth-aachen.de; Bansal, Pratyuksh; Torrilhon, Manuel
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) , we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropymore » distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.« less
Singular boundary method for global gravity field modelling
NASA Astrophysics Data System (ADS)
Cunderlik, Robert
2014-05-01
The singular boundary method (SBM) and method of fundamental solutions (MFS) are meshless boundary collocation techniques that use the fundamental solution of a governing partial differential equation (e.g. the Laplace equation) as their basis functions. They have been developed to avoid singular numerical integration as well as mesh generation in the traditional boundary element method (BEM). SBM have been proposed to overcome a main drawback of MFS - its controversial fictitious boundary outside the domain. The key idea of SBM is to introduce a concept of the origin intensity factors that isolate singularities of the fundamental solution and its derivatives using some appropriate regularization techniques. Consequently, the source points can be placed directly on the real boundary and coincide with the collocation nodes. In this study we deal with SBM applied for high-resolution global gravity field modelling. The first numerical experiment presents a numerical solution to the fixed gravimetric boundary value problem. The achieved results are compared with the numerical solutions obtained by MFS or the direct BEM indicating efficiency of all methods. In the second numerical experiments, SBM is used to derive the geopotential and its first derivatives from the Tzz components of the gravity disturbing tensor observed by the GOCE satellite mission. A determination of the origin intensity factors allows to evaluate the disturbing potential and gravity disturbances directly on the Earth's surface where the source points are located. To achieve high-resolution numerical solutions, the large-scale parallel computations are performed on the cluster with 1TB of the distributed memory and an iterative elimination of far zones' contributions is applied.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
NASA Technical Reports Server (NTRS)
Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.
2000-01-01
Preliminary verification and validation of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries is presented. The parallel, multilevel method makes use of a new on-the-fly parallel domain decomposition strategy based upon the use of space-filling curves, and automatically generates a sequence of coarse meshes for processing by the multigrid smoother. The coarse mesh generation algorithm produces grids which completely cover the computational domain at every level in the mesh hierarchy. A series of examples on realistically complex three-dimensional configurations demonstrate that this new coarsening algorithm reliably achieves mesh coarsening ratios in excess of 7 on adaptively refined meshes. Numerical investigations of the scheme's local truncation error demonstrate an achieved order of accuracy between 1.82 and 1.88. Convergence results for the multigrid scheme are presented for both subsonic and transonic test cases and demonstrate W-cycle multigrid convergence rates between 0.84 and 0.94. Preliminary parallel scalability tests on both simple wing and complex complete aircraft geometries shows a computational speedup of 52 on 64 processors using the run-time mesh partitioner.
Super-resolved Parallel MRI by Spatiotemporal Encoding
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2016-01-01
Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive acquisition alternative entails exploiting parallel imaging algorithms, without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view; together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromises in either the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms were explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293
3D brain tumor localization and parameter estimation using thermographic approach on GPU.
Bousselham, Abdelmajid; Bouattane, Omar; Youssfi, Mohamed; Raihani, Abdelhadi
2018-01-01
The aim of this paper is to present a GPU parallel algorithm for brain tumor detection to estimate its size and location from surface temperature distribution obtained by thermography. The normal brain tissue is modeled as a rectangular cube including spherical tumor. The temperature distribution is calculated using forward three dimensional Pennes bioheat transfer equation, it's solved using massively parallel Finite Difference Method (FDM) and implemented on Graphics Processing Unit (GPU). Genetic Algorithm (GA) was used to solve the inverse problem and estimate the tumor size and location by minimizing an objective function involving measured temperature on the surface to those obtained by numerical simulation. The parallel implementation of Finite Difference Method reduces significantly the time of bioheat transfer and greatly accelerates the inverse identification of brain tumor thermophysical and geometrical properties. Experimental results show significant gains in the computational speed on GPU and achieve a speedup of around 41 compared to the CPU. The analysis performance of the estimation based on tumor size inside brain tissue also presented. Copyright © 2017 Elsevier Ltd. All rights reserved.
An extended algebraic reconstruction technique (E-ART) for dual spectral CT.
Zhao, Yunsong; Zhao, Xing; Zhang, Peng
2015-03-01
Compared with standard computed tomography (CT), dual spectral CT (DSCT) has many advantages for object separation, contrast enhancement, artifact reduction, and material composition assessment. But it is generally difficult to reconstruct images from polychromatic projections acquired by DSCT, because of the nonlinear relation between the polychromatic projections and the images to be reconstructed. This paper first models the DSCT reconstruction problem as a nonlinear system problem; and then extend the classic ART method to solve the nonlinear system. One feature of the proposed method is its flexibility. It fits for any scanning configurations commonly used and does not require consistent rays for different X-ray spectra. Another feature of the proposed method is its high degree of parallelism, which means that the method is suitable for acceleration on GPUs (graphic processing units) or other parallel systems. The method is validated with numerical experiments from simulated noise free and noisy data. High quality images are reconstructed with the proposed method from the polychromatic projections of DSCT. The reconstructed images are still satisfactory even if there are certain errors in the estimated X-ray spectra.
On the wall-normal velocity of the compressible boundary-layer equations
NASA Technical Reports Server (NTRS)
Pruett, C. David
1991-01-01
Numerical methods for the compressible boundary-layer equations are facilitated by transformation from the physical (x,y) plane to a computational (xi,eta) plane in which the evolution of the flow is 'slow' in the time-like xi direction. The commonly used Levy-Lees transformation results in a computationally well-behaved problem for a wide class of non-similar boundary-layer flows, but it complicates interpretation of the solution in physical space. Specifically, the transformation is inherently nonlinear, and the physical wall-normal velocity is transformed out of the problem and is not readily recovered. In light of recent research which shows mean-flow non-parallelism to significantly influence the stability of high-speed compressible flows, the contribution of the wall-normal velocity in the analysis of stability should not be routinely neglected. Conventional methods extract the wall-normal velocity in physical space from the continuity equation, using finite-difference techniques and interpolation procedures. The present spectrally-accurate method extracts the wall-normal velocity directly from the transformation itself, without interpolation, leaving the continuity equation free as a check on the quality of the solution. The present method for recovering wall-normal velocity, when used in conjunction with a highly-accurate spectral collocation method for solving the compressible boundary-layer equations, results in a discrete solution which is extraordinarily smooth and accurate, and which satisfies the continuity equation nearly to machine precision. These qualities make the method well suited to the computation of the non-parallel mean flows needed by spatial direct numerical simulations (DNS) and parabolized stability equation (PSE) approaches to the analysis of stability.
libvdwxc: a library for exchange-correlation functionals in the vdW-DF family
NASA Astrophysics Data System (ADS)
Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per
2017-09-01
We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C and provides an efficient implementation of the vdW-DF method and can be interfaced with various general-purpose DFT codes. Currently, the Gpaw and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set whereas parallel performance is benchmarked on ligand-protected gold nanoparticles ({{Au}}144{({{SC}}11{{NH}}25)}60) up to 9696 atoms.
Efficient, massively parallel eigenvalue computation
NASA Technical Reports Server (NTRS)
Huo, Yan; Schreiber, Robert
1993-01-01
In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Peker, Musa; Şen, Baha; Gürüler, Hüseyin
2015-02-01
The effect of anesthesia on the patient is referred to as depth of anesthesia. Rapid classification of appropriate depth level of anesthesia is a matter of great importance in surgical operations. Similarly, accelerating classification algorithms is important for the rapid solution of problems in the field of biomedical signal processing. However numerous, time-consuming mathematical operations are required when training and testing stages of the classification algorithms, especially in neural networks. In this study, to accelerate the process, parallel programming and computing platform (Nvidia CUDA) facilitates dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU) was utilized. The system was employed to detect anesthetic depth level on related electroencephalogram (EEG) data set. This dataset is rather complex and large. Moreover, the achieving more anesthetic levels with rapid response is critical in anesthesia. The proposed parallelization method yielded high accurate classification results in a faster time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ohsuga, Ken; Takahashi, Hiroyuki R.
2016-02-20
We develop a numerical scheme for solving the equations of fully special relativistic, radiation magnetohydrodynamics (MHDs), in which the frequency-integrated, time-dependent radiation transfer equation is solved to calculate the specific intensity. The radiation energy density, the radiation flux, and the radiation stress tensor are obtained by the angular quadrature of the intensity. In the present method, conservation of total mass, momentum, and energy of the radiation magnetofluids is guaranteed. We treat not only the isotropic scattering but also the Thomson scattering. The numerical method of MHDs is the same as that of our previous work. The advection terms are explicitlymore » solved, and the source terms, which describe the gas–radiation interaction, are implicitly integrated. Our code is suitable for massive parallel computing. We present that our code shows reasonable results in some numerical tests for propagating radiation and radiation hydrodynamics. Particularly, the correct solution is given even in the optically very thin or moderately thin regimes, and the special relativistic effects are nicely reproduced.« less
State Transition Matrix for Perturbed Orbital Motion Using Modified Chebyshev Picard Iteration
NASA Astrophysics Data System (ADS)
Read, Julie L.; Younes, Ahmad Bani; Macomber, Brent; Turner, James; Junkins, John L.
2015-06-01
The Modified Chebyshev Picard Iteration (MCPI) method has recently proven to be highly efficient for a given accuracy compared to several commonly adopted numerical integration methods, as a means to solve for perturbed orbital motion. This method utilizes Picard iteration, which generates a sequence of path approximations, and Chebyshev Polynomials, which are orthogonal and also enable both efficient and accurate function approximation. The nodes consistent with discrete Chebyshev orthogonality are generated using cosine sampling; this strategy also reduces the Runge effect and as a consequence of orthogonality, there is no matrix inversion required to find the basis function coefficients. The MCPI algorithms considered herein are parallel-structured so that they are immediately well-suited for massively parallel implementation with additional speedup. MCPI has a wide range of applications beyond ephemeris propagation, including the propagation of the State Transition Matrix (STM) for perturbed two-body motion. A solution is achieved for a spherical harmonic series representation of earth gravity (EGM2008), although the methodology is suitable for application to any gravity model. Included in this representation the normalized, Associated Legendre Functions are given and verified numerically. Modifications of the classical algorithm techniques, such as rewriting the STM equations in a second-order cascade formulation, gives rise to additional speedup. Timing results for the baseline formulation and this second-order formulation are given.
Flow of nanofluid past a Riga plate
NASA Astrophysics Data System (ADS)
Ahmad, Adeel; Asghar, Saleem; Afzal, Sumaira
2016-03-01
This paper studies the mixed convection boundary layer flow of a nanofluid past a vertical Riga plate in the presence of strong suction. The mathematical model incorporates the Brownian motion and thermophoresis effects due to nanofluid and the Grinberg-term for the wall parallel Lorentz force due to Riga plate. The analytical solution of the problem is presented using the perturbation method for small Brownian and thermophoresis diffusion parameters. The numerical solution is also presented to ensure the reliability of the asymptotic method. The comparison of the two solutions shows an excellent agreement. The correlation expressions for skin friction, Nusselt number and Sherwood number are developed by performing linear regression on the obtained numerical data. The effects of nanofluid and the Lorentz force due to Riga plate, on the skin friction are discussed.
Proceedings of the 14th International Conference on the Numerical Simulation of Plasmas
NASA Astrophysics Data System (ADS)
Partial Contents are as follows: Numerical Simulations of the Vlasov-Maxwell Equations by Coupled Particle-Finite Element Methods on Unstructured Meshes; Electromagnetic PIC Simulations Using Finite Elements on Unstructured Grids; Modelling Travelling Wave Output Structures with the Particle-in-Cell Code CONDOR; SST--A Single-Slice Particle Simulation Code; Graphical Display and Animation of Data Produced by Electromagnetic, Particle-in-Cell Codes; A Post-Processor for the PEST Code; Gray Scale Rendering of Beam Profile Data; A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers; 3-D Electromagnetic PIC Simulation on the NRL Connection Machine; Plasma PIC Simulations on MIMD Computers; Vlasov-Maxwell Algorithm for Electromagnetic Plasma Simulation on Distributed Architectures; MHD Boundary Layer Calculation Using the Vortex Method; and Eulerian Codes for Plasma Simulations.
3D Lagrangian VPM: simulations of the near-wake of an actuator disc and horizontal axis wind turbine
NASA Astrophysics Data System (ADS)
Berdowski, T.; Ferreira, C.; Walther, J.
2016-09-01
The application of a 3-dimensional Lagrangian vortex particle method has been assessed for modelling the near-wake of an axisymmetrical actuator disc and 3-bladed horizontal axis wind turbine with prescribed circulation from the MEXICO (Model EXperiments In COntrolled conditions) experiment. The method was developed in the framework of the open- source Parallel Particle-Mesh library for handling the efficient data-parallelism on a CPU (Central Processing Unit) cluster, and utilized a O(N log N)-type fast multipole method for computational acceleration. Simulations with the actuator disc resulted in a wake expansion, velocity deficit profile, and induction factor that showed a close agreement with theoretical, numerical, and experimental results from literature. Also the shear layer expansion was present; the Kelvin-Helmholtz instability in the shear layer was triggered due to the round-off limitations of a numerical method, but this instability was delayed to beyond 1 diameter downstream due to the particle smoothing. Simulations with the 3-bladed turbine demonstrated that a purely 3-dimensional flow representation is challenging to model with particles. The manifestation of local complex flow structures of highly stretched vortices made the simulation unstable, but this was successfully counteracted by the application of a particle strength exchange scheme. The axial and radial velocity profile over the near wake have been compared to that of the original MEXICO experiment, which showed close agreement between results.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
NASA Astrophysics Data System (ADS)
Wang, Yi; Trouvé, Arnaud
2004-09-01
A pseudo-compressibility method is proposed to modify the acoustic time step restriction found in fully compressible, explicit flow solvers. The method manipulates terms in the governing equations of order Ma2, where Ma is a characteristic flow Mach number. A decrease in the speed of acoustic waves is obtained by adding an extra term in the balance equation for total energy. This term is proportional to flow dilatation and uses a decomposition of the dilatational field into an acoustic component and a component due to heat transfer. The present method is a variation of the pressure gradient scaling (PGS) method proposed in Ramshaw et al (1985 Pressure gradient scaling method for fluid flow with nearly uniform pressure J. Comput. Phys. 58 361-76). It achieves gains in computational efficiencies similar to PGS: at the cost of a slightly more involved right-hand-side computation, the numerical time step increases by a full order of magnitude. It also features the added benefit of preserving the hydrodynamic pressure field. The original and modified PGS methods are implemented into a parallel direct numerical simulation solver developed for applications to turbulent reacting flows with detailed chemical kinetics. The performance of the pseudo-compressibility methods is illustrated in a series of test problems ranging from isothermal sound propagation to laminar premixed flame problems.
NASA Technical Reports Server (NTRS)
Arbocz, Johann; Hol, J. M. A. M.; deVries, J.
1998-01-01
A rigorous solution is presented for the case of stiffened anisotropic cylindrical shells with general imperfections under combined loading, where the edge supports are provided by symmetrical or unsymmetrical elastic rings. The circumferential dependence is eliminated by a truncated Fourier series. The resulting nonlinear 2-point boundary value problem is solved numerically via the "Parallel Shooting Method". The changing deformation patterns resulting from the different degrees of interaction between the given initial imperfections and the specified end rings are displayed. Recommendations are made as to the minimum ring stiffnesses required for optimal load carrying configurations.
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
Acceleration of charged particles by crossed cyclotron waves, Resonant Moments Method
NASA Astrophysics Data System (ADS)
Ponomarjov, M.; Carati, D.
A mechanism for enhanced acceleration of charged particles in crossing radio frequency or micro waves propagating at different angles with respect to an external magnetic field is investigated. This mechanism consists in introducing low amplitude secondary waves in order to improve the parallel momentum transfer from the high amplitude primary wave to charged particles. The use of two parallel counter-propagating waves has recently been considered (Gell and Nakach, 1999) and numerical tests (Louies et al, 2001) have shown that the two-wave scheme may lead to higher averaged parallel velocity. On the other hand, it has been concluded that it may be more effective to accelerate electrons when the waves propagate obliquely to the external magnetic field (Karimabadi and Angelopoulos 1989, Cohen et al 1991). The idea considered here is similar although no constraint is imposed on the refraction indices of the primary and the secondary waves. The theoretical analysis of the acceleration mechanism is based on the Resonance Moments Method (RMM) in which moments of the velocity distribution are computed by using an averages over the resonant layers (RL)i only instead of a complete phase-space average. The quantities obtained using this approach, referred to as Resonant Moments (RM), suggest the existence of optimal angles of propagation for the primary and secondary waves as long as the maximization of the parallel flux of charged particles is considered. The fraction of charged particles that are close to the resonance conditions, that correspond to the RL, becomes then as important as the time these particles remain resonant. The secondary wave tends to maintain a pseudo-equilibrium velocity distribution by continuously re-filling the RL. Our suggestions are confirmed by direct numerical simulations for a populations of 105 relativistic electrons. The secondary wave yields a clear increase (up to one order of magnitude) of the average parallel velocity of the particles. It is a quite promising result since the amplitude of the secondary wave is ten times lower the one of the first wave. Qualitative results give one of the enhanced acceleration mechanisms of the charged particles (including relativistic electrons in planetary magnetospheres) by the crossed cyclotron waves in ambient magnetic field.
Discontinuous Galerkin finite element methods for radiative transfer in spherical symmetry
NASA Astrophysics Data System (ADS)
Kitzmann, D.; Bolte, J.; Patzer, A. B. C.
2016-11-01
The discontinuous Galerkin finite element method (DG-FEM) is successfully applied to treat a broad variety of transport problems numerically. In this work, we use the full capacity of the DG-FEM to solve the radiative transfer equation in spherical symmetry. We present a discontinuous Galerkin method to directly solve the spherically symmetric radiative transfer equation as a two-dimensional problem. The transport equation in spherical atmospheres is more complicated than in the plane-parallel case owing to the appearance of an additional derivative with respect to the polar angle. The DG-FEM formalism allows for the exact integration of arbitrarily complex scattering phase functions, independent of the angular mesh resolution. We show that the discontinuous Galerkin method is able to describe accurately the radiative transfer in extended atmospheres and to capture discontinuities or complex scattering behaviour which might be present in the solution of certain radiative transfer tasks and can, therefore, cause severe numerical problems for other radiative transfer solution methods.
An efficient numerical method for solving the Boltzmann equation in multidimensions
NASA Astrophysics Data System (ADS)
Dimarco, Giacomo; Loubère, Raphaël; Narski, Jacek; Rey, Thomas
2018-01-01
In this paper we deal with the extension of the Fast Kinetic Scheme (FKS) (Dimarco and Loubère, 2013 [26]) originally constructed for solving the BGK equation, to the more challenging case of the Boltzmann equation. The scheme combines a robust and fast method for treating the transport part based on an innovative Lagrangian technique supplemented with conservative fast spectral schemes to treat the collisional operator by means of an operator splitting approach. This approach along with several implementation features related to the parallelization of the algorithm permits to construct an efficient simulation tool which is numerically tested against exact and reference solutions on classical problems arising in rarefied gas dynamic. We present results up to the 3 D × 3 D case for unsteady flows for the Variable Hard Sphere model which may serve as benchmark for future comparisons between different numerical methods for solving the multidimensional Boltzmann equation. For this reason, we also provide for each problem studied details on the computational cost and memory consumption as well as comparisons with the BGK model or the limit model of compressible Euler equations.
Simulation of two-dimensional turbulent flows in a rotating annulus
NASA Astrophysics Data System (ADS)
Storey, Brian D.
2004-05-01
Rotating water tank experiments have been used to study fundamental processes of atmospheric and geophysical turbulence in a controlled laboratory setting. When these tanks are undergoing strong rotation the forced turbulent flow becomes highly two dimensional along the axis of rotation. An efficient numerical method has been developed for simulating the forced quasi-geostrophic equations in an annular geometry to model current laboratory experiments. The algorithm employs a spectral method with Fourier series and Chebyshev polynomials as basis functions. The algorithm has been implemented on a parallel architecture to allow modelling of a wide range of spatial scales over long integration times. This paper describes the derivation of the model equations, numerical method, testing and performance of the algorithm. Results provide reasonable agreement with the experimental data, indicating that such computations can be used as a predictive tool to design future experiments.
Eckert, Paulo Roberto; Goltz, Evandro Claiton; Filho, Aly Ferreira Flores
2014-01-01
This work analyses the effects of segmentation followed by parallel magnetization of ring-shaped NdFeB permanent magnets used in slotless cylindrical linear actuators. The main purpose of the work is to evaluate the effects of that segmentation on the performance of the actuator and to present a general overview of the influence of parallel magnetization by varying the number of segments and comparing the results with ideal radially magnetized rings. The analysis is first performed by modelling mathematically the radial and circumferential components of magnetization for both radial and parallel magnetizations, followed by an analysis carried out by means of the 3D finite element method. Results obtained from the models are validated by measuring radial and tangential components of magnetic flux distribution in the air gap on a prototype which employs magnet rings with eight segments each with parallel magnetization. The axial force produced by the actuator was also measured and compared with the results obtained from numerical models. Although this analysis focused on a specific topology of cylindrical actuator, the observed effects on the topology could be extended to others in which surface-mounted permanent magnets are employed, including rotating electrical machines. PMID:25051032
Eckert, Paulo Roberto; Goltz, Evandro Claiton; Flores Filho, Aly Ferreira
2014-07-21
This work analyses the effects of segmentation followed by parallel magnetization of ring-shaped NdFeB permanent magnets used in slotless cylindrical linear actuators. The main purpose of the work is to evaluate the effects of that segmentation on the performance of the actuator and to present a general overview of the influence of parallel magnetization by varying the number of segments and comparing the results with ideal radially magnetized rings. The analysis is first performed by modelling mathematically the radial and circumferential components of magnetization for both radial and parallel magnetizations, followed by an analysis carried out by means of the 3D finite element method. Results obtained from the models are validated by measuring radial and tangential components of magnetic flux distribution in the air gap on a prototype which employs magnet rings with eight segments each with parallel magnetization. The axial force produced by the actuator was also measured and compared with the results obtained from numerical models. Although this analysis focused on a specific topology of cylindrical actuator, the observed effects on the topology could be extended to others in which surface-mounted permanent magnets are employed, including rotating electrical machines.
A numerical differentiation library exploiting parallel architectures
NASA Astrophysics Data System (ADS)
Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.
2009-08-01
We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
A block-based algorithm for the solution of compressible flows in rotor-stator combinations
NASA Technical Reports Server (NTRS)
Akay, H. U.; Ecer, A.; Beskok, A.
1990-01-01
A block-based solution algorithm is developed for the solution of compressible flows in rotor-stator combinations. The method allows concurrent solution of multiple solution blocks in parallel machines. It also allows a time averaged interaction at the stator-rotor interfaces. Numerical results are presented to illustrate the performance of the algorithm. The effect of the interaction between the stator and rotor is evaluated.
Tensor methodology and computational geometry in direct computational experiments in fluid mechanics
NASA Astrophysics Data System (ADS)
Degtyarev, Alexander; Khramushin, Vasily; Shichkina, Julia
2017-07-01
The paper considers a generalized functional and algorithmic construction of direct computational experiments in fluid dynamics. Notation of tensor mathematics is naturally embedded in the finite - element operation in the construction of numerical schemes. Large fluid particle, which have a finite size, its own weight, internal displacement and deformation is considered as an elementary computing object. Tensor representation of computational objects becomes strait linear and uniquely approximation of elementary volumes and fluid particles inside them. The proposed approach allows the use of explicit numerical scheme, which is an important condition for increasing the efficiency of the algorithms developed by numerical procedures with natural parallelism. It is shown that advantages of the proposed approach are achieved among them by considering representation of large particles of a continuous medium motion in dual coordinate systems and computing operations in the projections of these two coordinate systems with direct and inverse transformations. So new method for mathematical representation and synthesis of computational experiment based on large particle method is proposed.
Parallel optimization algorithm for drone inspection in the building industry
NASA Astrophysics Data System (ADS)
Walczyński, Maciej; BoŻejko, Wojciech; Skorupka, Dariusz
2017-07-01
In this paper we present an approach for Vehicle Routing Problem with Drones (VRPD) in case of building inspection from the air. In autonomic inspection process there is a need to determine of the optimal route for inspection drone. This is especially important issue because of the very limited flight time of modern multicopters. The method of determining solutions for Traveling Salesman Problem(TSP), described in this paper bases on Parallel Evolutionary Algorithm (ParEA)with cooperative and independent approach for communication between threads. This method described first by Bożejko and Wodecki [1] bases on the observation that if exists some number of elements on certain positions in a number of permutations which are local minima, then those elements will be in the same position in the optimal solution for TSP problem. Numerical experiments were made on BEM computational cluster with using MPI library.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS
Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.
2014-01-01
Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
PARALLEL ASSAY OF OXYGEN EQUILIBRIA OF HEMOGLOBIN
Lilly, Laura E.; Blinebry, Sara K.; Viscardi, Chelsea M.; Perez, Luis; Bonaventura, Joe; McMahon, Tim J.
2013-01-01
Methods to systematically analyze in parallel the function of multiple protein or cell samples in vivo or ex vivo (i.e. functional proteomics) in a controlled gaseous environment have thus far been limited. Here we describe an apparatus and procedure that enables, for the first time, parallel assay of oxygen equilibria in multiple samples. Using this apparatus, numerous simultaneous oxygen equilibrium curves (OECs) can be obtained under truly identical conditions from blood cell samples or purified hemoglobins (Hbs). We suggest that the ability to obtain these parallel datasets under identical conditions can be of immense value, both to biomedical researchers and clinicians who wish to monitor blood health, and to physiologists studying non-human organisms and the effects of climate change on these organisms. Parallel monitoring techniques are essential in order to better understand the functions of critical cellular proteins. The procedure can be applied to human studies, wherein an OEC can be analyzed in light of an individual’s entire genome. Here, we analyzed intraerythrocytic Hb, a protein that operates at the organism’s environmental interface and then comes into close contact with virtually all of the organism’s cells. The apparatus is theoretically scalable, and establishes a functional proteomic screen that can be correlated with genomic information on the same individuals. This new method is expected to accelerate our general understanding of protein function, an increasingly challenging objective as advances in proteomic and genomic throughput outpace the ability to study proteins’ functional properties. PMID:23827235
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used an a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
Parallel Algorithms for Least Squares and Related Computations.
1991-03-22
for dense computations in linear algebra . The work has recently been published in a general reference book on parallel algorithms by SIAM. AFO SR...written his Ph.D. dissertation with the principal investigator. (See publication 6.) • Parallel Algorithms for Dense Linear Algebra Computations. Our...and describe and to put into perspective a selection of the more important parallel algorithms for numerical linear algebra . We give a major new
NASA Astrophysics Data System (ADS)
Kajikawa, Kazuhiro; Funaki, Kazuo
2011-12-01
Application of an external AC magnetic field parallel to superconducting tapes helps in eliminating the magnetization caused by the shielding current induced in the flat faces of the tapes. This method helps in realizing a magnet system with high-temperature superconducting tapes for magnetic resonance imaging (MRI) and nuclear magnetic resonance (NMR) applications. The effectiveness of the proposed method is validated by numerical calculations carried out using the finite-element method and experiments performed using a commercially available superconducting tape. The field uniformity for a single-layer solenoid coil after the application of an AC field is also estimated by a theoretical consideration.
Parallel Algorithms for Computational Models of Geophysical Systems
NASA Astrophysics Data System (ADS)
Carrillo Ledesma, A.; Herrera, I.; de la Cruz, L. M.; Hernández, G.; Grupo de Modelacion Matematica y Computacional
2013-05-01
Mathematical models of many systems of interest, including very important continuous systems of Earth Sciences and Engineering, lead to a great variety of partial differential equations (PDEs) whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made amenable to effective treatment problems of an ever increasing diversity and complexity, posed by scientific and engineering applications. Parallel computing is outstanding among the new computational tools and, in order to effectively use the most advanced computers available today, massively parallel software is required. Domain decomposition methods (DDMs) have been developed precisely for effectively treating PDEs in paralle. Ideally, the main objective of domain decomposition research is to produce algorithms capable of 'obtaining the global solution by exclusively solving local problems', but up-to-now this has only been an aspiration; that is, a strong desire for achieving such a property and so we call it 'the DDM-paradigm'. In recent times, numerically competitive DDM-algorithms are non-overlapping, preconditioned and necessarily incorporate constraints which pose an additional challenge for achieving the DDM-paradigm. Recently a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm, was developed. To derive them a new discretization method, which uses a non-overlapping system of nodes (the derived-nodes), was introduced. This discretization procedure can be applied to any boundary-value problem, or system of such equations. In turn, the resulting system of discrete equations can be treated using any available DDM-algorithm. In particular, two of the four DVS-algorithms mentioned above were obtained by application of the well-known and very effective algorithms BDDC and FETI-DP; these will be referred to as the DVS-BDDC and DVS-FETI-DP algorithms. The other two, which will be referred to as the DVS-PRIMAL and DVS-DUAL algorithms, were obtained by application of two new algorithms that had not been previously reported in the literature. As said before, the four DVS-algorithms constitute a group of preconditioned and constrained algorithms that, for the first time, fulfill the DDM-paradigm. Both, BDDC and FETI-DP, are very well-known; and both are highly efficient. Recently, it was established that these two methods are closely related and its numerical performance is quite similar. On the other hand, through numerical experiments, we have established that the numerical performances of each one of the members of DVS-algorithms group (DVS-BDDC, DVS-FETI-DP, DVS-PRIMAL and DVS-DUAL) are very similar too. Furthermore, we have carried out comparisons of the performances of the standard versions of BDDC and FETI-DP with DVS-BDDC and DVS-FETI-DP, and in all such numerical experiments the DVS algorithms have performed significantly better.
Some recent developments of the immersed interface method for flow simulation
NASA Astrophysics Data System (ADS)
Xu, Sheng
2017-11-01
The immersed interface method is a general methodology for solving PDEs subject to interfaces. In this talk, I will give an overview of some recent developments of the method toward the enhancement of its robustness for flow simulation. In particular, I will present with numerical results how to capture boundary conditions on immersed rigid objects, how to adopt interface triangulation in the method, and how to parallelize the method for flow with moving objects. With these developments, the immersed interface method can achieve accurate and efficient simulation of a flow involving multiple moving complex objects. Thanks to NSF for the support of this work under Grant NSF DMS 1320317.
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steich, D J; Brugger, S T; Kallman, J S
2000-02-01
This final report describes our efforts on the Three-Dimensional Massively Parallel CEM Technologies LDRD project (97-ERD-009). Significant need exists for more advanced time domain computational electromagnetics modeling. Bookkeeping details and modifying inflexible software constitute a vast majority of the effort required to address such needs. The required effort escalates rapidly as problem complexity increases. For example, hybrid meshes requiring hybrid numerics on massively parallel platforms (MPPs). This project attempts to alleviate the above limitations by investigating flexible abstractions for these numerical algorithms on MPPs using object-oriented methods, providing a programming environment insulating physics from bookkeeping. The three major design iterationsmore » during the project, known as TIGER-I to TIGER-III, are discussed. Each version of TIGER is briefly discussed along with lessons learned during the development and implementation. An Application Programming Interface (API) of the object-oriented interface for Tiger-III is included in three appendices. The three appendices contain the Utilities, Entity-Attribute, and Mesh libraries developed during the project. The API libraries represent a snapshot of our latest attempt at insulated the physics from the bookkeeping.« less
NASA Technical Reports Server (NTRS)
Mishchenko, Michael I.; Dlugach, Janna M.; Zakharova, Nadezhda T.
2016-01-01
The numerically exact superposition T-matrix method is used to model far-field electromagnetic scattering by two types of particulate object. Object 1 is a fixed configuration which consists of N identical spherical particles (with N 200 or 400) quasi-randomly populating a spherical volume V having a median size parameter of 50. Object 2 is a true discrete random medium (DRM) comprising the same number N of particles randomly moving throughout V. The median particle size parameter is fixed at 4. We show that if Object 1 is illuminated by a quasi-monochromatic parallel beam then it generates a typical speckle pattern having no resemblance to the scattering pattern generated by Object 2. However, if Object 1 is illuminated by a parallel polychromatic beam with a 10 bandwidth then it generates a scattering pattern that is largely devoid of speckles and closely reproduces the quasi-monochromatic pattern generated by Object 2. This result serves to illustrate the capacity of the concept of electromagnetic scattering by a DRM to encompass fixed quasi-random particulate samples provided that they are illuminated by polychromatic light.
NASA Astrophysics Data System (ADS)
Wu, Z.; Gao, K.; Wang, Z. L.; Shao, Q. G.; Hu, R. F.; Wei, C. X.; Zan, G. B.; Wali, F.; Luo, R. H.; Zhu, P. P.; Tian, Y. C.
2017-06-01
In X-ray grating-based phase contrast imaging, information retrieval is necessary for quantitative research, especially for phase tomography. However, numerous and repetitive processes have to be performed for tomographic reconstruction. In this paper, we report a novel information retrieval method, which enables retrieving phase and absorption information by means of a linear combination of two mutually conjugate images. Thanks to the distributive law of the multiplication as well as the commutative law and associative law of the addition, the information retrieval can be performed after tomographic reconstruction, thus simplifying the information retrieval procedure dramatically. The theoretical model of this method is established in both parallel beam geometry for Talbot interferometer and fan beam geometry for Talbot-Lau interferometer. Numerical experiments are also performed to confirm the feasibility and validity of the proposed method. In addition, we discuss its possibility in cone beam geometry and its advantages compared with other methods. Moreover, this method can also be employed in other differential phase contrast imaging methods, such as diffraction enhanced imaging, non-interferometric imaging, and edge illumination.
Parallel family trees for transfer matrices in the Potts model
NASA Astrophysics Data System (ADS)
Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo
2015-02-01
The computational cost of transfer matrix methods for the Potts model is related to the question in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q , v) transfer matrix methods for strips is in the order of the Catalan numbers, which grows asymptotically as O(4m) where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist but they make assumptions on the temperature, number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3m) to build the generic (q , v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while still highly parallel. The resulting matrix is stored in a compressed form using O(3m ×4m) of space, making numerical evaluation and decompression to be faster than evaluating the matrix in its O(4m ×4m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7 × when running on an 8-core shared memory machine and 28 × for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster scenario it was in the range p ∈ [ 8 , 10 ] . Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy in a supercomputer could contribute to the study of wider strip lattices.
NASA Astrophysics Data System (ADS)
Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya
2017-02-01
GPU not only is used in the field of graphic technology but also has been widely used in areas needing a large number of numerical calculations. In the energy industry, because of low carbon, high energy density, high duration and other characteristics, the development of nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. The large-scale reactor core expansion equation is large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA programming technology on a GPU cluster to run the LU-SGS parallel iterative calculation against the background of the diffusion equation of the reactor. We divide one-dimensional and two-dimensional mesh into a plurality of domains, with each domain evenly distributed on the GPU blocks. A parallel collision scheme is put forward that defines the virtual boundary of the grid exchange information and data transmission by non-stop collision. Compared with the serial program, the experiment shows that GPU greatly improves the efficiency of program execution and verifies that GPU is playing a much more important role in the field of numerical calculations.
Magnus-induced ratchet effects for skyrmions interacting with asymmetric substrates
NASA Astrophysics Data System (ADS)
Reichhardt, C.; Ray, D.; Olson Reichhardt, C. J.
2015-07-01
We show using numerical simulations that pronounced ratchet effects can occur for ac driven skyrmions moving over asymmetric quasi-one-dimensional substrates. We find a new type of ratchet effect called a Magnus-induced transverse ratchet that arises when the ac driving force is applied perpendicular rather than parallel to the asymmetry direction of the substrate. This transverse ratchet effect only occurs when the Magnus term is finite, and the threshold ac amplitude needed to induce it decreases as the Magnus term becomes more prominent. Ratcheting skyrmions follow ordered orbits in which the net displacement parallel to the substrate asymmetry direction is quantized. Skyrmion ratchets represent a new ac current-based method for controlling skyrmion positions and motion for spintronic applications.
Mixing enhancement of reacting parallel fuel jets in a supersonic combustor
NASA Technical Reports Server (NTRS)
Drummond, J. P.
1991-01-01
Pursuant to a NASA-Langley development program for a scramjet HST propulsion system entailing the optimization of the scramjet combustor's fuel-air mixing and reaction characteristics, a numerical study has been conducted of the candidate parallel fuel injectors. Attention is given to a method for flow mixing-process and combustion-efficiency enhancement in which a supersonic circular hydrogen jet coflows with a supersonic air stream. When enhanced by a planar oblique shock, the injector configuration exhibited a substantial degree of induced vorticity in the fuel stream which increased mixing and chemical reaction rates, relative to the unshocked configuration. The resulting heat release was effective in breaking down the stable hydrogen vortex pair that had inhibited more extensive fuel-air mixing.
Hypersonic Boundary Layer Instability Over a Corner
NASA Technical Reports Server (NTRS)
Balakumar, Ponnampalam; Zhao, Hong-Wu; McClinton, Charles (Technical Monitor)
2001-01-01
A boundary-layer transition study over a compression corner was conducted under a hypersonic flow condition. Due to the discontinuities in boundary layer flow, the full Navier-Stokes equations were solved to simulate the development of disturbance in the boundary layer. A linear stability analysis and PSE method were used to get the initial disturbance for parallel and non-parallel flow respectively. A 2-D code was developed to solve the full Navier-stokes by using WENO(weighted essentially non-oscillating) scheme. The given numerical results show the evolution of the linear disturbance for the most amplified disturbance in supersonic and hypersonic flow over a compression ramp. The nonlinear computations also determined the minimal amplitudes necessary to cause transition at a designed location.
Parallel Numerical Simulations of Water Reservoirs
NASA Astrophysics Data System (ADS)
Torres, Pedro; Mangiavacchi, Norberto
2010-11-01
The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
Three-dimensional finite amplitude electroconvection in dielectric liquids
NASA Astrophysics Data System (ADS)
Luo, Kang; Wu, Jian; Yi, Hong-Liang; Tan, He-Ping
2018-02-01
Charge injection induced electroconvection in a dielectric liquid lying between two parallel plates is numerically simulated in three dimensions (3D) using a unified lattice Boltzmann method (LBM). Cellular flow patterns and their subcritical bifurcation phenomena of 3D electroconvection are numerically investigated for the first time. A unit conversion is also derived to connect the LBM system to the real physical system. The 3D LBM codes are validated by three carefully chosen cases and all results are found to be highly consistent with the analytical solutions or other numerical studies. For strong injection, the steady state roll, polygon, and square flow patterns are observed under different initial disturbances. Numerical results show that the hexagonal cell with the central region being empty of charge and centrally downward flow is preferred in symmetric systems under random initial disturbance. For weak injection, the numerical results show that the flow directly passes from the motionless state to turbulence once the system loses its linear stability. In addition, the numerically predicted linear and finite amplitude stability criteria of different flow patterns are discussed.
Numerical approaches to combustion modeling. Progress in Astronautics and Aeronautics. Vol. 135
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oran, E.S.; Boris, J.P.
1991-01-01
Various papers on numerical approaches to combustion modeling are presented. The topics addressed include; ab initio quantum chemistry for combustion; rate coefficient calculations for combustion modeling; numerical modeling of combustion of complex hydrocarbons; combustion kinetics and sensitivity analysis computations; reduction of chemical reaction models; length scales in laminar and turbulent flames; numerical modeling of laminar diffusion flames; laminar flames in premixed gases; spectral simulations of turbulent reacting flows; vortex simulation of reacting shear flow; combustion modeling using PDF methods. Also considered are: supersonic reacting internal flow fields; studies of detonation initiation, propagation, and quenching; numerical modeling of heterogeneous detonations, deflagration-to-detonationmore » transition to reactive granular materials; toward a microscopic theory of detonations in energetic crystals; overview of spray modeling; liquid drop behavior in dense and dilute clusters; spray combustion in idealized configurations: parallel drop streams; comparisons of deterministic and stochastic computations of drop collisions in dense sprays; ignition and flame spread across solid fuels; numerical study of pulse combustor dynamics; mathematical modeling of enclosure fires; nuclear systems.« less
Algebraic multigrid domain and range decomposition (AMG-DD / AMG-RD)*
Bank, R.; Falgout, R. D.; Jones, T.; ...
2015-10-29
In modern large-scale supercomputing applications, algebraic multigrid (AMG) is a leading choice for solving matrix equations. However, the high cost of communication relative to that of computation is a concern for the scalability of traditional implementations of AMG on emerging architectures. This paper introduces two new algebraic multilevel algorithms, algebraic multigrid domain decomposition (AMG-DD) and algebraic multigrid range decomposition (AMG-RD), that replace traditional AMG V-cycles with a fully overlapping domain decomposition approach. While the methods introduced here are similar in spirit to the geometric methods developed by Brandt and Diskin [Multigrid solvers on decomposed domains, in Domain Decomposition Methods inmore » Science and Engineering, Contemp. Math. 157, AMS, Providence, RI, 1994, pp. 135--155], Mitchell [Electron. Trans. Numer. Anal., 6 (1997), pp. 224--233], and Bank and Holst [SIAM J. Sci. Comput., 22 (2000), pp. 1411--1443], they differ primarily in that they are purely algebraic: AMG-RD and AMG-DD trade communication for computation by forming global composite “grids” based only on the matrix, not the geometry. (As is the usual AMG convention, “grids” here should be taken only in the algebraic sense, regardless of whether or not it corresponds to any geometry.) Another important distinguishing feature of AMG-RD and AMG-DD is their novel residual communication process that enables effective parallel computation on composite grids, avoiding the all-to-all communication costs of the geometric methods. The main purpose of this paper is to study the potential of these two algebraic methods as possible alternatives to existing AMG approaches for future parallel machines. As a result, this paper develops some theoretical properties of these methods and reports on serial numerical tests of their convergence properties over a spectrum of problem parameters.« less
SpF: Enabling Petascale Performance for Pseudospectral Dynamo Models
NASA Astrophysics Data System (ADS)
Jiang, W.; Clune, T.; Vriesema, J.; Gutmann, G.
2013-12-01
Pseudospectral (PS) methods possess a number of characteristics (e.g., efficiency, accuracy, natural boundary conditions) that are extremely desirable for dynamo models. Unfortunately, dynamo models based upon PS methods face a number of daunting challenges, which include exposing additional parallelism, leveraging hardware accelerators, exploiting hybrid parallelism, and improving the scalability of global memory transposes. Although these issues are a concern for most models, solutions for PS methods tend to require far more pervasive changes to underlying data and control structures. Further, improvements in performance in one model are difficult to transfer to other models, resulting in significant duplication of effort across the research community. We have developed an extensible software framework for pseudospectral methods called SpF that is intended to enable extreme scalability and optimal performance. High-level abstractions provided by SpF unburden applications of the responsibility of managing domain decomposition and load balance while reducing the changes in code required to adapt to new computing architectures. The key design concept in SpF is that each phase of the numerical calculation is partitioned into disjoint numerical 'kernels' that can be performed entirely in-processor. The granularity of domain-decomposition provided by SpF is only constrained by the data-locality requirements of these kernels. SpF builds on top of optimized vendor libraries for common numerical operations such as transforms, matrix solvers, etc., but can also be configured to use open source alternatives for portability. SpF includes several alternative schemes for global data redistribution and is expected to serve as an ideal testbed for further research into optimal approaches for different network architectures. In this presentation, we will describe the basic architecture of SpF as well as preliminary performance data and experience with adapting legacy dynamo codes. We will conclude with a discussion of planned extensions to SpF that will provide pseudospectral applications with additional flexibility with regard to time integration, linear solvers, and discretization in the radial direction.
Ef: Software for Nonrelativistic Beam Simulation by Particle-in-Cell Algorithm
NASA Astrophysics Data System (ADS)
Boytsov, A. Yu.; Bulychev, A. A.
2018-04-01
Understanding of particle dynamics is crucial in construction of electron guns, ion sources and other types of nonrelativistic beam devices. Apart from external guiding and focusing systems, a prominent role in evolution of such low-energy beams is played by particle-particle interaction. Numerical simulations taking into account these effects are typically accomplished by a well-known particle-in-cell method. In practice, for convenient work a simulation program should not only implement this method, but also support parallelization, provide integration with CAD systems and allow access to details of the simulation algorithm. To address the formulated requirements, development of a new open source code - Ef - has been started. It's current features and main functionality are presented. Comparison with several analytical models demonstrates good agreement between the numerical results and the theory. Further development plans are discussed.
Review of An Introduction to Parallel and Vector Scientific Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.; Lefton, Lew
2006-06-30
On one hand, the field of high-performance scientific computing is thriving beyond measure. Performance of leading-edge systems on scientific calculations, as measured say by the Top500 list, has increased by an astounding factor of 8000 during the 15-year period from 1993 to 2008, which is slightly faster even than Moore's Law. Even more importantly, remarkable advances in numerical algorithms, numerical libraries and parallel programming environments have led to improvements in the scope of what can be computed that are entirely on a par with the advances in computing hardware. And these successes have spread far beyond the confines of largemore » government-operated laboratories, many universities, modest-sized research institutes and private firms now operate clusters that differ only in scale from the behemoth systems at the large-scale facilities. In the wake of these recent successes, researchers from fields that heretofore have not been part of the scientific computing world have been drawn into the arena. For example, at the recent SC07 conference, the exhibit hall, which long has hosted displays from leading computer systems vendors and government laboratories, featured some 70 exhibitors who had not previously participated. In spite of all these exciting developments, and in spite of the clear need to present these concepts to a much broader technical audience, there is a perplexing dearth of training material and textbooks in the field, particularly at the introductory level. Only a handful of universities offer coursework in the specific area of highly parallel scientific computing, and instructors of such courses typically rely on custom-assembled material. For example, the present reviewer and Robert F. Lucas relied on materials assembled in a somewhat ad-hoc fashion from colleagues and personal resources when presenting a course on parallel scientific computing at the University of California, Berkeley, a few years ago. Thus it is indeed refreshing to see the publication of the book An Introduction to Parallel and Vector Scientic Computing, written by Ronald W. Shonkwiler and Lew Lefton, both of the Georgia Institute of Technology. They have taken the bull by the horns and produced a book that appears to be entirely satisfactory as an introductory textbook for use in such a course. It is also of interest to the much broader community of researchers who are already in the field, laboring day by day to improve the power and performance of their numerical simulations. The book is organized into 11 chapters, plus an appendix. The first three chapters describe the basics of system architecture including vector, parallel and distributed memory systems, the details of task dependence and synchronization, and the various programming models currently in use - threads, MPI and OpenMP. Chapters four through nine provide a competent introduction to floating-point arithmetic, numerical error and numerical linear algebra. Some of the topics presented include Gaussian elimination, LU decomposition, tridiagonal systems, Givens rotations, QR decompositions, Gauss-Seidel iterations and Householder transformations. Chapters 10 and 11 introduce Monte Carlo methods and schemes for discrete optimization such as genetic algorithms.« less
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations
Mitchell, William F.
1998-01-01
Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.
Mitchell, William F
1998-01-01
Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
Lattice Boltzmann approach for complex nonequilibrium flows.
Montessori, A; Prestininzi, P; La Rocca, M; Succi, S
2015-10-01
We present a lattice Boltzmann realization of Grad's extended hydrodynamic approach to nonequilibrium flows. This is achieved by using higher-order isotropic lattices coupled with a higher-order regularization procedure. The method is assessed for flow across parallel plates and three-dimensional flows in porous media, showing excellent agreement of the mass flow with analytical and numerical solutions of the Boltzmann equation across the full range of Knudsen numbers, from the hydrodynamic regime to ballistic motion.
Gravitational field calculations on a dynamic lattice by distributed computing.
NASA Astrophysics Data System (ADS)
Mähönen, P.; Punkka, V.
A new method of calculating numerically time evolution of a gravitational field in general relativity is introduced. Vierbein (tetrad) formalism, dynamic lattice and massively parallelized computation are suggested as they are expected to speed up the calculations considerably and facilitate the solution of problems previously considered too hard to be solved, such as the time evolution of a system consisting of two or more black holes or the structure of worm holes.
Gravitation Field Calculations on a Dynamic Lattice by Distributed Computing
NASA Astrophysics Data System (ADS)
Mähönen, Petri; Punkka, Veikko
A new method of calculating numerically time evolution of a gravitational field in General Relatity is introduced. Vierbein (tetrad) formalism, dynamic lattice and massively parallelized computation are suggested as they are expected to speed up the calculations considerably and facilitate the solution of problems previously considered too hard to be solved, such as the time evolution of a system consisting of two or more black holes or the structure of worm holes.
Modeling Flue Pipes: Subsonic Flow, Lattice Boltzmann, and Parallel Distributed Computers.
1995-01-01
Abstract The problem of simulating the hydrodynamics and the acoustic waves inside wind musical instruments such as the recorder, the organ, and the ute...inside wind musical instruments such as the recorder, the organ, and the ute is considered. The problem is attacked by developing suitable local...applications such as the simulation of uid dynamics inside wind musical instruments. In the past, he has also worked on numerical methods for ordinary di
New trends in Taylor series based applications
NASA Astrophysics Data System (ADS)
Kocina, Filip; Šátek, Václav; Veigend, Petr; Nečasová, Gabriela; Valenta, Václav; Kunovský, Jiří
2016-06-01
The paper deals with the solution of large system of linear ODEs when minimal comunication among parallel processors is required. The Modern Taylor Series Method (MTSM) is used. The MTSM allows using a higher order during the computation that means a larger integration step size while keeping desired accuracy. As an example of complex systems we can take the Telegraph Equation Model. Symbolic and numeric solutions are compared when harmonic input signal is used.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for allmore » numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce These input formats include standard analytical models, behavioral models look-up Parallel Electronic Simulator is designed to support a variety of device model inputs. tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important feature of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.« less
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA
NASA Astrophysics Data System (ADS)
Ren, Qinlong
Lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate the complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on the kinetic theory, LBM has several advantages compared with traditional numerical methods such as physical representation of microscopic interactions, dealing with complex geometries and highly parallel nature. Lattice Boltzmann method has been applied to solve various fluid behaviors and heat transfer process like conjugate heat transfer, magnetic and electric field, diffusion and mixing process, chemical reactions, multiphase flow, phase change process, non-isothermal flow in porous medium, microfluidics, fluid-structure interactions in biological system and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) could be applied to handle the complex or moving geometries in the domain. The immersed boundary method could be coupled with lattice Boltzmann method to study the heat transfer and fluid flow problems. Heat transfer and fluid flow are solved on Euler nodes by LBM while the complex solid geometries are captured by Lagrangian nodes using immersed boundary method. Parallel computing has been a popular topic for many decades to accelerate the computational speed in engineering and scientific fields. Today, almost all the laptop and desktop have central processing units (CPUs) with multiple cores which could be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high which limits its capability of high performance computing on personal computer. Graphic processing units (GPU) is originally used for the computer video cards have been emerged as the most powerful high-performance workstation in recent years. Unlike the CPUs, the cost of GPU with thousands of cores is cheap. For example, the GPU (GeForce GTX TITAN) which is used in the current work has 2688 cores and the price is only 1,000 US dollars. The release of NVIDIA's CUDA architecture which includes both hardware and programming environment in 2007 makes GPU computing attractive. Due to its highly parallel nature, lattice Boltzmann method is successfully ported into GPU with a performance benefit during the recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, lattice Boltzmann method and immersed boundary method are used to study natural convection in an enclosure with an array of conduting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems on a CUDA computational platform. It is demonstrated that LBM is an efficient method to simulate complex heat transfer problems using GPU on CUDA.
Extending HPF for advanced data parallel applications
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1994-01-01
The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.
Developing Information Power Grid Based Algorithms and Software
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
NASA Astrophysics Data System (ADS)
Ayub, M.; Abbas, T.; Bhatti, M. M.
2016-06-01
The boundary layer flow of nanofluid that is electrically conducting over a Riga plate is considered. The Riga plate is an electromagnetic actuator which comprises a spanwise adjusted cluster of substituting terminal and lasting magnets mounted on a plane surface. The numerical model fuses the Brownian motion and the thermophoresis impacts because of the nanofluid and the Grinberg term for the wall parallel Lorentz force due to the Riga plate in the presence of slip effects. The numerical solution of the problem is presented using the shooting method. The novelties of all the physical parameters such as modified Hartmann number, Richardson number, nanoparticle concentration flux parameter, Prandtl number, Lewis number, thermophoresis parameter, Brownian motion parameter and slip parameter are demonstrated graphically. Numerical values of reduced Nusselt number, Sherwood number are discussed in detail.
Demi, Libertario; Verweij, Martin D; Van Dongen, Koen W A
2012-11-01
Real-time 2-D or 3-D ultrasound imaging systems are currently used for medical diagnosis. To achieve the required data acquisition rate, these systems rely on parallel beamforming, i.e., a single wide-angled beam is used for transmission and several narrow parallel beams are used for reception. When applied to harmonic imaging, the demand for high-amplitude pressure wave fields, necessary to generate the harmonic components, conflicts with the use of a wide-angled beam in transmission because this results in a large spatial decay of the acoustic pressure. To enhance the amplitude of the harmonics, it is preferable to do the reverse: transmit several narrow parallel beams and use a wide-angled beam in reception. Here, this concept is investigated to determine whether it can be used for harmonic imaging. The method proposed in this paper relies on orthogonal frequency division multiplexing (OFDM), which is used to create distinctive parallel beams in transmission. To test the proposed method, a numerical study has been performed, in which the transmit, receive, and combined beam profiles generated by a linear array have been simulated for the second-harmonic component. Compared with standard parallel beamforming, application of the proposed technique results in a gain of 12 dB for the main beam and in a reduction of the side lobes. Experimental verification in water has also been performed. Measurements obtained with a single-element emitting transducer and a hydrophone receiver confirm the possibility of exciting a practical ultrasound transducer with multiple Gaussian modulated pulses, each having a different center frequency, and the capability to generate distinguishable second-harmonic components.
Acoustic 3D modeling by the method of integral equations
NASA Astrophysics Data System (ADS)
Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.
2018-02-01
This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that, the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that, the method could accurately calculate the seismic field for the models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.
NASA Astrophysics Data System (ADS)
Liang, Dong; Song, Yimin; Sun, Tao; Jin, Xueying
2017-09-01
A systematic dynamic modeling methodology is presented to develop the rigid-flexible coupling dynamic model (RFDM) of an emerging flexible parallel manipulator with multiple actuation modes. By virtue of assumed mode method, the general dynamic model of an arbitrary flexible body with any number of lumped parameters is derived in an explicit closed form, which possesses the modular characteristic. Then the completely dynamic model of system is formulated based on the flexible multi-body dynamics (FMD) theory and the augmented Lagrangian multipliers method. An approach of combining the Udwadia-Kalaba formulation with the hybrid TR-BDF2 numerical algorithm is proposed to address the nonlinear RFDM. Two simulation cases are performed to investigate the dynamic performance of the manipulator with different actuation modes. The results indicate that the redundant actuation modes can effectively attenuate vibration and guarantee higher dynamic performance compared to the traditional non-redundant actuation modes. Finally, a virtual prototype model is developed to demonstrate the validity of the presented RFDM. The systematic methodology proposed in this study can be conveniently extended for the dynamic modeling and controller design of other planar flexible parallel manipulators, especially the emerging ones with multiple actuation modes.
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Perspectives on the Future of CFD
NASA Technical Reports Server (NTRS)
Kwak, Dochan
2000-01-01
This viewgraph presentation gives an overview of the future of computational fluid dynamics (CFD), which in the past has pioneered the field of flow simulation. Over time CFD has progressed as computing power. Numerical methods have been advanced as CPU and memory capacity increases. Complex configurations are routinely computed now and direct numerical simulations (DNS) and large eddy simulations (LES) are used to study turbulence. As the computing resources changed to parallel and distributed platforms, computer science aspects such as scalability (algorithmic and implementation) and portability and transparent codings have advanced. Examples of potential future (or current) challenges include risk assessment, limitations of the heuristic model, and the development of CFD and information technology (IT) tools.
Propagation of elastic wave in nanoporous material with distributed cylindrical nanoholes
NASA Astrophysics Data System (ADS)
Qiang, FangWei; Wei, PeiJun; Liu, XiQiang
2013-08-01
The effective propagation constants of plane longitudinal and shear waves in nanoporous material with random distributed parallel cylindrical nanoholes are studied. The surface elastic theory is used to consider the surface stress effects and to derive the nontraditional boundary condition on the surface of nanoholes. The plane wave expansion method is used to obtain the scattering waves from the single nanohole. The multiple scattering effects are taken into consideration by summing the scattered waves from all scatterers and performing the configuration averaging of random distributed scatterers. The effective propagation constants of coherent waves along with the associated dynamic effective elastic modulus are numerically evaluated. The influences of surface stress are discussed based on the numerical results.
Earthquake Rupture Dynamics using Adaptive Mesh Refinement and High-Order Accurate Numerical Methods
NASA Astrophysics Data System (ADS)
Kozdon, J. E.; Wilcox, L.
2013-12-01
Our goal is to develop scalable and adaptive (spatial and temporal) numerical methods for coupled, multiphysics problems using high-order accurate numerical methods. To do so, we are developing an opensource, parallel library known as bfam (available at http://bfam.in). The first application to be developed on top of bfam is an earthquake rupture dynamics solver using high-order discontinuous Galerkin methods and summation-by-parts finite difference methods. In earthquake rupture dynamics, wave propagation in the Earth's crust is coupled to frictional sliding on fault interfaces. This coupling is two-way, required the simultaneous simulation of both processes. The use of laboratory-measured friction parameters requires near-fault resolution that is 4-5 orders of magnitude higher than that needed to resolve the frequencies of interest in the volume. This, along with earlier simulations using a low-order, finite volume based adaptive mesh refinement framework, suggest that adaptive mesh refinement is ideally suited for this problem. The use of high-order methods is motivated by the high level of resolution required off the fault in earlier the low-order finite volume simulations; we believe this need for resolution is a result of the excessive numerical dissipation of low-order methods. In bfam spatial adaptivity is handled using the p4est library and temporal adaptivity will be accomplished through local time stepping. In this presentation we will present the guiding principles behind the library as well as verification of code against the Southern California Earthquake Center dynamic rupture code validation test problems.
Dimensional synthesis of a 3-DOF parallel manipulator with full circle rotation
NASA Astrophysics Data System (ADS)
Ni, Yanbing; Wu, Nan; Zhong, Xueyong; Zhang, Biao
2015-07-01
Parallel robots are widely used in the academic and industrial fields. In spite of the numerous achievements in the design and dimensional synthesis of the low-mobility parallel robots, few research efforts are directed towards the asymmetric 3-DOF parallel robots whose end-effector can realize 2 translational and 1 rotational(2T1R) motion. In order to develop a manipulator with the capability of full circle rotation to enlarge the workspace, a new 2T1R parallel mechanism is proposed. The modeling approach and kinematic analysis of this proposed mechanism are investigated. Using the method of vector analysis, the inverse kinematic equations are established. This is followed by a vigorous proof that this mechanism attains an annular workspace through its circular rotation and 2 dimensional translations. Taking the first order perturbation of the kinematic equations, the error Jacobian matrix which represents the mapping relationship between the error sources of geometric parameters and the end-effector position errors is derived. With consideration of the constraint conditions of pressure angles and feasible workspace, the dimensional synthesis is conducted with a goal to minimize the global comprehensive performance index. The dimension parameters making the mechanism to have optimal error mapping and kinematic performance are obtained through the optimization algorithm. All these research achievements lay the foundation for the prototype building of such kind of parallel robots.
NASA Astrophysics Data System (ADS)
Hahn, T.
2016-10-01
The parallel version of the multidimensional numerical integration package Cuba is presented and achievable speed-ups discussed. The parallelization is based on the fork/wait POSIX functions, needs no extra software installed, imposes almost no constraints on the integrand function, and works largely automatically.
NASA Astrophysics Data System (ADS)
Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.
2011-10-01
SummaryA major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.
Multilevel summation method for electrostatic force evaluation.
Hardy, David J; Wu, Zhe; Phillips, James C; Stone, John E; Skeel, Robert D; Schulten, Klaus
2015-02-10
The multilevel summation method (MSM) offers an efficient algorithm utilizing convolution for evaluating long-range forces arising in molecular dynamics simulations. Shifting the balance of computation and communication, MSM provides key advantages over the ubiquitous particle–mesh Ewald (PME) method, offering better scaling on parallel computers and permitting more modeling flexibility, with support for periodic systems as does PME but also for semiperiodic and nonperiodic systems. The version of MSM available in the simulation program NAMD is described, and its performance and accuracy are compared with the PME method. The accuracy feasible for MSM in practical applications reproduces PME results for water property calculations of density, diffusion constant, dielectric constant, surface tension, radial distribution function, and distance-dependent Kirkwood factor, even though the numerical accuracy of PME is higher than that of MSM. Excellent agreement between MSM and PME is found also for interface potentials of air–water and membrane–water interfaces, where long-range Coulombic interactions are crucial. Applications demonstrate also the suitability of MSM for systems with semiperiodic and nonperiodic boundaries. For this purpose, simulations have been performed with periodic boundaries along directions parallel to a membrane surface but not along the surface normal, yielding membrane pore formation induced by an imbalance of charge across the membrane. Using a similar semiperiodic boundary condition, ion conduction through a graphene nanopore driven by an ion gradient has been simulated. Furthermore, proteins have been simulated inside a single spherical water droplet. Finally, parallel scalability results show the ability of MSM to outperform PME when scaling a system of modest size (less than 100 K atoms) to over a thousand processors, demonstrating the suitability of MSM for large-scale parallel simulation.
Numerical characteristics of quantum computer simulation
NASA Astrophysics Data System (ADS)
Chernyavskiy, A.; Khamitov, K.; Teplov, A.; Voevodin, V.; Voevodin, Vl.
2016-12-01
The simulation of quantum circuits is significantly important for the implementation of quantum information technologies. The main difficulty of such modeling is the exponential growth of dimensionality, thus the usage of modern high-performance parallel computations is relevant. As it is well known, arbitrary quantum computation in circuit model can be done by only single- and two-qubit gates, and we analyze the computational structure and properties of the simulation of such gates. We investigate the fact that the unique properties of quantum nature lead to the computational properties of the considered algorithms: the quantum parallelism make the simulation of quantum gates highly parallel, and on the other hand, quantum entanglement leads to the problem of computational locality during simulation. We use the methodology of the AlgoWiki project (algowiki-project.org) to analyze the algorithm. This methodology consists of theoretical (sequential and parallel complexity, macro structure, and visual informational graph) and experimental (locality and memory access, scalability and more specific dynamic characteristics) parts. Experimental part was made by using the petascale Lomonosov supercomputer (Moscow State University, Russia). We show that the simulation of quantum gates is a good base for the research and testing of the development methods for data intense parallel software, and considered methodology of the analysis can be successfully used for the improvement of the algorithms in quantum information science.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
NASA Astrophysics Data System (ADS)
Rybakin, B.; Bogatencov, P.; Secrieru, G.; Iliuha, N.
2013-10-01
The paper deals with a parallel algorithm for calculations on multiprocessor computers and GPU accelerators. The calculations of shock waves interaction with low-density bubble results and the problem of the gas flow with the forces of gravity are presented. This algorithm combines a possibility to capture a high resolution of shock waves, the second-order accuracy for TVD schemes, and a possibility to observe a low-level diffusion of the advection scheme. Many complex problems of continuum mechanics are numerically solved on structured or unstructured grids. To improve the accuracy of the calculations is necessary to choose a sufficiently small grid (with a small cell size). This leads to the drawback of a substantial increase of computation time. Therefore, for the calculations of complex problems it is reasonable to use the method of Adaptive Mesh Refinement. That is, the grid refinement is performed only in the areas of interest of the structure, where, e.g., the shock waves are generated, or a complex geometry or other such features exist. Thus, the computing time is greatly reduced. In addition, the execution of the application on the resulting sequence of nested, decreasing nets can be parallelized. Proposed algorithm is based on the AMR method. Utilization of AMR method can significantly improve the resolution of the difference grid in areas of high interest, and from other side to accelerate the processes of the multi-dimensional problems calculating. Parallel algorithms of the analyzed difference models realized for the purpose of calculations on graphic processors using the CUDA technology [1].
A Numerical Investigation of Two-Different Drosophila Forward Flight Modes
NASA Astrophysics Data System (ADS)
Sahin, Mehmet; Dilek, Ezgi; Erzincanli, Belkis
2016-11-01
The parallel large-scale unstructured finite volume method based on an Arbitrary Lagrangian-Eulerian (ALE) formulation has been applied in order to investigate the near wake structure of Drosophila in forward flight. DISTENE MeshGems-Hexa algorithm based on the octree method is used to generate the all hexahedral mesh for the wing-body combination. The mesh deformation algorithm is based on the indirect radial basis function (RBF) method at each time level while avoiding remeshing in order to enhance numerical robustness. The large-scale numerical simulations are carried out for a flapping Drosophila in forward flight. In the first case, the wing tip-path plane is tilted forward to generate forward force. In the second case, paddling wing motion is used to generate the forward fore. The λ2-criterion proposed by Jeong and Hussain (1995) is used for investigating the time variation of the Eulerian coherent structures in the near wake. The present simulations reveal highly detailed near wake topology for a hovering Drosophila. This is very useful in terms of understanding physics in biological flights which can provide a very useful tool for designing bio-inspired MAVs.
NASA Astrophysics Data System (ADS)
Markou, A. A.; Manolis, G. D.
2018-03-01
Numerical methods for the solution of dynamical problems in engineering go back to 1950. The most famous and widely-used time stepping algorithm was developed by Newmark in 1959. In the present study, for the first time, the Newmark algorithm is developed for the case of the trilinear hysteretic model, a model that was used to describe the shear behaviour of high damping rubber bearings. This model is calibrated against free-vibration field tests implemented on a hybrid base isolated building, namely the Solarino project in Italy, as well as against laboratory experiments. A single-degree-of-freedom system is used to describe the behaviour of a low-rise building isolated with a hybrid system comprising high damping rubber bearings and low friction sliding bearings. The behaviour of the high damping rubber bearings is simulated by the trilinear hysteretic model, while the description of the behaviour of the low friction sliding bearings is modeled by a linear Coulomb friction model. In order to prove the effectiveness of the numerical method we compare the analytically solved trilinear hysteretic model calibrated from free-vibration field tests (Solarino project) against the same model solved with the Newmark method with Netwon-Raphson iteration. Almost perfect agreement is observed between the semi-analytical solution and the fully numerical solution with Newmark's time integration algorithm. This will allow for extension of the trilinear mechanical models to bidirectional horizontal motion, to time-varying vertical loads, to multi-degree-of-freedom-systems, as well to generalized models connected in parallel, where only numerical solutions are possible.
Scalable Domain Decomposed Monte Carlo Particle Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, Matthew Joseph
2013-12-05
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.
NASA Technical Reports Server (NTRS)
Chang, Chau-Lyan; Venkatachari, Balaji Shankar; Cheng, Gary
2013-01-01
With the wide availability of affordable multiple-core parallel supercomputers, next generation numerical simulations of flow physics are being focused on unsteady computations for problems involving multiple time scales and multiple physics. These simulations require higher solution accuracy than most algorithms and computational fluid dynamics codes currently available. This paper focuses on the developmental effort for high-fidelity multi-dimensional, unstructured-mesh flow solvers using the space-time conservation element, solution element (CESE) framework. Two approaches have been investigated in this research in order to provide high-accuracy, cross-cutting numerical simulations for a variety of flow regimes: 1) time-accurate local time stepping and 2) highorder CESE method. The first approach utilizes consistent numerical formulations in the space-time flux integration to preserve temporal conservation across the cells with different marching time steps. Such approach relieves the stringent time step constraint associated with the smallest time step in the computational domain while preserving temporal accuracy for all the cells. For flows involving multiple scales, both numerical accuracy and efficiency can be significantly enhanced. The second approach extends the current CESE solver to higher-order accuracy. Unlike other existing explicit high-order methods for unstructured meshes, the CESE framework maintains a CFL condition of one for arbitrarily high-order formulations while retaining the same compact stencil as its second-order counterpart. For large-scale unsteady computations, this feature substantially enhances numerical efficiency. Numerical formulations and validations using benchmark problems are discussed in this paper along with realistic examples.
NASA Astrophysics Data System (ADS)
Yang, Haijian; Sun, Shuyu; Yang, Chao
2017-03-01
Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
Bai, Shao-Yuan; Song, Zhi-Xin; Ding, Yan-Li; You, Shao-Hong; He, Shan
2014-02-01
The correlation of substrate structure and hydraulic characteristics was studied by numerical simulation combined with experimental method. The numerical simulation results showed that the permeability coefficient of matrix had a great influence on hydraulic efficiency in subsurface flow constructed wetlands. The filler with a high permeability coefficient had a worse flow field distribution in the constructed wetland with single layer structure. The layered substrate structure with the filler permeability coefficient increased from surface to bottom could avoid the short-circuited flow and dead-zones, and thus, increased the hydraulic efficiency. Two parallel pilot-scale constructed wetlands were built according to the numerical simulation results, and tracer experiments were conducted to validate the simulation results. The tracer experiment result showed that hydraulic characteristics in the layered constructed wetland were obviously better than that in the single layer system, and the substrate effective utilization rates were 0.87 and 0.49, respectively. It was appeared that numerical simulation would be favorable for substrate structure optimization in subsurface flow constructed wetlands.
Algorithms and software for solving finite element equations on serial and parallel architectures
NASA Technical Reports Server (NTRS)
George, Alan
1989-01-01
Over the past 15 years numerous new techniques have been developed for solving systems of equations and eigenvalue problems arising in finite element computations. A package called SPARSPAK has been developed by the author and his co-workers which exploits these new methods. The broad objective of this research project is to incorporate some of this software in the Computational Structural Mechanics (CSM) testbed, and to extend the techniques for use on multiprocessor architectures.
Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures
NASA Astrophysics Data System (ADS)
Biazewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napieraia, Krystyna
2010-09-01
Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics, which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shapes optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting or realistic visualizations. CFD problem is very complex and needs a lot of computational power to obtain the results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphic cards with CUDA environment due to its simplicity of programming and good computations performance. We used finite difference discretization of Navier-Stokes equations, where fluid is propagated over an Eulerian Grid. In this model, the behavior of the fluid inside the cell depends only on the properties of local, surrounding cells, therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to use efficiently the computing power of GPUs for CFD. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on GPU. Finally, we discuss various challenges around the multi-GPU implementation on the example of matrix multiplication.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
Grayver, Alexander V.; Kolev, Tzanio V.
2015-11-01
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grayver, Alexander V.; Kolev, Tzanio V.
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Vector radiative transfer code SORD: Performance analysis and quick start guide
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Alexander; Holben, Brent; Kokhanovsky, Alexander
2017-10-01
We present a new open source polarized radiative transfer code SORD written in Fortran 90/95. SORD numerically simulates propagation of monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface using the method of successive orders of scattering (hence the name). Thermal emission is ignored. We did not improve the method in any way, but report the accuracy and runtime in 52 benchmark scenarios. This paper also serves as a quick start user's guide for the code available from ftp://maiac.gsfc.nasa.gov/pub/skorkin, from the JQSRT website, or from the corresponding (first) author.
NASA Astrophysics Data System (ADS)
Mizumoto, Ikuro; Tsunematsu, Junpei; Fujii, Seiya
2016-09-01
In this paper, a design method of an output feedback control system with a simple feedforward input for a combustion model of diesel engine will be proposed based on the almost strictly positive real-ness (ASPR-ness) of the controlled system for a combustion control of diesel engines. A parallel feedforward compensator (PFC) design scheme which renders the resulting augmented controlled system ASPR will also be proposed in order to design a stable output feedback control system for the considered combustion model. The effectiveness of our proposed method will be confirmed through numerical simulations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard
In this paper, a short-term load forecasting approach based network reconfiguration is proposed in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in distributed manner. Considering the limited number of the switches and the increasing computation capability, themore » proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasible and effectiveness of the proposed approach.« less
Optimal mapping of irregular finite element domains to parallel processors
NASA Technical Reports Server (NTRS)
Flower, J.; Otto, S.; Salama, M.
1987-01-01
Mapping the solution domain of n-finite elements into N-subdomains that may be processed in parallel by N-processors is an optimal one if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.
NASA Technical Reports Server (NTRS)
Gentzsch, W.
1982-01-01
Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
Magnus-induced ratchet effects for skyrmions interacting with asymmetric substrates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reichhardt, C.; Ray, D.; Reichhardt, C. J. Olson
2015-07-31
We show using numerical simulations that pronounced ratchet effects can occur for ac driven skyrmions moving over asymmetric quasi-one-dimensional substrates. We find a new type of ratchet effect called a Magnus-induced transverse ratchet that arises when the ac driving force is applied perpendicular rather than parallel to the asymmetry direction of the substrate. This transverse ratchet effect only occurs when the Magnus term is finite, and the threshold ac amplitude needed to induce it decreases as the Magnus term becomes more prominent. Ratcheting skyrmions follow ordered orbits in which the net displacement parallel to the substrate asymmetry direction is quantized.more » As a result, skyrmion ratchets represent a new ac current-based method for controlling skyrmion positions and motion for spintronic applications.« less
Chaudhry, Jehanzeb Hameed; Estep, Don; Tavener, Simon; Carey, Varis; Sandelin, Jeff
2016-01-01
We consider numerical methods for initial value problems that employ a two stage approach consisting of solution on a relatively coarse discretization followed by solution on a relatively fine discretization. Examples include adaptive error control, parallel-in-time solution schemes, and efficient solution of adjoint problems for computing a posteriori error estimates. We describe a general formulation of two stage computations then perform a general a posteriori error analysis based on computable residuals and solution of an adjoint problem. The analysis accommodates various variations in the two stage computation and in formulation of the adjoint problems. We apply the analysis to compute "dual-weighted" a posteriori error estimates, to develop novel algorithms for efficient solution that take into account cancellation of error, and to the Parareal Algorithm. We test the various results using several numerical examples.
NASA Astrophysics Data System (ADS)
Bonavita, M.; Torrisi, L.
2005-03-01
A new data assimilation system has been designed and implemented at the National Center for Aeronautic Meteorology and Climatology of the Italian Air Force (CNMCA) in order to improve its operational numerical weather prediction capabilities and provide more accurate guidance to operational forecasters. The system, which is undergoing testing before operational use, is based on an “observation space” version of the 3D-VAR method for the objective analysis component, and on the High Resolution Regional Model (HRM) of the Deutscher Wetterdienst (DWD) for the prognostic component. Notable features of the system include a completely parallel (MPI+OMP) implementation of the solution of analysis equations by a preconditioned conjugate gradient descent method; correlation functions in spherical geometry with thermal wind constraint between mass and wind field; derivation of the objective analysis parameters from a statistical analysis of the innovation increments.
Stress-strain state of reinforced bimodulus beam on an elastic foundation
NASA Astrophysics Data System (ADS)
Beskopylny, A. N.; Kadomtseva, E. E.; Strelnikov, G. P.; Berdnik, Y. A.
2017-10-01
The paper provides the calculation theory of an arbitrary supported and arbitrary loaded reinforced beam filled with bimodulus material. The formulas determining normal stresses, bending moments, shear forces, rotation angles and a deflection of a rectangular crosssection beam reinforced with any number of bars aligned parallel to the beam axis have been obtained. The numerical study has been carried out to investigate an influence of a modulus of subgrade reaction on values of maximum normal stresses, maximum bending moments and a maximum deflection of a hinged supported beam loaded with a point force or uniform distributed load. The estimation is based on the method of initial parameters for a beam on elastic foundation and the Bubnov-Galerkin method. Values of maximum deflections, maximum bending moments and maximum stresses obtained by these methods coincide. The numerical studies show that taking into consideration the bimodulus of material leads to the necessity to calculate the strength analysis of both tensile stresses and compressive stresses.
NASA Astrophysics Data System (ADS)
Řidký, V.; Šidlof, P.; Vlček, V.
2013-04-01
The work is devoted to comparing measured data with the results of numerical simulations. As mathematical model was used mathematical model whitout turbulence for incompressible flow In the experiment was observed the behavior of designed NACA0015 airfoil in airflow. For the numerical solution was used OpenFOAM computational package, this is open-source software based on finite volume method. In the numerical solution is prescribed displacement of the airfoil, which corresponds to the experiment. The velocity at a point close to the airfoil surface is compared with the experimental data obtained from interferographic measurements of the velocity field. Numerical solution is computed on a 3D mesh composed of about 1 million ortogonal hexahedron elements. The time step is limited by the Courant number. Parallel computations are run on supercomputers of the CIV at Technical University in Prague (HAL and FOX) and on a computer cluster of the Faculty of Mechatronics of Liberec (HYDRA). Run time is fixed at five periods, the results from the fifth periods and average value for all periods are then be compared with experiment.
NASA Astrophysics Data System (ADS)
Kees, C. E.; Farthing, M. W.; Terrel, A.; Certik, O.; Seljebotn, D.
2013-12-01
This presentation will focus on two barriers to progress in the hydrological modeling community, and research and development conducted to lessen or eliminate them. The first is a barrier to sharing hydrological models among specialized scientists that is caused by intertwining the implementation of numerical methods with the implementation of abstract numerical modeling information. In the Proteus toolkit for computational methods and simulation, we have decoupled these two important parts of computational model through separate "physics" and "numerics" interfaces. More recently we have begun developing the Strong Form Language for easy and direct representation of the mathematical model formulation in a domain specific language embedded in Python. The second major barrier is sharing ANY scientific software tools that have complex library or module dependencies, as most parallel, multi-physics hydrological models must have. In this setting, users and developer are dependent on an entire distribution, possibly depending on multiple compilers and special instructions depending on the environment of the target machine. To solve these problem we have developed, hashdist, a stateless package management tool and a resulting portable, open source scientific software distribution.
Notes on the KIVA-2 software and chemically reactive fluid mechanics
NASA Astrophysics Data System (ADS)
Holst, M. J.
1992-09-01
Working notes regarding the mechanics of chemically reactive fluids with sprays, and their numerical simulation with the KIVA-2 software are presented. KIVA-2 is a large FORTRAN program developed at Los Alamos National Laboratory for internal combustion engine simulation. It is our hope that these notes summarize some of the necessary background material in fluid mechanics and combustion, explain the numerical methods currently used in KIVA-2 and similar combustion codes, and provide an outline of the overall structure of KIVA-2 as a representative combustion program, in order to aid the researcher in the task of implementing KIVA-2 or a similar combustion code on a massively parallel computer. The notes are organized into three parts as follows. In Part 1, a brief introduction to continuum mechanics, to fluid mechanics, and to the mechanics of chemically reactive fluids with sprays is presented. In Part 2, a close look at the governing equations of KIVA-2 is taken, and the methods employed in the numerical solution of these equations is discussed. Some conclusions are drawn and some observations are made in Part 3.
Ballistic Deficits for Ionization Chamber Pulses in Pulse Shaping Amplifiers
NASA Astrophysics Data System (ADS)
Kumar, G. Anil; Sharma, S. L.; Choudhury, R. K.
2007-04-01
In order to understand the dependence of the ballistic deficit on the shape of rising portion of the voltage pulse at the input of a pulse shaping amplifier, we have estimated the ballistic deficits for the pulses from a two-electrode parallel plate ionization chamber as well as for the pulses from a gridded parallel plate ionization chamber. These estimations have been made using numerical integration method when the pulses are processed through the CR-RCn (n=1-6) shaping network as well as when the pulses are processed through the complex shaping network of the ORTEC Model 472 spectroscopic amplifier. Further, we have made simulations to see the effect of ballistic deficit on the pulse-height spectra under different conditions. We have also carried out measurements of the ballistic deficits for the pulses from a two-electrode parallel plate ionization chamber as well as for the pulses from a gridded parallel plate ionization chamber when these pulses are processed through the ORTEC 572 linear amplifier having a simple CR-RC shaping network. The reasonable matching of the simulated ballistic deficits with the experimental ballistic deficits for the CR-RC shaping network clearly establishes the validity of the simulation technique
Parallel traveling-wave MRI: a feasibility study.
Pang, Yong; Vigneron, Daniel B; Zhang, Xiaoliang
2012-04-01
Traveling-wave magnetic resonance imaging utilizes far fields of a single-piece patch antenna in the magnet bore to generate radio frequency fields for imaging large-size samples, such as the human body. In this work, the feasibility of applying the "traveling-wave" technique to parallel imaging is studied using microstrip patch antenna arrays with both the numerical analysis and experimental tests. A specific patch array model is built and each array element is a microstrip patch antenna. Bench tests show that decoupling between two adjacent elements is better than -26-dB while matching of each element reaches -36-dB, demonstrating excellent isolation performance and impedance match capability. The sensitivity patterns are simulated and g-factors are calculated for both unloaded and loaded cases. The results on B 1- sensitivity patterns and g-factors demonstrate the feasibility of the traveling-wave parallel imaging. Simulations also suggest that different array configuration such as patch shape, position and orientation leads to different sensitivity patterns and g-factor maps, which provides a way to manipulate B(1) fields and improve the parallel imaging performance. The proposed method is also validated by using 7T MR imaging experiments. Copyright © 2011 Wiley-Liss, Inc.
Numerical modelling of the formation of fibrous bedding-parallel veins
NASA Astrophysics Data System (ADS)
Torremans, Koen; Muchez, Philippe; Sintubin, Manuel
2014-05-01
Bedding-parallel veins with a fibrous infill oriented orthogonal to the vein wall, are often observed in fine-grained metasedimentary sequences. Several mechanisms have been proposed for their formation, mostly with respect to effects of fluid overpressures and anisotropy of the host-rock fabric in order to explain the inferred extensional failure with sub-vertical opening. Abundant pre-folding, bedding-parallel fibrous dolomite veins are found associated with the Nkana-Mindola stratiform Cu-Co deposit in Zambia. The goal of this study is to better understand the formation mechanisms of these veins and to explain their particular spatial and thickness distribution, with respect to failure of transversely isotropic rocks. The spatial distribution and thickness variation of these veins was quantified during a field campaign in thirteen line transects perpendicular to undeformed veins in underground crosscuts. The fibrous dolomite veins studied are not related to lithological contrasts, but to a strong bedding-parallel shaly fabric, typical for the black shale facies of the Copperbelt Orebody Member. The host rock can hence be considered as transversely isotropic. Growth morphologies vary from antitaxial with a pronounced median surface to asymmetric syntaxial, always with small but quantifiable growth competition. A microstructural fabric study reveals that the undeformed dolomite veins show low-tortuosity vein walls and quantifiable growth competition. Here, we use a Discrete Element Method numerical modelling approach with ESyS-Particle (http://launchpad.net/esys-particle) to simulate the observed properties of the veins. Calibrated numerical specimens with a transversely isotropic matrix are repeatedly brought to failure under constant strain rates by changing the effective strain rates at model boundaries. After each fracture event, fractures in the numerical model are filled with cohesive vein material and the experiment is repeated. By systematically varying stress states, fluid pressures and mechanical properties of materials (host rock, vein infill and interface), we attempt to reproduce the characteristics of spatial distribution and thickness variation of the veins. Four parameter sets of mechanical micro-properties are defined in the models, essentially yielding (1) a competent and (2) incompetent matrix, (3) a vein material and (4) a vein-matrix interface. Each combination of parameters and particle packings is calibrated to fit a predetermined Mohr-Coulomb type failure envelope, via an automated calibration procedure. Preliminary tests already show that by varying these parameters, we are able to simulate realistically distributed cracking through crack-seal processes. Different types of veins and vein generations can be modelled, ranging from single veins, over crack-seal veins to anastomosing veins, by varying the mechanical strength of competent and incompetent matrix, vein and interface material. Further results of this approach will be presented. We will discuss our results with respect to mechanisms proposed in the literature for bedding-parallel, fibrous veins in metasedimentary rock sequences.
GPU surface extraction using the closest point embedding
NASA Astrophysics Data System (ADS)
Kim, Mark; Hansen, Charles
2015-01-01
Isosurface extraction is a fundamental technique used for both surface reconstruction and mesh generation. One method to extract well-formed isosurfaces is a particle system; unfortunately, particle systems can be slow. In this paper, we introduce an enhanced parallel particle system that uses the closest point embedding as the surface representation to speedup the particle system for isosurface extraction. The closest point embedding is used in the Closest Point Method (CPM), a technique that uses a standard three dimensional numerical PDE solver on two dimensional embedded surfaces. To fully take advantage of the closest point embedding, it is coupled with a Barnes-Hut tree code on the GPU. This new technique produces well-formed, conformal unstructured triangular and tetrahedral meshes from labeled multi-material volume datasets. Further, this new parallel implementation of the particle system is faster than any known methods for conformal multi-material mesh extraction. The resulting speed-ups gained in this implementation can reduce the time from labeled data to mesh from hours to minutes and benefits users, such as bioengineers, who employ triangular and tetrahedral meshes
Seamless contiguity method for parallel segmentation of remote sensing image
NASA Astrophysics Data System (ADS)
Wang, Geng; Wang, Guanghui; Yu, Mei; Cui, Chengling
2015-12-01
Seamless contiguity is the key technology for parallel segmentation of remote sensing data with large quantities. It can be effectively integrate fragments of the parallel processing into reasonable results for subsequent processes. There are numerous methods reported in the literature for seamless contiguity, such as establishing buffer, area boundary merging and data sewing. et. We proposed a new method which was also based on building buffers. The seamless contiguity processes we adopt are based on the principle: ensuring the accuracy of the boundary, ensuring the correctness of topology. Firstly, block number is computed based on data processing ability, unlike establishing buffer on both sides of block line, buffer is established just on the right side and underside of the line. Each block of data is segmented respectively and then gets the segmentation objects and their label value. Secondly, choose one block(called master block) and do stitching on the adjacent blocks(called slave block), process the rest of the block in sequence. Through the above processing, topological relationship and boundaries of master block are guaranteed. Thirdly, if the master block polygons boundaries intersect with buffer boundary and the slave blocks polygons boundaries intersect with block line, we adopt certain rules to merge and trade-offs them. Fourthly, check the topology and boundary in the buffer area. Finally, a set of experiments were conducted and prove the feasibility of this method. This novel seamless contiguity algorithm provides an applicable and practical solution for efficient segmentation of massive remote sensing image.
Parameter estimation for stiff deterministic dynamical systems via ensemble Kalman filter
NASA Astrophysics Data System (ADS)
Arnold, Andrea; Calvetti, Daniela; Somersalo, Erkki
2014-10-01
A commonly encountered problem in numerous areas of applications is to estimate the unknown coefficients of a dynamical system from direct or indirect observations at discrete times of some of the components of the state vector. A related problem is to estimate unobserved components of the state. An egregious example of such a problem is provided by metabolic models, in which the numerous model parameters and the concentrations of the metabolites in tissue are to be estimated from concentration data in the blood. A popular method for addressing similar questions in stochastic and turbulent dynamics is the ensemble Kalman filter (EnKF), a particle-based filtering method that generalizes classical Kalman filtering. In this work, we adapt the EnKF algorithm for deterministic systems in which the numerical approximation error is interpreted as a stochastic drift with variance based on classical error estimates of numerical integrators. This approach, which is particularly suitable for stiff systems where the stiffness may depend on the parameters, allows us to effectively exploit the parallel nature of particle methods. Moreover, we demonstrate how spatial prior information about the state vector, which helps the stability of the computed solution, can be incorporated into the filter. The viability of the approach is shown by computed examples, including a metabolic system modeling an ischemic episode in skeletal muscle, with a high number of unknown parameters.
NASA Astrophysics Data System (ADS)
Yunxiao, CAO; Zhiqiang, WANG; Jinjun, WANG; Guofeng, LI
2018-05-01
Electrostatic separation has been extensively used in mineral processing, and has the potential to separate gangue minerals from raw talcum ore. As for electrostatic separation, the particle charging status is one of important influence factors. To describe the talcum particle charging status in a parallel plate electrostatic separator accurately, this paper proposes a modern images processing method. Based on the actual trajectories obtained from sequence images of particle movement and the analysis of physical forces applied on a charged particle, a numerical model is built, which could calculate the charge-to-mass ratios represented as the charging status of particle and simulate the particle trajectories. The simulated trajectories agree well with the experimental results obtained by images processing. In addition, chemical composition analysis is employed to reveal the relationship between ferrum gangue mineral content and charge-to-mass ratios. Research results show that the proposed method is effective for describing the particle charging status in electrostatic separation.
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields
NASA Astrophysics Data System (ADS)
Forsythe, Victoriya V.; Makarevich, Roman A.
2016-11-01
An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system
NASA Astrophysics Data System (ADS)
Duran, Ahmet; Tuncel, Mehmet
2016-10-01
It is important to have a scalable parallel numerical parameter optimization algorithm for a dynamical system used in financial applications where time limitation is crucial. We use Message Passing Interface parallel programming and present such a new parallel algorithm for parameter estimation. For example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series to run up to 512 cores (see [10]). Unlike [10], we consider more extensive financial market situations, for example, in presence of low volatility, high volatility and stock market price at a discount/premium to its net asset value with varying magnitude, in this work. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least squares error and maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
NASA Technical Reports Server (NTRS)
Hirsh, R. S.
1976-01-01
A numerical method is presented for solving the parabolic-elliptic Navier-Stokes equations. The solution procedure is applied to three-dimensional supersonic laminar jet flow issuing parallel with a supersonic free stream. A coordinate transformation is introduced which maps the boundaries at infinity into a finite computational domain in order to eliminate difficulties associated with the imposition of free-stream boundary conditions. Results are presented for an approximate circular jet, a square jet, varying aspect ratio rectangular jets, and interacting square jets. The solution behavior varies from axisymmetric to nearly two-dimensional in character. For cases where comparisons of the present results with those obtained from shear layer calculations could be made, agreement was good.
Propagation of acoustic shock waves between parallel rigid boundaries and into shadow zones
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desjouy, C., E-mail: cyril.desjouy@gmail.com; Ollivier, S.; Dragna, D.
2015-10-28
The study of acoustic shock propagation in complex environments is of great interest for urban acoustics, but also for source localization, an underlying problematic in military applications. To give a better understanding of the phenomenon taking place during the propagation of acoustic shocks, laboratory-scale experiments and numerical simulations were performed to study the propagation of weak shock waves between parallel rigid boundaries, and into shadow zones created by corners. In particular, this work focuses on the study of the local interactions taking place between incident, reflected, and diffracted waves according to the geometry in both regular or irregular – alsomore » called Von Neumann – regimes of reflection. In this latter case, an irregular reflection can lead to the formation of a Mach stem that can modify the spatial distribution of the acoustic pressure. Short duration acoustic shock waves were produced by a 20 kilovolts electric spark source and a schlieren optical method was used to visualize the incident shockfront and the reflection/diffraction patterns. Experimental results are compared to numerical simulations based on the high-order finite difference solution of the two dimensional Navier-Stokes equations.« less
NASA Astrophysics Data System (ADS)
Brunet, V.; Molton, P.; Bézard, H.; Deck, S.; Jacquin, L.
2012-01-01
This paper describes the results obtained during the European Union JEDI (JEt Development Investigations) project carried out in cooperation between ONERA and Airbus. The aim of these studies was first to acquire a complete database of a modern-type engine jet installation set under a wall-to-wall swept wing in various transonic flow conditions. Interactions between the engine jet, the pylon, and the wing were studied thanks to ¤advanced¥ measurement techniques. In parallel, accurate Reynolds-averaged Navier Stokes (RANS) simulations were carried out from simple ones with the Spalart Allmaras model to more complex ones like the DRSM-SSG (Differential Reynolds Stress Modef of Speziale Sarkar Gatski) turbulence model. In the end, Zonal-Detached Eddy Simulations (Z-DES) were also performed to compare different simulation techniques. All numerical results are accurately validated thanks to the experimental database acquired in parallel. This complete and complex study of modern civil aircraft engine installation allowed many upgrades in understanding and simulation methods to be obtained. Furthermore, a setup for engine jet installation studies has been validated for possible future works in the S3Ch transonic research wind-tunnel. The main conclusions are summed up in this paper.
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
Kushibiki, Jun-ichi; Arakawa, Mototaka; Ohashi, Yuji; Suzuki, Kouji
2006-09-01
Experimental procedures and standard specimens for characterizing and evaluating TiO2-SiO2 ultra-low expansion glasses with periodic striae using the line-focus-beam (LFB) ultrasonic material characterization system are discussed. Two types of specimens were prepared, with specimen surfaces parallel and perpendicular to the striae plane using two different grades of glass ingots. The inhomogeneities of each of the specimens were evaluated at 225 MHz. It was clarified that parallel specimens are useful for accurately measuring velocity variations of leaky surface acoustic waves (LSAWs) excited on a water-loaded specimen surface associated with the striae. Perpendicular specimens are useful for obtaining periodicities in the striae for LSAW propagation perpendicular to the striae plane on a surface and for precisely measuring averaged velocities for LSAW propagation parallel to the striae plane. The standard velocity of Rayleigh-type LSAWs traveling parallel to the striae plane for the perpendicular specimens was numerically calculated using the measured velocities of longitudinal and shear waves and density. Consequently, a reliable standard specimen with an LSAW velocity of 3308.18 +/- 0.35 m/s at 23 degrees C and its temperature coefficient of 0.39 (m/s)/degrees C was obtained for a TiO2-SiO2 glass with a TiO2 concentration of 7.09 wt%. A basis for the striae analysis using this ultrasonic method was established.
NASA Astrophysics Data System (ADS)
Konduri, Aditya
Many natural and engineering systems are governed by nonlinear partial differential equations (PDEs) which result in a multiscale phenomena, e.g. turbulent flows. Numerical simulations of these problems are computationally very expensive and demand for extreme levels of parallelism. At realistic conditions, simulations are being carried out on massively parallel computers with hundreds of thousands of processing elements (PEs). It has been observed that communication between PEs as well as their synchronization at these extreme scales take up a significant portion of the total simulation time and result in poor scalability of codes. This issue is likely to pose a bottleneck in scalability of codes on future Exascale systems. In this work, we propose an asynchronous computing algorithm based on widely used finite difference methods to solve PDEs in which synchronization between PEs due to communication is relaxed at a mathematical level. We show that while stability is conserved when schemes are used asynchronously, accuracy is greatly degraded. Since message arrivals at PEs are random processes, so is the behavior of the error. We propose a new statistical framework in which we show that average errors drop always to first-order regardless of the original scheme. We propose new asynchrony-tolerant schemes that maintain accuracy when synchronization is relaxed. The quality of the solution is shown to depend, not only on the physical phenomena and numerical schemes, but also on the characteristics of the computing machine. A novel algorithm using remote memory access communications has been developed to demonstrate excellent scalability of the method for large-scale computing. Finally, we present a path to extend this method in solving complex multi-scale problems on Exascale machines.
Astrophysical N-body Simulations Using Hierarchical Tree Data Structures
NASA Astrophysics Data System (ADS)
Warren, M. S.; Salmon, J. K.
The authors report on recent large astrophysical N-body simulations executed on the Intel Touchstone Delta system. They review the astrophysical motivation and the numerical techniques and discuss steps taken to parallelize these simulations. The methods scale as O(N log N), for large values of N, and also scale linearly with the number of processors. The performance sustained for a duration of 67 h, was between 5.1 and 5.4 Gflop/s on a 512-processor system.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.
Macro-actor execution on multilevel data-driven architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaudiot, J.L.; Najjar, W.
1988-12-31
The data-flow model of computation brings to multiprocessors high programmability at the expense of increased overhead. Applying the model at a higher level leads to better performance but also introduces loss of parallelism. We demonstrate here syntax directed program decomposition methods for the creation of large macro-actors in numerical algorithms. In order to alleviate some of the problems introduced by the lower resolution interpretation, we describe a multi-level of resolution and analyze the requirements for its actual hardware and software integration.
Experimental Validation of Numerical Simulations for an Acoustic Liner in Grazing Flow
NASA Technical Reports Server (NTRS)
Tam, Christopher K. W.; Pastouchenko, Nikolai N.; Jones, Michael G.; Watson, Willie R.
2013-01-01
A coordinated experimental and numerical simulation effort is carried out to improve our understanding of the physics of acoustic liners in a grazing flow as well our computational aeroacoustics (CAA) method prediction capability. A numerical simulation code based on advanced CAA methods is developed. In a parallel effort, experiments are performed using the Grazing Flow Impedance Tube at the NASA Langley Research Center. In the experiment, a liner is installed in the upper wall of a rectangular flow duct with a 2 inch by 2.5 inch cross section. Spatial distribution of sound pressure levels and relative phases are measured on the wall opposite the liner in the presence of a Mach 0.3 grazing flow. The computer code is validated by comparing computed results with experimental measurements. Good agreements are found. The numerical simulation code is then used to investigate the physical properties of the acoustic liner. It is shown that an acoustic liner can produce self-noise in the presence of a grazing flow and that a feedback acoustic resonance mechanism is responsible for the generation of this liner self-noise. In addition, the same mechanism also creates additional liner drag. An estimate, based on numerical simulation data, indicates that for a resonant liner with a 10% open area ratio, the drag increase would be about 4% of the turbulent boundary layer drag over a flat wall.
Parallel implementation of an adaptive and parameter-free N-body integrator
NASA Astrophysics Data System (ADS)
Pruett, C. David; Ingham, William H.; Herman, Ralph D.
2011-05-01
Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
NASA Astrophysics Data System (ADS)
Decyk, Viktor K.; Dauger, Dean E.
We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...
2017-03-05
Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Method and apparatus for data sampling
Odell, Daniel M. C.
1994-01-01
A method and apparatus for sampling radiation detector outputs and determining event data from the collected samples. The method uses high speed sampling of the detector output, the conversion of the samples to digital values, and the discrimination of the digital values so that digital values representing detected events are determined. The high speed sampling and digital conversion is performed by an A/D sampler that samples the detector output at a rate high enough to produce numerous digital samples for each detected event. The digital discrimination identifies those digital samples that are not representative of detected events. The sampling and discrimination also provides for temporary or permanent storage, either serially or in parallel, to a digital storage medium.
NASA Technical Reports Server (NTRS)
Hall, A. Daniel (Inventor); Davies, Francis J. (Inventor)
2007-01-01
Method and system are disclosed for determining individual string resistance in a network of strings when the current through a parallel connected string is unknown and when the voltage across a series connected string is unknown. The method/system of the invention involves connecting one or more frequency-varying impedance components with known electrical characteristics to each string and applying a frequency-varying input signal to the network of strings. The frequency-varying impedance components may be one or more capacitors, inductors, or both, and are selected so that each string is uniquely identifiable in the output signal resulting from the frequency-varying input signal. Numerical methods, such as non-linear regression, may then be used to resolve the resistance associated with each string.
Mapping implicit spectral methods to distributed memory architectures
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Vanrosendale, John
1991-01-01
Spectral methods were proven invaluable in numerical simulation of PDEs (Partial Differential Equations), but the frequent global communication required raises a fundamental barrier to their use on highly parallel architectures. To explore this issue, a 3-D implicit spectral method was implemented on an Intel hypercube. Utilization of about 50 percent was achieved on a 32 node iPSC/860 hypercube, for a 64 x 64 x 64 Fourier-spectral grid; finer grids yield higher utilizations. Chebyshev-spectral grids are more problematic, since plane-relaxation based multigrid is required. However, by using a semicoarsening multigrid algorithm, and by relaxing all multigrid levels concurrently, relatively high utilizations were also achieved in this harder case.
Studies on the interference of wings and propeller slipstreams
NASA Technical Reports Server (NTRS)
Prabhu, R. K.; Tiwari, S. N.
1985-01-01
The small disturbance potential flow theory is applied to determine the lift of an airfoil in a nonuniform parallel stream. The given stream is replaced by an equivalent stream with a certain number of velocity discontinuities, and the influence of these discontinuities is obtained by the method of images. Next, this method is extended to the problem of an airfoil in a nonuniform stream of smooth velocity profile. This model allows perturbation velocity potential in a rotational undisturbed stream. A comparison of these results with numerical solutions of Euler equations indicates that, although approximate, the present method provides useful information about the interaction problem while avoiding the need to solve the Euler equations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Sala, Marzio
In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system ismore » obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10{sup 8} unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.« less
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.
2011-07-01
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction of OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decompositionmore » corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.« less
Discontinuous Galerkin Methods and High-Speed Turbulent Flows
NASA Astrophysics Data System (ADS)
Atak, Muhammed; Larsson, Johan; Munz, Claus-Dieter
2014-11-01
Discontinuous Galerkin methods gain increasing importance within the CFD community as they combine arbitrary high order of accuracy in complex geometries with parallel efficiency. Particularly the discontinuous Galerkin spectral element method (DGSEM) is a promising candidate for both the direct numerical simulation (DNS) and large eddy simulation (LES) of turbulent flows due to its excellent scaling attributes. In this talk, we present a DNS of a compressible turbulent boundary layer along a flat plate at a free-stream Mach number of M = 2.67 and assess the computational efficiency of the DGSEM at performing high-fidelity simulations of both transitional and turbulent boundary layers. We compare the accuracy of the results as well as the computational performance to results using a high order finite difference method.
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements
NASA Astrophysics Data System (ADS)
Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.
2000-11-01
In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
Dynamic analysis and control of lightweight manipulators with flexible parallel link mechanisms
NASA Technical Reports Server (NTRS)
Lee, Jeh Won
1991-01-01
The flexible parallel link mechanism is designed for increased rigidity to sustain the buckling when it carries a heavy payload. Compared to a one link flexible manipulator, a two link flexible manipulator, especially the flexible parallel mechanism, has more complicated characteristics in dynamics and control. The objective of this research is the theoretical analysis and the experimental verification of dynamics and control of a two link flexible manipulator with a flexible parallel link mechanism. Nonlinear equations of motion of the lightweight manipulator are derived by the Lagrangian method in symbolic form to better understand the structure of the dynamic model. A manipulator with a flexible parallel link mechanism is a constrained dynamic system whose equations are sensitive to numerical integration error. This constrained system is solved using singular value decomposition of the constraint Jacobian matrix. The discrepancies between the analytical model and the experiment are explained using a simplified and a detailed finite element model. The step response of the analytical model and the TREETOPS model match each other well. The nonlinear dynamics is studied using a sinusoidal excitation. The actuator dynamic effect on a flexible robot was investigated. The effects are explained by the root loci and the Bode plot theoretically and experimentally. For the base performance for the advanced control scheme, a simple decoupled feedback scheme is applied.
ERIC Educational Resources Information Center
Hyde, Daniel C.; Spelke, Elizabeth S.
2011-01-01
Behavioral research suggests that two cognitive systems are at the foundations of numerical thinking: one for representing 1-3 objects in parallel and one for representing and comparing large, approximate numerical magnitudes. We tested for dissociable neural signatures of these systems in preverbal infants by recording event-related potentials…
A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging
Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.
2012-01-01
Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS, and averaging the results to get a final CS k-space reconstruction. We used both a standard CS, and an edge and joint-sparsity guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition, and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge and joint-sparsity guided CS using two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065
Two-dimensional numerical simulation of a Stirling engine heat exchanger
NASA Technical Reports Server (NTRS)
Ibrahim, Mounir; Tew, Roy C.; Dudenhoefer, James E.
1989-01-01
The first phase of an effort to develop multidimensional models of Stirling engine components is described. The ultimate goal is to model an entire engine working space. Parallel plate and tubular heat exchanger models are described, with emphasis on the central part of the channel (i.e., ignoring hydrodynamic and thermal end effects). The model assumes laminar, incompressible flow with constant thermophysical properties. In addition, a constant axial temperature gradient is imposed. The governing equations describing the model have been solved using the Crack-Nicloson finite-difference scheme. Model predictions are compared with analytical solutions for oscillating/reversing flow and heat transfer in order to check numerical accuracy. Excellent agreement is obtained for flow both in circular tubes and between parallel plates. The computational heat transfer results are in good agreement with the analytical heat transfer results for parallel plates.
NASA Astrophysics Data System (ADS)
Delhi Babu, R.; Ganesh, S.
2018-04-01
The Steady Laminar stream of an electrically directing thick, incompressible liquid between two parallel permeable plates of a divert within the sight of a transverse attractive field with an angular velocity when the liquid is being pulled back through both the dividers of the channel at a similar rate with a precise speed is examined. Numerical arrangement is acquired for various estimations of R (Suction Reynolds number) utilizing R-K Gill's technique and the diagrams of dimensionless functions f ' and f have been drawn.
Solving large sparse eigenvalue problems on supercomputers
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef
1988-01-01
An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
The ADER-DG method for seismic wave propagation and earthquake rupture dynamics
NASA Astrophysics Data System (ADS)
Pelties, Christian; Gabriel, Alice; Ampuero, Jean-Paul; de la Puente, Josep; Käser, Martin
2013-04-01
We will present the Arbitrary high-order DERivatives Discontinuous Galerkin (ADER-DG) method for solving the combined elastodynamic wave propagation and dynamic rupture problem. The ADER-DG method enables high-order accuracy in space and time while being implemented on unstructured tetrahedral meshes. A tetrahedral element discretization provides rapid and automatized mesh generation as well as geometrical flexibility. Features as mesh coarsening and local time stepping schemes can be applied to reduce computational efforts without introducing numerical artifacts. The method is well suited for parallelization and large scale high-performance computing since only directly neighboring elements exchange information via numerical fluxes. The concept of fluxes is a key ingredient of the numerical scheme as it governs the numerical dispersion and diffusion properties and allows to accommodate for boundary conditions, empirical friction laws of dynamic rupture processes, or the combination of different element types and non-conforming mesh transitions. After introducing fault dynamics into the ADER-DG framework, we will demonstrate its specific advantages in benchmarking test scenarios provided by the SCEC/USGS Spontaneous Rupture Code Verification Exercise. An important result of the benchmark is that the ADER-DG method avoids spurious high-frequency contributions in the slip rate spectra and therefore does not require artificial Kelvin-Voigt damping, filtering or other modifications of the produced synthetic seismograms. To demonstrate the capabilities of the proposed scheme we simulate an earthquake scenario, inspired by the 1992 Landers earthquake, that includes branching and curved fault segments. Furthermore, topography is respected in the discretized model to capture the surface waves correctly. The advanced geometrical flexibility combined with an enhanced accuracy will make the ADER-DG method a useful tool to study earthquake dynamics on complex fault systems in realistic rheologies.
Xyce parallel electronic simulator : users' guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-artmore » algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.« less
A new hybrid code (CHIEF) implementing the inertial electron fluid equation without approximation
NASA Astrophysics Data System (ADS)
Muñoz, P. A.; Jain, N.; Kilian, P.; Büchner, J.
2018-03-01
We present a new hybrid algorithm implemented in the code CHIEF (Code Hybrid with Inertial Electron Fluid) for simulations of electron-ion plasmas. The algorithm treats the ions kinetically, modeled by the Particle-in-Cell (PiC) method, and electrons as an inertial fluid, modeled by electron fluid equations without any of the approximations used in most of the other hybrid codes with an inertial electron fluid. This kind of code is appropriate to model a large variety of quasineutral plasma phenomena where the electron inertia and/or ion kinetic effects are relevant. We present here the governing equations of the model, how these are discretized and implemented numerically, as well as six test problems to validate our numerical approach. Our chosen test problems, where the electron inertia and ion kinetic effects play the essential role, are: 0) Excitation of parallel eigenmodes to check numerical convergence and stability, 1) parallel (to a background magnetic field) propagating electromagnetic waves, 2) perpendicular propagating electrostatic waves (ion Bernstein modes), 3) ion beam right-hand instability (resonant and non-resonant), 4) ion Landau damping, 5) ion firehose instability, and 6) 2D oblique ion firehose instability. Our results reproduce successfully the predictions of linear and non-linear theory for all these problems, validating our code. All properties of this hybrid code make it ideal to study multi-scale phenomena between electron and ion scales such as collisionless shocks, magnetic reconnection and kinetic plasma turbulence in the dissipation range above the electron scales.
Estimating non-circular motions in barred galaxies using numerical N-body simulations
NASA Astrophysics Data System (ADS)
Randriamampandry, T. H.; Combes, F.; Carignan, C.; Deg, N.
2015-12-01
The observed velocities of the gas in barred galaxies are a combination of the azimuthally averaged circular velocity and non-circular motions, primarily caused by gas streaming along the bar. These non-circular flows must be accounted for before the observed velocities can be used in mass modelling. In this work, we examine the performance of the tilted-ring method and the DISKFIT algorithm for transforming velocity maps of barred spiral galaxies into rotation curves (RCs) using simulated data. We find that the tilted-ring method, which does not account for streaming motions, under-/overestimates the circular motions when the bar is parallel/perpendicular to the projected major axis. DISKFIT, which does include streaming motions, is limited to orientations where the bar is not aligned with either the major or minor axis of the image. Therefore, we propose a method of correcting RCs based on numerical simulations of galaxies. We correct the RC derived from the tilted-ring method based on a numerical simulation of a galaxy with similar properties and projections as the observed galaxy. Using observations of NGC 3319, which has a bar aligned with the major axis, as a test case, we show that the inferred mass models from the uncorrected and corrected RCs are significantly different. These results show the importance of correcting for the non-circular motions and demonstrate that new methods of accounting for these motions are necessary as current methods fail for specific bar alignments.
Computation of an Underexpanded 3-D Rectangular Jet by the CE/SE Method
NASA Technical Reports Server (NTRS)
Loh, Ching Y.; Himansu, Ananda; Wang, Xiao Y.; Jorgenson, Philip C. E.
2000-01-01
Recently, an unstructured three-dimensional space-time conservation element and solution element (CE/SE) Euler solver was developed. Now it is also developed for parallel computation using METIS for domain decomposition and MPI (message passing interface). The method is employed here to numerically study the near-field of a typical 3-D rectangular under-expanded jet. For the computed case-a jet with Mach number Mj = 1.6. with a very modest grid of 1.7 million tetrahedrons, the flow features such as the shock-cell structures and the axis switching, are in good qualitative agreement with experimental results.
Efficient implementation of a 3-dimensional ADI method on the iPSC/860
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van der Wijngaart, R.F.
1993-12-31
A comparison is made between several domain decomposition strategies for the solution of three-dimensional partial differential equations on a MIMD distributed memory parallel computer. The grids used are structured, and the numerical algorithm is ADI. Important implementation issues regarding load balancing, storage requirements, network latency, and overlap of computations and communications are discussed. Results of the solution of the three-dimensional heat equation on the Intel iPSC/860 are presented for the three most viable methods. It is found that the Bruno-Cappello decomposition delivers optimal computational speed through an almost complete elimination of processor idle time, while providing good memory efficiency.
A multi-block adaptive solving technique based on lattice Boltzmann method
NASA Astrophysics Data System (ADS)
Zhang, Yang; Xie, Jiahua; Li, Xiaoyue; Ma, Zhenghai; Zou, Jianfeng; Zheng, Yao
2018-05-01
In this paper, a CFD parallel adaptive algorithm is self-developed by combining the multi-block Lattice Boltzmann Method (LBM) with Adaptive Mesh Refinement (AMR). The mesh refinement criterion of this algorithm is based on the density, velocity and vortices of the flow field. The refined grid boundary is obtained by extending outward half a ghost cell from the coarse grid boundary, which makes the adaptive mesh more compact and the boundary treatment more convenient. Two numerical examples of the backward step flow separation and the unsteady flow around circular cylinder demonstrate the vortex structure of the cold flow field accurately and specifically.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhaskaran-Nair, Kiran; Kowalski, Karol; Moreno, Juana
Discovery of fullerenes has opened a entirely new chapter in chemistry due to their wide range of properties which holds exciting applications in numerous disciplines of science. The Nobel Prize in Chemistry 1996 was awarded jointly to Robert F. Curl Jr., Sir Harold W. Kroto and Richard E. Smalley in recoginition for their discovery of this new carbon allotrope. In this letter we are reporting ionization potential and electron attachment studies on fullerenes (C60 and C70) obtained with novel parallel implementation of the EA-EOM-CCSD and IP-EOM-CCSD methods in NWChem program package.
NASA Astrophysics Data System (ADS)
Laubscher, Markus; Bourquin, Stéphane; Froehly, Luc; Karamata, Boris; Lasser, Theo
2004-07-01
Current spectroscopic optical coherence tomography (OCT) methods rely on a posteriori numerical calculation. We present an experimental alternative for accessing spectroscopic information in OCT without post-processing based on wavelength de-multiplexing and parallel detection using a diffraction grating and a smart pixel detector array. Both a conventional A-scan with high axial resolution and the spectrally resolved measurement are acquired simultaneously. A proof-of-principle demonstration is given on a dynamically changing absorbing sample. The method's potential for fast spectroscopic OCT imaging is discussed. The spectral measurements obtained with this approach are insensitive to scan non-linearities or sample movements.
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Technical Reports Server (NTRS)
Deanna, R. G.
1992-01-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
NASA Technical Reports Server (NTRS)
Chung, T. J. (Editor); Karr, Gerald R. (Editor)
1989-01-01
Recent advances in computational fluid dynamics are examined in reviews and reports, with an emphasis on finite-element methods. Sections are devoted to adaptive meshes, atmospheric dynamics, combustion, compressible flows, control-volume finite elements, crystal growth, domain decomposition, EM-field problems, FDM/FEM, and fluid-structure interactions. Consideration is given to free-boundary problems with heat transfer, free surface flow, geophysical flow problems, heat and mass transfer, high-speed flow, incompressible flow, inverse design methods, MHD problems, the mathematics of finite elements, and mesh generation. Also discussed are mixed finite elements, multigrid methods, non-Newtonian fluids, numerical dissipation, parallel vector processing, reservoir simulation, seepage, shallow-water problems, spectral methods, supercomputer architectures, three-dimensional problems, and turbulent flows.
NASA Astrophysics Data System (ADS)
Yusa, Yasunori; Okada, Hiroshi; Yamada, Tomonori; Yoshimura, Shinobu
2018-04-01
A domain decomposition method for large-scale elastic-plastic problems is proposed. The proposed method is based on a quasi-Newton method in conjunction with a balancing domain decomposition preconditioner. The use of a quasi-Newton method overcomes two problems associated with the conventional domain decomposition method based on the Newton-Raphson method: (1) avoidance of a double-loop iteration algorithm, which generally has large computational complexity, and (2) consideration of the local concentration of nonlinear deformation, which is observed in elastic-plastic problems with stress concentration. Moreover, the application of a balancing domain decomposition preconditioner ensures scalability. Using the conventional and proposed domain decomposition methods, several numerical tests, including weak scaling tests, were performed. The convergence performance of the proposed method is comparable to that of the conventional method. In particular, in elastic-plastic analysis, the proposed method exhibits better convergence performance than the conventional method.
A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant
NASA Astrophysics Data System (ADS)
Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.
2018-04-01
A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haut, T. S.; Babb, T.; Martinsson, P. G.
2015-06-16
Our manuscript demonstrates a technique for efficiently solving the classical wave equation, the shallow water equations, and, more generally, equations of the form ∂u/∂t=Lu∂u/∂t=Lu, where LL is a skew-Hermitian differential operator. The idea is to explicitly construct an approximation to the time-evolution operator exp(τL)exp(τL) for a relatively large time-step ττ. Recently developed techniques for approximating oscillatory scalar functions by rational functions, and accelerated algorithms for computing functions of discretized differential operators are exploited. Principal advantages of the proposed method include: stability even for large time-steps, the possibility to parallelize in time over many characteristic wavelengths and large speed-ups over existingmore » methods in situations where simulation over long times are required. Numerical examples involving the 2D rotating shallow water equations and the 2D wave equation in an inhomogenous medium are presented, and the method is compared to the 4th order Runge–Kutta (RK4) method and to the use of Chebyshev polynomials. The new method achieved high accuracy over long-time intervals, and with speeds that are orders of magnitude faster than both RK4 and the use of Chebyshev polynomials.« less
Philip, Bobby; Berrill, Mark A.; Allu, Srikanth; ...
2015-01-26
We describe an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian Free Newton Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors are described. The details of verification and validation experiments, and parallel performance analysis in weak and strong scaling studies demonstratingmore » the achieved efficiency of the algorithm are presented. Moreover, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.« less
Research on the Application of Fast-steering Mirror in Stellar Interferometer
NASA Astrophysics Data System (ADS)
Mei, R.; Hu, Z. W.; Xu, T.; Sun, C. S.
2017-07-01
For a stellar interferometer, the fast-steering mirror (FSM) is widely utilized to correct wavefront tilt caused by atmospheric turbulence and internal instrumental vibration due to its high resolution and fast response frequency. In this study, the non-coplanar error between the FSM and actuator deflection axis introduced by manufacture, assembly, and adjustment is analyzed. Via a numerical method, the additional optical path difference (OPD) caused by above factors is studied, and its effects on tracking accuracy of stellar interferometer are also discussed. On the other hand, the starlight parallelism between the beams of two arms is one of the main factors of the loss of fringe visibility. By analyzing the influence of wavefront tilt caused by the atmospheric turbulence on fringe visibility, a simple and efficient real-time correction scheme of starlight parallelism is proposed based on a single array detector. The feasibility of this scheme is demonstrated by laboratory experiment. The results show that starlight parallelism meets the requirement of stellar interferometer in wavefront tilt preliminarily after the correction of fast-steering mirror.
Yang, Daejong; Kang, Kyungnam; Kim, Donghwan; Li, Zhiyong; Park, Inkyu
2015-01-01
A facile top-down/bottom-up hybrid nanofabrication process based on programmable temperature control and parallel chemical supply within microfluidic platform has been developed for the all liquid-phase synthesis of heterogeneous nanomaterial arrays. The synthesized materials and locations can be controlled by local heating with integrated microheaters and guided liquid chemical flow within microfluidic platform. As proofs-of-concept, we have demonstrated the synthesis of two types of nanomaterial arrays: (i) parallel array of TiO2 nanotubes, CuO nanospikes and ZnO nanowires, and (ii) parallel array of ZnO nanowire/CuO nanospike hybrid nanostructures, CuO nanospikes and ZnO nanowires. The laminar flow with negligible ionic diffusion between different precursor solutions as well as localized heating was verified by numerical calculation and experimental result of nanomaterial array synthesis. The devices made of heterogeneous nanomaterial array were utilized as a multiplexed sensor for toxic gases such as NO2 and CO. This method would be very useful for the facile fabrication of functional nanodevices based on highly integrated arrays of heterogeneous nanomaterials. PMID:25634814
Local SAR in Parallel Transmission Pulse Design
Lee, Joonsung; Gebhardt, Matthias; Wald, Lawrence L.; Adalsteinsson, Elfar
2011-01-01
The management of local and global power deposition in human subjects (Specific Absorption Rate, SAR) is a fundamental constraint to the application of parallel transmission (pTx) systems. Even though the pTx and single channel have to meet the same SAR requirements, the complex behavior of the spatial distribution of local SAR for transmission arrays poses problems that are not encountered in conventional single-channel systems and places additional requirements on pTx RF pulse design. We propose a pTx pulse design method which builds on recent work to capture the spatial distribution of local SAR in numerical tissue models in a compressed parameterization in order to incorporate local SAR constraints within computation times that accommodate pTx pulse design during an in vivo MRI scan. Additionally, the algorithm yields a Protocol-specific Ultimate Peak in Local SAR (PUPiL SAR), which is shown to bound the achievable peak local SAR for a given excitation profile fidelity. The performance of the approach was demonstrated using a numerical human head model and a 7T eight-channel transmit array. The method reduced peak local 10g SAR by 14–66% for slice-selective pTx excitations and 2D selective pTx excitations compared to a pTx pulse design constrained only by global SAR. The primary tradeoff incurred for reducing peak local SAR was an increase in global SAR, up to 34% for the evaluated examples, which is favorable in cases where local SAR constraints dominate the pulse applications. PMID:22083594
A survey of parallel programming tools
NASA Technical Reports Server (NTRS)
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
On the conversion of infrared radiation from fission reactor-based photon engine into parallel beam
NASA Astrophysics Data System (ADS)
Gulevich, Andrey V.; Levchenko, Vladislav E.; Loginov, Nicolay I.; Kukharchuk, Oleg F.; Evtodiev, Denis A.; Zrodnikov, Anatoly V.
2002-01-01
The efficiency of infrared radiation conversion from photon engine based on fission reactor into parallel photon beam is discussed. Two different ways of doing that are considered. One of them is to use the parabolic mirror to convert of infrared radiation into parallel photon beam. The another one is based on the use of special lattice consisting of numerous light conductors. The experimental facility and some results are described. .
NASA Astrophysics Data System (ADS)
González, J. A.; Guzmán, F. S.
2018-03-01
We present a method for estimating the velocity of a wandering black hole and the equation of state for the gas around it based on a catalog of numerical simulations. The method uses machine-learning methods based on convolutional neural networks applied to the classification of images resulting from numerical simulations. Specifically we focus on the supersonic velocity regime and choose the direction of the black hole to be parallel to its spin. We build a catalog of 900 simulations by numerically solving Euler's equations onto the fixed space-time background of a black hole, for two parameters: the adiabatic index Γ with values in the range [1.1, 5 /3 ], and the asymptotic relative velocity of the black hole with respect to the surroundings v∞, with values within [0.2 ,0.8 ]c . For each simulation we produce a 2D image of the gas density once the process of accretion has approached a stationary regime. The results obtained show that the implemented convolutional neural networks are able to correctly classify the adiabatic index 87.78% of the time within an uncertainty of ±0.0284 , while the prediction of the velocity is correct 96.67% of the time within an uncertainty of ±0.03 c . We expect that this combination of a massive number of numerical simulations and machine-learning methods will help us analyze more complicated scenarios related to future high-resolution observations of black holes, like those from the Event Horizon Telescope.
Real Time Optima Tracking Using Harvesting Models of the Genetic Algorithm
NASA Technical Reports Server (NTRS)
Baskaran, Subbiah; Noever, D.
1999-01-01
Tracking optima in real time propulsion control, particularly for non-stationary optimization problems is a challenging task. Several approaches have been put forward for such a study including the numerical method called the genetic algorithm. In brief, this approach is built upon Darwinian-style competition between numerical alternatives displayed in the form of binary strings, or by analogy to 'pseudogenes'. Breeding of improved solution is an often cited parallel to natural selection in.evolutionary or soft computing. In this report we present our results of applying a novel model of a genetic algorithm for tracking optima in propulsion engineering and in real time control. We specialize the algorithm to mission profiling and planning optimizations, both to select reduced propulsion needs through trajectory planning and to explore time or fuel conservation strategies.
Relativistic Collisions of Highly-Charged Ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionescu, Dorin; Belkacem, Ali
1998-11-19
The physics of elementary atomic processes in relativistic collisions between highly-charged ions and atoms or other ions is briefly discussed, and some recent theoretical and experimental results in this field are summarized. They include excitation, capture, ionization, and electron-positron pair creation. The numerical solution of the two-center Dirac equation in momentum space is shown to be a powerful nonperturbative method for describing atomic processes in relativistic collisions involving heavy and highly-charged ions. By propagating negative-energy wave packets in time the evolution of the QED vacuum around heavy ions in relativistic motion is investigated. Recent results obtained from numerical calculations usingmore » massively parallel processing on the Cray-T3E supercomputer of the National Energy Research Scientific Computer Center (NERSC) at Berkeley National Laboratory are presented.« less
ParaExp Using Leapfrog as Integrator for High-Frequency Electromagnetic Simulations
NASA Astrophysics Data System (ADS)
Merkel, M.; Niyonzima, I.; Schöps, S.
2017-12-01
Recently, ParaExp was proposed for the time integration of linear hyperbolic problems. It splits the time interval of interest into subintervals and computes the solution on each subinterval in parallel. The overall solution is decomposed into a particular solution defined on each subinterval with zero initial conditions and a homogeneous solution propagated by the matrix exponential applied to the initial conditions. The efficiency of the method depends on fast approximations of this matrix exponential based on recent results from numerical linear algebra. This paper deals with the application of ParaExp in combination with Leapfrog to electromagnetic wave problems in time domain. Numerical tests are carried out for a simple toy problem and a realistic spiral inductor model discretized by the Finite Integration Technique.
Multiscale Multilevel Approach to Solution of Nanotechnology Problems
NASA Astrophysics Data System (ADS)
Polyakov, Sergey; Podryga, Viktoriia
2018-02-01
The paper is devoted to a multiscale multilevel approach for the solution of nanotechnology problems on supercomputer systems. The approach uses the combination of continuum mechanics models and the Newton dynamics for individual particles. This combination includes three scale levels: macroscopic, mesoscopic and microscopic. For gas-metal technical systems the following models are used. The quasihydrodynamic system of equations is used as a mathematical model at the macrolevel for gas and solid states. The system of Newton equations is used as a mathematical model at the mesoand microlevels; it is written for nanoparticles of the medium and larger particles moving in the medium. The numerical implementation of the approach is based on the method of splitting into physical processes. The quasihydrodynamic equations are solved by the finite volume method on grids of different types. The Newton equations of motion are solved by Verlet integration in each cell of the grid independently or in groups of connected cells. In the framework of the general methodology, four classes of algorithms and methods of their parallelization are provided. The parallelization uses the principles of geometric parallelism and the efficient partitioning of the computational domain. A special dynamic algorithm is used for load balancing the solvers. The testing of the developed approach was made by the example of the nitrogen outflow from a balloon with high pressure to a vacuum chamber through a micronozzle and a microchannel. The obtained results confirm the high efficiency of the developed methodology.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1997-01-01
Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue
NASA Astrophysics Data System (ADS)
Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.
2017-11-01
The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Score Equating and Nominally Parallel Language Tests.
ERIC Educational Resources Information Center
Moy, Raymond
Score equating requires that the forms to be equated are functionally parallel. That is, the two test forms should rank order examinees in a similar fashion. In language proficiency testing situations, this assumption is often put into doubt because of the numerous tests that have been proposed as measures of language proficiency and the…
NDL-v2.0: A new version of the numerical differentiation library for parallel architectures
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.
2014-07-01
We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
Two-dimensional numerical simulation of a Stirling engine heat exchanger
NASA Technical Reports Server (NTRS)
Ibrahim, Mounir B.; Tew, Roy C.; Dudenhoefer, James E.
1989-01-01
The first phase of an effort to develop multidimensional models of Stirling engine components is described; the ultimate goal is to model an entire engine working space. More specifically, parallel plate and tubular heat exchanger models with emphasis on the central part of the channel (i.e., ignoring hydrodynamic and thermal end effects) are described. The model assumes: laminar, incompressible flow with constant thermophysical properties. In addition, a constant axial temperature gradient is imposed. The governing equations, describing the model, were solved using Crank-Nicloson finite-difference scheme. Model predictions were compared with analytical solutions for oscillating/reversing flow and heat transfer in order to check numerical accuracy. Excellent agreement was obtained for the model predictions with analytical solutions available for both flow in circular tubes and between parallel plates. Also the heat transfer computational results are in good agreement with the heat transfer analytical results for parallel plates.
NASA Astrophysics Data System (ADS)
Huang, Chien-Jung; White, Susan; Huang, Shao-Ching; Mallya, Sanjay; Eldredge, Jeff
2016-11-01
Obstructive sleep apnea (OSA) is a medical condition characterized by repetitive partial or complete occlusion of the airway during sleep. The soft tissues in the upper airway of OSA patients are prone to collapse under the low pressure loads incurred during breathing. The ultimate goal of this research is the development of a versatile numerical tool for simulation of air-tissue interactions in the patient specific upper airway geometry. This tool is expected to capture several phenomena, including flow-induced vibration (snoring) and large deformations during airway collapse of the complex airway geometry in respiratory flow conditions. Here, we present our ongoing progress toward this goal. To avoid mesh regeneration, for flow model, a sharp-interface embedded boundary method is used on Cartesian grids for resolving the fluid-structure interface, while for the structural model, a cut-cell finite element method is used. Also, to properly resolve large displacements, non-linear elasticity model is used. The fluid and structure solvers are connected with the strongly coupled iterative algorithm. The parallel computation is achieved with the numerical library PETSc. Some two- and three- dimensional preliminary results are shown to demonstrate the ability of this tool.
NASA Astrophysics Data System (ADS)
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.
Surrogates for numerical simulations; optimization of eddy-promoter heat exchangers
NASA Technical Reports Server (NTRS)
Patera, Anthony T.; Patera, Anthony
1993-01-01
Although the advent of fast and inexpensive parallel computers has rendered numerous previously intractable calculations feasible, many numerical simulations remain too resource-intensive to be directly inserted in engineering optimization efforts. An attractive alternative to direct insertion considers models for computational systems: the expensive simulation is evoked only to construct and validate a simplified, input-output model; this simplified input-output model then serves as a simulation surrogate in subsequent engineering optimization studies. A simple 'Bayesian-validated' statistical framework for the construction, validation, and purposive application of static computer simulation surrogates is presented. As an example, dissipation-transport optimization of laminar-flow eddy-promoter heat exchangers are considered: parallel spectral element Navier-Stokes calculations serve to construct and validate surrogates for the flowrate and Nusselt number; these surrogates then represent the originating Navier-Stokes equations in the ensuing design process.
A reconstruction method for cone-beam differential x-ray phase-contrast computed tomography.
Fu, Jian; Velroyen, Astrid; Tan, Renbo; Zhang, Junwei; Chen, Liyuan; Tapfer, Arne; Bech, Martin; Pfeiffer, Franz
2012-09-10
Most existing differential phase-contrast computed tomography (DPC-CT) approaches are based on three kinds of scanning geometries, described by parallel-beam, fan-beam and cone-beam. Due to the potential of compact imaging systems with magnified spatial resolution, cone-beam DPC-CT has attracted significant interest. In this paper, we report a reconstruction method based on a back-projection filtration (BPF) algorithm for cone-beam DPC-CT. Due to the differential nature of phase contrast projections, the algorithm restrains from differentiation of the projection data prior to back-projection, unlike BPF algorithms commonly used for absorption-based CT data. This work comprises a numerical study of the algorithm and its experimental verification using a dataset measured with a three-grating interferometer and a micro-focus x-ray tube source. Moreover, the numerical simulation and experimental results demonstrate that the proposed method can deal with several classes of truncated cone-beam datasets. We believe that this feature is of particular interest for future medical cone-beam phase-contrast CT imaging applications.
Parallel Aircraft Trajectory Optimization with Analytic Derivatives
NASA Technical Reports Server (NTRS)
Falck, Robert D.; Gray, Justin S.; Naylor, Bret
2016-01-01
Trajectory optimization is an integral component for the design of aerospace vehicles, but emerging aircraft technologies have introduced new demands on trajectory analysis that current tools are not well suited to address. Designing aircraft with technologies such as hybrid electric propulsion and morphing wings requires consideration of the operational behavior as well as the physical design characteristics of the aircraft. The addition of operational variables can dramatically increase the number of design variables which motivates the use of gradient based optimization with analytic derivatives to solve the larger optimization problems. In this work we develop an aircraft trajectory analysis tool using a Legendre-Gauss-Lobatto based collocation scheme, providing analytic derivatives via the OpenMDAO multidisciplinary optimization framework. This collocation method uses an implicit time integration scheme that provides a high degree of sparsity and thus several potential options for parallelization. The performance of the new implementation was investigated via a series of single and multi-trajectory optimizations using a combination of parallel computing and constraint aggregation. The computational performance results show that in order to take full advantage of the sparsity in the problem it is vital to parallelize both the non-linear analysis evaluations and the derivative computations themselves. The constraint aggregation results showed a significant numerical challenge due to difficulty in achieving tight convergence tolerances. Overall, the results demonstrate the value of applying analytic derivatives to trajectory optimization problems and lay the foundation for future application of this collocation based method to the design of aircraft with where operational scheduling of technologies is key to achieving good performance.
On some Aitken-like acceleration of the Schwarz method
NASA Astrophysics Data System (ADS)
Garbey, M.; Tromeur-Dervout, D.
2002-12-01
In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel. computer using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
Detecting opportunities for parallel observations on the Hubble Space Telescope
NASA Technical Reports Server (NTRS)
Lucks, Michael
1992-01-01
The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition, and explanation facilities of the system are presented. The methodology is applicable to many other multiple criteria assessment problems.
Numerical Prediction of CCV in a PFI Engine using a Parallel LES Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ameen, Muhsin M; Mirzaeian, Mohsen; Millo, Federico
Cycle-to-cycle variability (CCV) is detrimental to IC engine operation and can lead to partial burn, misfire, and knock. Predicting CCV numerically is extremely challenging due to two key reasons. Firstly, high-fidelity methods such as large eddy simulation (LES) are required to accurately resolve the incylinder turbulent flowfield both spatially and temporally. Secondly, CCV is experienced over long timescales and hence the simulations need to be performed for hundreds of consecutive cycles. Ameen et al. (Int. J. Eng. Res., 2017) developed a parallel perturbation model (PPM) approach to dissociate this long time-scale problem into several shorter timescale problems. The strategy ismore » to perform multiple single-cycle simulations in parallel by effectively perturbing the initial velocity field based on the intensity of the in-cylinder turbulence. This strategy was demonstrated for motored engine and it was shown that the mean and variance of the in-cylinder flowfield was captured reasonably well by this approach. In the present study, this PPM approach is extended to simulate the CCV in a fired port-fuel injected (PFI) SI engine. Two operating conditions are considered – a medium CCV operating case corresponding to 2500 rpm and 16 bar BMEP and a low CCV case corresponding to 4000 rpm and 12 bar BMEP. The predictions from this approach are also shown to be similar to the consecutive LES cycles. Both the consecutive and PPM LES cycles are observed to under-predict the variability in the early stage of combustion. The parallel approach slightly underpredicts the cyclic variability at all stages of combustion as compared to the consecutive LES cycles. However, it is shown that the parallel approach is able to predict the coefficient of variation (COV) of the in-cylinder pressure and burn rate related parameters with sufficient accuracy, and is also able to predict the qualitative trends in CCV with changing operating conditions. The convergence of the statistics predicted by the PPM approach with respect to the number of consecutive cycles required for each parallel simulation is also investigated. It is shown that this new approach is able to give accurate predictions of the CCV in fired engines in less than one-tenth of the time required for the conventional approach of simulating consecutive engine cycles.« less
Numerical investigation of heat transfer in parallel channels with water at supercritical pressure.
Shitsi, Edward; Kofi Debrah, Seth; Yao Agbodemegbe, Vincent; Ampomah-Amoako, Emmanuel
2017-11-01
Thermal phenomena such as heat transfer enhancement, heat transfer deterioration, and flow instability observed at supercritical pressures as a result of fluid property variations have the potential to affect the safety of design and operation of Supercritical Water-cooled Reactor SCWR, and also challenge the capabilities of both heat transfer correlations and Computational Fluid Dynamics CFD physical models. These phenomena observed at supercritical pressures need to be thoroughly investigated. An experimental study was carried out by Xi to investigate flow instability in parallel channels at supercritical pressures under different mass flow rates, pressures, and axial power shapes. Experimental data on flow instability at inlet of the heated channels were obtained but no heat transfer data along the axial length was obtained. This numerical study used 3D numerical tool STAR-CCM+ to investigate heat transfer at supercritical pressures along the axial lengths of the parallel channels with water ahead of experimental data. Homogeneous axial power shape HAPS was adopted and the heating powers adopted in this work were below the experimental threshold heating powers obtained for HAPS by Xi. The results show that the Fluid Centre-line Temperature FCLT increased linearly below and above the PCT region, but flattened at the PCT region for all the system parameters considered. The inlet temperature, heating power, pressure, gravity and mass flow rate have effects on WT (wall temperature) values in the NHT (normal heat transfer), EHT (enhanced heat transfer), DHT (deteriorated heat transfer) and recovery from DHT regions. While variation of all other system parameters in the EHT and PCT regions showed no significant difference in the WT and FCLT values respectively, the WT and FCLT values respectively increased with pressure in these regions. For most of the system parameters considered, the FCLT and WT values obtained in the two channels were nearly the same. The numerical study was not quantitatively compared with experimental data along the axial lengths of the parallel channels, but it was observed that the numerical tool STAR-CCM+ adopted was able to capture the trends for NHT, EHT, DHT and recovery from DHT regions. The heating powers used for the various simulations were below the experimentally observed threshold heating powers, but heat transfer deterioration HTD was observed, confirming the previous finding that HTD could occur before the occurrence of unstable behavior at supercritical pressures. For purposes of comparing the results of numerical simulations with experimental data, the heat transfer data on temperature oscillations obtained at the outlet of the heated channels and instability boundary results obtained at the inlet of the heated channels were compared. The numerical results obtained quite well agree with the experimental data. This work calls for provision of experimental data on heat transfer in parallel channels at supercritical pressures for validation of similar numerical studies.
Method and apparatus for data sampling
Odell, D.M.C.
1994-04-19
A method and apparatus for sampling radiation detector outputs and determining event data from the collected samples is described. The method uses high speed sampling of the detector output, the conversion of the samples to digital values, and the discrimination of the digital values so that digital values representing detected events are determined. The high speed sampling and digital conversion is performed by an A/D sampler that samples the detector output at a rate high enough to produce numerous digital samples for each detected event. The digital discrimination identifies those digital samples that are not representative of detected events. The sampling and discrimination also provides for temporary or permanent storage, either serially or in parallel, to a digital storage medium. 6 figures.
Direct Numerical Simulation of Turbulent Flow Over Complex Bathymetry
NASA Astrophysics Data System (ADS)
Yue, L.; Hsu, T. J.
2017-12-01
Direct numerical simulation (DNS) is regarded as a powerful tool in the investigation of turbulent flow featured with a wide range of time and spatial scales. With the application of coordinate transformation in a pseudo-spectral scheme, a parallelized numerical modeling system was created aiming at simulating flow over complex bathymetry with high numerical accuracy and efficiency. The transformed governing equations were integrated in time using a third-order low-storage Runge-Kutta method. For spatial discretization, the discrete Fourier expansion was adopted in the streamwise and spanwise direction, enforcing the periodic boundary condition in both directions. The Chebyshev expansion on Chebyshev-Gauss-Lobatto points was used in the wall-normal direction, assuming there is no-slip on top and bottom walls. The diffusion terms were discretized with a Crank-Nicolson scheme, while the advection terms dealiased with the 2/3 rule were discretized with an Adams-Bashforth scheme. In the prediction step, the velocity was calculated in physical domain by solving the resulting linear equation directly. However, the extra terms introduced by coordinate transformation impose a strict limitation to time step and an iteration method was applied to overcome this restriction in the correction step for pressure by solving the Helmholtz equation. The numerical solver is written in object-oriented C++ programing language utilizing Armadillo linear algebra library for matrix computation. Several benchmarking cases in laminar and turbulent flow were carried out to verify/validate the numerical model and very good agreements are achieved. Ongoing work focuses on implementing sediment transport capability for multiple sediment classes and parameterizations for flocculation processes.
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.
Pebay, Philippe; Terriberry, Timothy B.; Kolla, Hemanth; ...
2016-03-29
Formulas for incremental or parallel computation of second order central moments have long been known, and recent extensions of these formulas to univariate and multivariate moments of arbitrary order have been developed. Such formulas are of key importance in scenarios where incremental results are required and in parallel and distributed systems where communication costs are high. We survey these recent results, and improve them with arbitrary-order, numerically stable one-pass formulas which we further extend with weighted and compound variants. We also develop a generalized correction factor for standard two-pass algorithms that enables the maintenance of accuracy over nearly the fullmore » representable range of the input, avoiding the need for extended-precision arithmetic. We then empirically examine algorithm correctness for pairwise update formulas up to order four as well as condition number and relative error bounds for eight different central moment formulas, each up to degree six, to address the trade-offs between numerical accuracy and speed of the various algorithms. Finally, we demonstrate the use of the most elaborate among the above mentioned formulas, with the utilization of the compound moments for a practical large-scale scientific application.« less
Massive parallel 3D PIC simulation of negative ion extraction
NASA Astrophysics Data System (ADS)
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
Discontinuous finite element method for vector radiative transfer
NASA Astrophysics Data System (ADS)
Wang, Cun-Hai; Yi, Hong-Liang; Tan, He-Ping
2017-03-01
The discontinuous finite element method (DFEM) is applied to solve the vector radiative transfer in participating media. The derivation in a discrete form of the vector radiation governing equations is presented, in which the angular space is discretized by the discrete-ordinates approach with a local refined modification, and the spatial domain is discretized into finite non-overlapped discontinuous elements. The elements in the whole solution domain are connected by modelling the boundary numerical flux between adjacent elements, which makes the DFEM numerically stable for solving radiative transfer equations. Several various problems of vector radiative transfer are tested to verify the performance of the developed DFEM, including vector radiative transfer in a one-dimensional parallel slab containing a Mie/Rayleigh/strong forward scattering medium and a two-dimensional square medium. The fact that DFEM results agree very well with the benchmark solutions in published references shows that the developed DFEM in this paper is accurate and effective for solving vector radiative transfer problems.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/ output (I/O) time is reduced by using the a synchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
NASA Astrophysics Data System (ADS)
Vasko, I.; Agapitov, O. V.; Mozer, F.; Bonnell, J. W.; Krasnoselskikh, V.; Artemyev, A.; Drake, J. F.
2017-12-01
Chorus waves observed in the Earth inner magnetosphere sometimes exhibit significantly distorted (nonharmonic) parallel electric field waveform. In spectrograms these waveform features show up as overtones of chorus wave. In this work we show that the chorus wave parallel electric field is distorted due to finite temperature of electrons. The distortion of the parallel electric field is described analytically and reproduced in the numerical fluid simulations. Due to this effect the chorus energy is transferred to higher frequencies making possible efficient scattering of low ( a few keV) energy electrons.
Algorithm-Based Fault Tolerance for Numerical Subroutines
NASA Technical Reports Server (NTRS)
Tumon, Michael; Granat, Robert; Lou, John
2007-01-01
A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numericalsubroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm- based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hua, T.Q.; Walker, J.S.; Picologlou, B.F.
1988-07-01
Magnetohydrodynamic flows of liquid metals in rectangular ducts with thin conducting walls in the presence of strong nonuniform transverse magnetic fields are examined. The interaction parameter and Hartmann number are assumed to be large, whereas the magnetic Reynolds number is assumed to be small. Under these assumptions, viscous and inertial effects are confined in very thin boundary layers adjacent to the walls. A significant fraction of the fluid flow is concentrated in the boundary layers adjacent to the side walls which are parallel to the magnetic field. This paper describes the analysis and numerical methods for obtaining 3-D solutions formore » flow parameters outside these layers, without solving explicitly for the layers themselves. Numerical solutions are presented for cases which are relevant to the flows of liquid metals in fusion reactor blankets. Experimental results obtained from the ALEX experiments at Argonne National Laboratory are used to validate the numerical code. In general, the agreement is excellent. 5 refs., 14 figs.« less
Efficient Simulation of Compressible, Viscous Fluids using Multi-rate Time Integration
NASA Astrophysics Data System (ADS)
Mikida, Cory; Kloeckner, Andreas; Bodony, Daniel
2017-11-01
In the numerical simulation of problems of compressible, viscous fluids with single-rate time integrators, the global timestep used is limited to that of the finest mesh point or fastest physical process. This talk discusses the application of multi-rate Adams-Bashforth (MRAB) integrators to an overset mesh framework to solve compressible viscous fluid problems of varying scale with improved efficiency, with emphasis on the strategy of timescale separation and the application of the resulting numerical method to two sample problems: subsonic viscous flow over a cylinder and a viscous jet in crossflow. The results presented indicate the numerical efficacy of MRAB integrators, outline a number of outstanding code challenges, demonstrate the expected reduction in time enabled by MRAB, and emphasize the need for proper load balancing through spatial decomposition in order for parallel runs to achieve the predicted time-saving benefit. This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.
NASA Astrophysics Data System (ADS)
Raeli, Alice; Bergmann, Michel; Iollo, Angelo
2018-02-01
We consider problems governed by a linear elliptic equation with varying coefficients across internal interfaces. The solution and its normal derivative can undergo significant variations through these internal boundaries. We present a compact finite-difference scheme on a tree-based adaptive grid that can be efficiently solved using a natively parallel data structure. The main idea is to optimize the truncation error of the discretization scheme as a function of the local grid configuration to achieve second-order accuracy. Numerical illustrations are presented in two and three-dimensional configurations.
Efficient calculation of atomic rate coefficients in dense plasmas
NASA Astrophysics Data System (ADS)
Aslanyan, Valentin; Tallents, Greg J.
2017-03-01
Modelling electron statistics in a cold, dense plasma by the Fermi-Dirac distribution leads to complications in the calculations of atomic rate coefficients. The Pauli exclusion principle slows down the rate of collisions as electrons must find unoccupied quantum states and adds a further computational cost. Methods to calculate these coefficients by direct numerical integration with a high degree of parallelism are presented. This degree of optimization allows the effects of degeneracy to be incorporated into a time-dependent collisional-radiative model. Example results from such a model are presented.
A study of optical scattering methods in laboratory plasma diagnosis
NASA Technical Reports Server (NTRS)
Phipps, C. R., Jr.
1972-01-01
Electron velocity distributions are deduced along axes parallel and perpendicular to the magnetic field in a pulsed, linear Penning discharge in hydrogen by means of a laser Thomson scattering experiment. Results obtained are numerical averages of many individual measurements made at specific space-time points in the plasma evolution. Because of the high resolution in k-space and the relatively low maximum electron density 2 x 10 to the 13th power/cu cm, special techniques were required to obtain measurable scattering signals. These techniques are discussed and experimental results are presented.
UPC++ Programmer’s Guide (v1.0 2017.9)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, D.
UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, allmore » operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less
UPC++ Programmer’s Guide, v1.0-2018.3.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, Dan
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operationsmore » that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less