parallel computational fluid: Topics by Science.gov

Sample records for parallel computational fluid

Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

NASA Technical Reports Server (NTRS)

Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

1992-01-01

A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.
A comparative study of serial and parallel aeroelastic computations of wings

NASA Technical Reports Server (NTRS)

Byun, Chansup; Guruswamy, Guru P.

1994-01-01

A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

NASA Astrophysics Data System (ADS)

Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

1995-03-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simunovic, S.; Zacharia, T.; Baltas, N.

1995-04-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less
Parallel aeroelastic computations for wing and wing-body configurations

NASA Technical Reports Server (NTRS)

Byun, Chansup

1994-01-01

The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements

NASA Technical Reports Server (NTRS)

Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)

1994-01-01

One or the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Allies Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experiences made so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

NASA Technical Reports Server (NTRS)

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

1994-01-01

Abstract Aeroelasticity which involves strong coupling of fluids, structures and controls is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods such as the linear aerodynamic flow equations coupled with the modal structural equations are well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST) which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations at these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes for fluids and the finite-elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional super computers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computer by using ENSAERO code that directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
Parallel Three-Dimensional Computation of Fluid Dynamics and Fluid-Structure Interactions of Ram-Air Parachutes

NASA Technical Reports Server (NTRS)

Tezduyar, Tayfun E.

1998-01-01

This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Simulating coupled dynamics of a rigid-flexible multibody system and compressible fluid

NASA Astrophysics Data System (ADS)

Hu, Wei; Tian, Qiang; Hu, HaiYan

2018-04-01

As a subsequent work of previous studies of authors, a new parallel computation approach is proposed to simulate the coupled dynamics of a rigid-flexible multibody system and compressible fluid. In this approach, the smoothed particle hydrodynamics (SPH) method is used to model the compressible fluid, the natural coordinate formulation (NCF) and absolute nodal coordinate formulation (ANCF) are used to model the rigid and flexible bodies, respectively. In order to model the compressible fluid properly and efficiently via SPH method, three measures are taken as follows. The first is to use the Riemann solver to cope with the fluid compressibility, the second is to define virtual particles of SPH to model the dynamic interaction between the fluid and the multibody system, and the third is to impose the boundary conditions of periodical inflow and outflow to reduce the number of SPH particles involved in the computation process. Afterwards, a parallel computation strategy is proposed based on the graphics processing unit (GPU) to detect the neighboring SPH particles and to solve the dynamic equations of SPH particles in order to improve the computation efficiency. Meanwhile, the generalized-alpha algorithm is used to solve the dynamic equations of the multibody system. Finally, four case studies are given to validate the proposed parallel computation approach.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmalz, Mark S

2011-07-24

Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.

DTIC Science & Technology

1994-11-01

inside wind musical instruments. Typical simulations achieve $80\\%$ parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed...TERMS AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments 17. SECURITY...for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP
Automating the parallel processing of fluid and structural dynamics calculations

NASA Technical Reports Server (NTRS)

Arpasi, Dale J.; Cole, Gary L.

1987-01-01

The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Computer program MCAP-TOSS calculates steady-state fluid dynamics of coolant in parallel channels and temperature distribution in surrounding heat-generating solid

NASA Technical Reports Server (NTRS)

Lee, A. Y.

1967-01-01

Computer program calculates the steady state fluid distribution, temperature rise, and pressure drop of a coolant, the material temperature distribution of a heat generating solid, and the heat flux distributions at the fluid-solid interfaces. It performs the necessary iterations automatically within the computer, in one machine run.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.

1999-10-14

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-staticmore » solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.« less
Parallel-Processing Test Bed For Simulation Software

NASA Technical Reports Server (NTRS)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Parallel computation of three-dimensional aeroelastic fluid-structure interaction

NASA Astrophysics Data System (ADS)

Sadeghi, Mani

This dissertation presents a numerical method for the parallel computation of aeroelasticity (ParCAE). A flow solver is coupled to a structural solver by use of a fluid-structure interface method. The integration of the three-dimensional unsteady Navier-Stokes equations is performed in the time domain, simultaneously to the integration of a modal three-dimensional structural model. The flow solution is accelerated by using a multigrid method and a parallel multiblock approach. Fluid-structure coupling is achieved by subiteration. A grid-deformation algorithm is developed to interpolate the deformation of the structural boundaries onto the flow grid. The code is formulated to allow application to general, three-dimensional, complex configurations with multiple independent structures. Computational results are presented for various configurations, such as turbomachinery blade rows and aircraft wings. Investigations are performed on vortex-induced vibrations, effects of cascade mistuning on flutter, and cases of nonlinear cascade and wing flutter.
Aeroelasticity of wing and wing-body configurations on parallel computers

NASA Technical Reports Server (NTRS)

Byun, Chansup

1995-01-01

The objective of this research is to develop computationally efficient methods for solving aeroelasticity problems on parallel computers. Both uncoupled and coupled methods are studied in this research. For the uncoupled approach, the conventional U-g method is used to determine the flutter boundary. The generalized aerodynamic forces required are obtained by the pulse transfer-function analysis method. For the coupled approach, the fluid-structure interaction is obtained by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.

Hierarchial parallel computer architecture defined by computational multidisciplinary mechanics

NASA Technical Reports Server (NTRS)

Padovan, Joe; Gute, Doug; Johnson, Keith

1989-01-01

The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosphical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, the impact on solvers are shown in viewgraph form.
Parallelization of implicit finite difference schemes in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

1990-01-01

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
Implementation of a 3D mixing layer code on parallel computers

NASA Technical Reports Server (NTRS)

Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

1995-01-01

This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.
Box schemes and their implementation on the iPSC/860

NASA Technical Reports Server (NTRS)

Chattot, J. J.; Merriam, M. L.

1991-01-01

Research on algoriths for efficiently solving fluid flow problems on massively parallel computers is continued in the present paper. Attention is given to the implementation of a box scheme on the iPSC/860, a massively parallel computer with a peak speed of 10 Gflops and a memory of 128 Mwords. A domain decomposition approach to parallelism is used.
3D Parallel Multigrid Methods for Real-Time Fluid Simulation

NASA Astrophysics Data System (ADS)

Wan, Feifei; Yin, Yong; Zhang, Suiyu

2018-03-01

The multigrid method is widely used in fluid simulation because of its strong convergence. In addition to operating accuracy, operational efficiency is also an important factor to consider in order to enable real-time fluid simulation in computer graphics. For this problem, we compared the performance of the Algebraic Multigrid and the Geometric Multigrid in the V-Cycle and Full-Cycle schemes respectively, and analyze the convergence and speed of different methods. All the calculations are done on the parallel computing of GPU in this paper. Finally, we experiment with the 3D-grid for each scale, and give the exact experimental results.
Lattice Boltzmann computation of creeping fluid flow in roll-coating applications

NASA Astrophysics Data System (ADS)

Rajan, Isac; Kesana, Balashanker; Perumal, D. Arumuga

2018-04-01

Lattice Boltzmann Method (LBM) has advanced as a class of Computational Fluid Dynamics (CFD) methods used to solve complex fluid systems and heat transfer problems. It has ever-increasingly attracted the interest of researchers in computational physics to solve challenging problems of industrial and academic importance. In this current study, LBM is applied to simulate the creeping fluid flow phenomena commonly encountered in manufacturing technologies. In particular, we apply this novel method to simulate the fluid flow phenomena associated with the "meniscus roll coating" application. This prevalent industrial problem encountered in polymer processing and thin film coating applications is modelled as standard lid-driven cavity problem to which creeping flow analysis is applied. This incompressible viscous flow problem is studied in various speed ratios, the ratio of upper to lower lid speed in two different configurations of lid movement - parallel and anti-parallel wall motion. The flow exhibits interesting patterns which will help in design of roll coaters.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)

2001-01-01

A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

NASA Technical Reports Server (NTRS)

Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

1997-01-01

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from the robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block Computational Fluid Dynamics (CFD) packages implemented in ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages are identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft simulations achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids

NASA Astrophysics Data System (ADS)

Ma, Xinrong; Duan, Zhijian

2018-04-01

High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

1993-01-01

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Computational fluid dynamics study of the end-side and sequential coronary artery bypass anastomoses in a native coronary occlusion model.

PubMed

Matsuura, Kaoru; Jin, Wei Wei; Liu, Hao; Matsumiya, Goro

2018-04-01

The objective of this study was to evaluate the haemodynamic patterns in each anastomosis fashion using a computational fluid dynamic study in a native coronary occlusion model. Fluid dynamic computations were carried out with ANSYS CFX (ANSYS Inc., Canonsburg, PA, USA) software. The incision lengths for parallel and diamond anastomoses were fixed at 2 mm. Native vessels were set to be totally occluded. The diameter of both the native and graft vessels was set to be 2 mm. The inlet boundary condition was set by a sample of the transient time flow measurement which was measured intraoperatively. The diamond anastomosis was observed to reduce flow to the native outlet and increase flow to the bypass outlet; the opposite was observed in the parallel anastomosis. Total energy efficiency was higher in the diamond anastomosis than the parallel anastomosis. Wall shear stress was higher in the diamond anastomosis than in the parallel anastomosis; it was the highest at the top of the outlet. A high oscillatory shear index was observed at the bypass inlet in the parallel anastomosis and at the native inlet in the diamond anastomosis. The diamond sequential anastomosis would be an effective option for multiple sequential bypasses because of the better flow to the bypass outlet than with the parallel anastomosis. However, flow competition should be kept in mind while using the diamond anastomosis for moderately stenotic vessels because of worsened flow to the native outlet. Care should be taken to ensure that the fluid dynamics patterns are optimal and prevent future native and bypass vessel disease progression.
Application of parallel distributed Lagrange multiplier technique to simulate coupled Fluid-Granular flows in pipes with varying Cross-Sectional area

DOE PAGES

Kanarska, Yuliya; Walton, Otis

2015-11-30

Fluid-granular flows are common phenomena in nature and industry. Here, an efficient computational technique based on the distributed Lagrange multiplier method is utilized to simulate complex fluid-granular flows. Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain, however, Lagrange multiplier constrains are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. The particle–particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEMmore » method with some modifications using the volume of an overlapping region as an input to the contact forces. Here, a parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library.« less
Performance of a parallel code for the Euler equations on hypercube computers

NASA Technical Reports Server (NTRS)

Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

1990-01-01

The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
Visualization of unsteady computational fluid dynamics

NASA Astrophysics Data System (ADS)

Haimes, Robert

1994-11-01

A brief summary of the computer environment used for calculating three dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires a super computer as well as massively parallel processors (MPP's) and clusters of workstations acting as a single MPP (by concurrently working on the same task) provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computers (RISC) is a recent advent based on the low cost and high performance that workstation vendors provide. The cluster, with the proper software can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address visualizing 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00 make up the bulk of this report.
Visualization of unsteady computational fluid dynamics

NASA Technical Reports Server (NTRS)

Haimes, Robert

1994-01-01

A brief summary of the computer environment used for calculating three dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires a super computer as well as massively parallel processors (MPP's) and clusters of workstations acting as a single MPP (by concurrently working on the same task) provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computers (RISC) is a recent advent based on the low cost and high performance that workstation vendors provide. The cluster, with the proper software can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address visualizing 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00 make up the bulk of this report.
High-performance computational fluid dynamics: a custom-code approach

NASA Astrophysics Data System (ADS)

Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

2016-07-01

We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

1994-01-01

Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluate, partition, processor reassignment, cost evaluation, and decision. Jove running on a single EBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
Dynamic Load Balancing For Grid Partitioning on a SP-2 Multiprocessor: A Framework

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

1994-01-01

Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluate, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
A parallel offline CFD and closed-form approximation strategy for computationally efficient analysis of complex fluid flows

NASA Astrophysics Data System (ADS)

Allphin, Devin

Computational fluid dynamics (CFD) solution approximations for complex fluid flow problems have become a common and powerful engineering analysis technique. These tools, though qualitatively useful, remain limited in practice by their underlying inverse relationship between simulation accuracy and overall computational expense. While a great volume of research has focused on remedying these issues inherent to CFD, one traditionally overlooked area of resource reduction for engineering analysis concerns the basic definition and determination of functional relationships for the studied fluid flow variables. This artificial relationship-building technique, called meta-modeling or surrogate/offline approximation, uses design of experiments (DOE) theory to efficiently approximate non-physical coupling between the variables of interest in a fluid flow analysis problem. By mathematically approximating these variables, DOE methods can effectively reduce the required quantity of CFD simulations, freeing computational resources for other analytical focuses. An idealized interpretation of a fluid flow problem can also be employed to create suitably accurate approximations of fluid flow variables for the purposes of engineering analysis. When used in parallel with a meta-modeling approximation, a closed-form approximation can provide useful feedback concerning proper construction, suitability, or even necessity of an offline approximation tool. It also provides a short-circuit pathway for further reducing the overall computational demands of a fluid flow analysis, again freeing resources for otherwise unsuitable resource expenditures. To validate these inferences, a design optimization problem was presented requiring the inexpensive estimation of aerodynamic forces applied to a valve operating on a simulated piston-cylinder heat engine. The determination of these forces was to be found using parallel surrogate and exact approximation methods, thus evidencing the comparative benefits of this technique. For the offline approximation, latin hypercube sampling (LHS) was used for design space filling across four (4) independent design variable degrees of freedom (DOF). Flow solutions at the mapped test sites were converged using STAR-CCM+ with aerodynamic forces from the CFD models then functionally approximated using Kriging interpolation. For the closed-form approximation, the problem was interpreted as an ideal 2-D converging-diverging (C-D) nozzle, where aerodynamic forces were directly mapped by application of the Euler equation solutions for isentropic compression/expansion. A cost-weighting procedure was finally established for creating model-selective discretionary logic, with a synthesized parallel simulation resource summary provided.
Computation of Coupled Thermal-Fluid Problems in Distributed Memory Environment

NASA Technical Reports Server (NTRS)

Wei, H.; Shang, H. M.; Chen, Y. S.

2001-01-01

The thermal-fluid coupling problems are very important to aerospace and engineering applications. Instead of analyzing heat transfer and fluid flow separately, this study merged two well-accepted engineering solution methods, SINDA for thermal analysis and FDNS for fluid flow simulation, into a unified multi-disciplinary thermal fluid prediction method. A fully conservative patched grid interface algorithm for arbitrary two-dimensional and three-dimensional geometry has been developed. The state-of-the-art parallel computing concept was used to couple SINDA and FDNS for the communication of boundary conditions through PVM (Parallel Virtual Machine) libraries. Therefore, the thermal analysis performed by SINDA and the fluid flow calculated by FDNS are fully coupled to obtain steady state or transient solutions. The natural convection between two thick-walled eccentric tubes was calculated and the predicted results match the experiment data perfectly. A 3-D rocket engine model and a real 3-D SSME geometry were used to test the current model, and the reasonable temperature field was obtained.

FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics

NASA Astrophysics Data System (ADS)

Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio

2016-12-01

Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1996-01-01

This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor tor parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability tor the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1196 Computational Aeroscience meeting.
Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures

NASA Astrophysics Data System (ADS)

Biazewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napieraia, Krystyna

2010-09-01

Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics, which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shapes optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting or realistic visualizations. CFD problem is very complex and needs a lot of computational power to obtain the results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphic cards with CUDA environment due to its simplicity of programming and good computations performance. We used finite difference discretization of Navier-Stokes equations, where fluid is propagated over an Eulerian Grid. In this model, the behavior of the fluid inside the cell depends only on the properties of local, surrounding cells, therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to use efficiently the computing power of GPUs for CFD. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on GPU. Finally, we discuss various challenges around the multi-GPU implementation on the example of matrix multiplication.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)

2002-01-01

A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU

NASA Astrophysics Data System (ADS)

Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha

2018-03-01

Since Graphic Processing Unit (GPU) has a strong ability of floating-point computation and memory bandwidth for data parallelism, it has been widely used in the areas of common computing such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of compute unified device architecture (CUDA), which reduces the complexity of compiling program, brings the great opportunities to CFD. There are three different modes for parallel solution of NS equations: parallel solver based on CPU, parallel solver based on GPU and heterogeneous parallel solver based on collaborating CPU and GPU. As we can see, GPUs are relatively rich in compute capacity but poor in memory capacity and the CPUs do the opposite. We need to make full use of the GPUs and CPUs, so a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experiment results, which demonstrate that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow, it decreases for turbulent flow, but it still can reach more than 20. What’s more, the speedup increases as the grid size becomes larger.
Development and parallelization of a direct numerical simulation to study the formation and transport of nanoparticle clusters in a viscous fluid

NASA Astrophysics Data System (ADS)

Sloan, Gregory James

The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
Internal fluid mechanics research on supercomputers for aerospace propulsion systems

NASA Technical Reports Server (NTRS)

Miller, Brent A.; Anderson, Bernhard H.; Szuch, John R.

1988-01-01

The Internal Fluid Mechanics Division of the NASA Lewis Research Center is combining the key elements of computational fluid dynamics, aerothermodynamic experiments, and advanced computational technology to bring internal computational fluid mechanics (ICFM) to a state of practical application for aerospace propulsion systems. The strategies used to achieve this goal are to: (1) pursue an understanding of flow physics, surface heat transfer, and combustion via analysis and fundamental experiments, (2) incorporate improved understanding of these phenomena into verified 3-D CFD codes, and (3) utilize state-of-the-art computational technology to enhance experimental and CFD research. Presented is an overview of the ICFM program in high-speed propulsion, including work in inlets, turbomachinery, and chemical reacting flows. Ongoing efforts to integrate new computer technologies, such as parallel computing and artificial intelligence, into high-speed aeropropulsion research are described.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to development procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Application of computational physics within Northrop

NASA Technical Reports Server (NTRS)

George, M. W.; Ling, R. T.; Mangus, J. F.; Thompkins, W. T.

1987-01-01

An overview of Northrop programs in computational physics is presented. These programs depend on access to today's supercomputers, such as the Numerical Aerodynamical Simulator (NAS), and future growth on the continuing evolution of computational engines. Descriptions here are concentrated on the following areas: computational fluid dynamics (CFD), computational electromagnetics (CEM), computer architectures, and expert systems. Current efforts and future directions in these areas are presented. The impact of advances in the CFD area is described, and parallels are drawn to analagous developments in CEM. The relationship between advances in these areas and the development of advances (parallel) architectures and expert systems is also presented.
A multiarchitecture parallel-processing development environment

NASA Technical Reports Server (NTRS)

Townsend, Scott; Blech, Richard; Cole, Gary

1993-01-01

A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Hybrid Parallelization of Adaptive MHD-Kinetic Module in Multi-Scale Fluid-Kinetic Simulation Suite

DOE PAGES

Borovikov, Sergey; Heerikhuisen, Jacob; Pogorelov, Nikolai

2013-04-01

The Multi-Scale Fluid-Kinetic Simulation Suite has a computational tool set for solving partially ionized flows. In this paper we focus on recent developments of the kinetic module which solves the Boltzmann equation using the Monte-Carlo method. The module has been recently redesigned to utilize intra-node hybrid parallelization. We describe in detail the redesign process, implementation issues, and modifications made to the code. Finally, we conduct a performance analysis.
Computational Performance of Intel MIC, Sandy Bridge, and GPU Architectures: Implementation of a 1D c++/OpenMP Electrostatic Particle-In-Cell Code

DTIC Science & Technology

2014-05-01

fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic...purpose computing CPU for algorithms where processing of large blocks of data is done in parallel. The reason for that is the GPU’s highly effective...parallel structure. Most of the image and video processing computations involve heavy matrix and vector op- erations over large amounts of data and
Implementation of the NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2002-01-01

Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Research in Parallel Algorithms and Software for Computational Aerosciences

DOT National Transportation Integrated Search

1996-04-01

Phase I is complete for the development of a Computational Fluid Dynamics : with automatic grid generation and adaptation for the Euler : analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian : grid code developed at Lockheed...
High order parallel numerical schemes for solving incompressible flows

NASA Technical Reports Server (NTRS)

Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.

1992-01-01

The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube like architecture having clusters of shared memory processors at each node. The approach is demonstrated using examples of simple steady state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher order, schemes for its particular subcase.
The Advanced Software Development and Commercialization Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gallopoulos, E.; Canfield, T.R.; Minkoff, M.

1990-09-01

This is the first of a series of reports pertaining to progress in the Advanced Software Development and Commercialization Project, a joint collaborative effort between the Center for Supercomputing Research and Development of the University of Illinois and the Computing and Telecommunications Division of Argonne National Laboratory. The purpose of this work is to apply techniques of parallel computing that were pioneered by University of Illinois researchers to mature computational fluid dynamics (CFD) and structural dynamics (SD) computer codes developed at Argonne. The collaboration in this project will bring this unique combination of expertise to bear, for the first time,more » on industrially important problems. By so doing, it will expose the strengths and weaknesses of existing techniques for parallelizing programs and will identify those problems that need to be solved in order to enable wide spread production use of parallel computers. Secondly, the increased efficiency of the CFD and SD codes themselves will enable the simulation of larger, more accurate engineering models that involve fluid and structural dynamics. In order to realize the above two goals, we are considering two production codes that have been developed at ANL and are widely used by both industry and Universities. These are COMMIX and WHAMS-3D. The first is a computational fluid dynamics code that is used for both nuclear reactor design and safety and as a design tool for the casting industry. The second is a three-dimensional structural dynamics code used in nuclear reactor safety as well as crashworthiness studies. These codes are currently available for both sequential and vector computers only. Our main goal is to port and optimize these two codes on shared memory multiprocessors. In so doing, we shall establish a process that can be followed in optimizing other sequential or vector engineering codes for parallel processors.« less
Tensor Arithmetic, Geometric and Mathematic Principles of Fluid Mechanics in Implementation of Direct Computational Experiments

NASA Astrophysics Data System (ADS)

Bogdanov, Alexander; Khramushin, Vasily

2016-02-01

The architecture of a digital computing system determines the technical foundation of a unified mathematical language for exact arithmetic-logical description of phenomena and laws of continuum mechanics for applications in fluid mechanics and theoretical physics. The deep parallelization of the computing processes results in functional programming at a new technological level, providing traceability of the computing processes with automatic application of multiscale hybrid circuits and adaptive mathematical models for the true reproduction of the fundamental laws of physics and continuum mechanics.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

Algorithmic trends in computational fluid dynamics; The Institute for Computer Applications in Science and Engineering (ICASE)/LaRC Workshop, NASA Langley Research Center, Hampton, VA, US, Sep. 15-17, 1991

NASA Technical Reports Server (NTRS)

Hussaini, M. Y. (Editor); Kumar, A. (Editor); Salas, M. D. (Editor)

1993-01-01

The purpose here is to assess the state of the art in the areas of numerical analysis that are particularly relevant to computational fluid dynamics (CFD), to identify promising new developments in various areas of numerical analysis that will impact CFD, and to establish a long-term perspective focusing on opportunities and needs. Overviews are given of discretization schemes, computational fluid dynamics, algorithmic trends in CFD for aerospace flow field calculations, simulation of compressible viscous flow, and massively parallel computation. Also discussed are accerelation methods, spectral and high-order methods, multi-resolution and subcell resolution schemes, and inherently multidimensional schemes.
Parallel programming of industrial applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heroux, M; Koniges, A; Simon, H

1998-07-21

In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from thesemore » applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).« less
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA

NASA Astrophysics Data System (ADS)

Ren, Qinlong

Lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate the complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on the kinetic theory, LBM has several advantages compared with traditional numerical methods such as physical representation of microscopic interactions, dealing with complex geometries and highly parallel nature. Lattice Boltzmann method has been applied to solve various fluid behaviors and heat transfer process like conjugate heat transfer, magnetic and electric field, diffusion and mixing process, chemical reactions, multiphase flow, phase change process, non-isothermal flow in porous medium, microfluidics, fluid-structure interactions in biological system and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) could be applied to handle the complex or moving geometries in the domain. The immersed boundary method could be coupled with lattice Boltzmann method to study the heat transfer and fluid flow problems. Heat transfer and fluid flow are solved on Euler nodes by LBM while the complex solid geometries are captured by Lagrangian nodes using immersed boundary method. Parallel computing has been a popular topic for many decades to accelerate the computational speed in engineering and scientific fields. Today, almost all the laptop and desktop have central processing units (CPUs) with multiple cores which could be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high which limits its capability of high performance computing on personal computer. Graphic processing units (GPU) is originally used for the computer video cards have been emerged as the most powerful high-performance workstation in recent years. Unlike the CPUs, the cost of GPU with thousands of cores is cheap. For example, the GPU (GeForce GTX TITAN) which is used in the current work has 2688 cores and the price is only 1,000 US dollars. The release of NVIDIA's CUDA architecture which includes both hardware and programming environment in 2007 makes GPU computing attractive. Due to its highly parallel nature, lattice Boltzmann method is successfully ported into GPU with a performance benefit during the recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, lattice Boltzmann method and immersed boundary method are used to study natural convection in an enclosure with an array of conduting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems on a CUDA computational platform. It is demonstrated that LBM is an efficient method to simulate complex heat transfer problems using GPU on CUDA.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Earth Sciences Division; Zhang, Keni; Zhang, Keni

TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
Implementation of NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

2000-01-01

A number of features make Java an attractive but a debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to the Computational Fluid Dynamics (CFD) we have implemented NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sierra Thermal/Fluid Team

SIERRA/Aero is a compressible fluid dynamics program intended to solve a wide variety compressible fluid flows including transonic and hypersonic problems. This document describes the commands for assembling a fluid model for analysis with this module, henceforth referred to simply as Aero for brevity. Aero is an application developed using the SIERRA Toolkit (STK). The intent of STK is to provide a set of tools for handling common tasks that programmers encounter when developing a code for numerical simulation. For example, components of STK provide field allocation and management, and parallel input/output of field and mesh data. These services alsomore » allow the development of coupled mechanics analysis software for a massively parallel computing environment. In the definitions of the commands that follow, the term Real_Max denotes the largest floating point value that can be represented on a given computer. Int_Max is the largest such integer value.« less
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

1994-01-01

This research program deals with the application of high-performance computing methods for the analysis of complete jet engines. We have entitled this program by applying the two dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition, and solution capabilities were successfully tested. We then focused attention on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion that results from these structural displacements. This is treated by a new arbitrary Lagrangian-Eulerian (ALE) technique that models the fluid mesh motion as that of a fictitious mass-spring network. New partitioned analysis procedures to treat this coupled three-component problem are developed. These procedures involved delayed corrections and subcycling. Preliminary results on the stability, accuracy, and MPP computational efficiency are reported.
Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

2004-01-01

In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which effect the performance of multi-level parallel applications.
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Leoinne, M.; Stern, P.

1995-01-01

This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.
Tensor methodology and computational geometry in direct computational experiments in fluid mechanics

NASA Astrophysics Data System (ADS)

Degtyarev, Alexander; Khramushin, Vasily; Shichkina, Julia

2017-07-01

The paper considers a generalized functional and algorithmic construction of direct computational experiments in fluid dynamics. Notation of tensor mathematics is naturally embedded in the finite - element operation in the construction of numerical schemes. Large fluid particle, which have a finite size, its own weight, internal displacement and deformation is considered as an elementary computing object. Tensor representation of computational objects becomes strait linear and uniquely approximation of elementary volumes and fluid particles inside them. The proposed approach allows the use of explicit numerical scheme, which is an important condition for increasing the efficiency of the algorithms developed by numerical procedures with natural parallelism. It is shown that advantages of the proposed approach are achieved among them by considering representation of large particles of a continuous medium motion in dual coordinate systems and computing operations in the projections of these two coordinate systems with direct and inverse transformations. So new method for mathematical representation and synthesis of computational experiment based on large particle method is proposed.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools

NASA Technical Reports Server (NTRS)

Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
Overview 1993: Computational applications

NASA Technical Reports Server (NTRS)

Benek, John A.

1993-01-01

Computational applications include projects that apply or develop computationally intensive computer programs. Such programs typically require supercomputers to obtain solutions in a timely fashion. This report describes two CSTAR projects involving Computational Fluid Dynamics (CFD) technology. The first, the Parallel Processing Initiative, is a joint development effort and the second, the Chimera Technology Development, is a transfer of government developed technology to American industry.
Adaptive particle-based pore-level modeling of incompressible fluid flow in porous media: a direct and parallel approach

NASA Astrophysics Data System (ADS)

Ovaysi, S.; Piri, M.

2009-12-01

We present a three-dimensional fully dynamic parallel particle-based model for direct pore-level simulation of incompressible viscous fluid flow in disordered porous media. The model was developed from scratch and is capable of simulating flow directly in three-dimensional high-resolution microtomography images of naturally occurring or man-made porous systems. It reads the images as input where the position of the solid walls are given. The entire medium, i.e., solid and fluid, is then discretized using particles. The model is based on Moving Particle Semi-implicit (MPS) technique. We modify this technique in order to improve its stability. The model handles highly irregular fluid-solid boundaries effectively. It takes into account viscous pressure drop in addition to the gravity forces. It conserves mass and can automatically detect any false connectivity with fluid particles in the neighboring pores and throats. It includes a sophisticated algorithm to automatically split and merge particles to maintain hydraulic connectivity of extremely narrow conduits. Furthermore, it uses novel methods to handle particle inconsistencies and open boundaries. To handle the computational load, we present a fully parallel version of the model that runs on distributed memory computer clusters and exhibits excellent scalability. The model is used to simulate unsteady-state flow problems under different conditions starting from straight noncircular capillary tubes with different cross-sectional shapes, i.e., circular/elliptical, square/rectangular and triangular cross-sections. We compare the predicted dimensionless hydraulic conductances with the data available in the literature and observe an excellent agreement. We then test the scalability of our parallel model with two samples of an artificial sandstone, samples A and B, with different volumes and different distributions (non-uniform and uniform) of solid particles among the processors. An excellent linear scalability is obtained for sample B that has more uniform distribution of solid particles leading to a superior load balancing. The model is then used to simulate fluid flow directly in REV size three-dimensional x-ray images of a naturally occurring sandstone. We analyze the quality and consistency of the predicted flow behavior and calculate absolute permeability, which compares well with the available network modeling and Lattice-Boltzmann permeabilities available in the literature for the same sandstone. We show that the model conserves mass very well and is stable computationally even at very narrow fluid conduits. The transient- and the steady-state fluid flow patterns are presented as well as the steady-state flow rates to compute absolute permeability. Furthermore, we discuss the vital role of our adaptive particle resolution scheme in preserving the original pore connectivity of the samples and their narrow channels through splitting and merging of fluid particles.
Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

NASA Astrophysics Data System (ADS)

Marx, Alain; Lütjens, Hinrich

2017-03-01

A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
Optimising the Parallelisation of OpenFOAM Simulations

DTIC Science & Technology

2014-06-01

UNCLASSIFIED UNCLASSIFIED Optimising the Parallelisation of OpenFOAM Simulations Shannon Keough Maritime Division Defence...Science and Technology Organisation DSTO-TR-2987 ABSTRACT The OpenFOAM computational fluid dynamics toolbox allows parallel computation of...performance of a given high performance computing cluster with several OpenFOAM cases, running using a combination of MPI libraries and corresponding MPI
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of modern and common techniques to achieve high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5 VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Summary of research in applied mathematics, numerical analysis, and computer sciences

NASA Technical Reports Server (NTRS)

1986-01-01

The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers.
Finite Element Modeling of Coupled Flexible Multibody Dynamics and Liquid Sloshing

DTIC Science & Technology

2006-09-01

tanks is presented. The semi-discrete combined solid and fluid equations of motions are integrated using a time- accurate parallel explicit solver...Incompressible fluid flow in a moving/deforming container including accurate modeling of the free-surface, turbulence, and viscous effects ...paper, a single computational code which uses a time- accurate explicit solution procedure is used to solve both the solid and fluid equations of
Distributed-Memory Computing With the Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA)

NASA Technical Reports Server (NTRS)

Riley, Christopher J.; Cheatwood, F. McNeil

1997-01-01

The Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA), a Navier-Stokes solver, has been modified for use in a parallel, distributed-memory environment using the Message-Passing Interface (MPI) standard. A standard domain decomposition strategy is used in which the computational domain is divided into subdomains with each subdomain assigned to a processor. Performance is examined on dedicated parallel machines and a network of desktop workstations. The effect of domain decomposition and frequency of boundary updates on performance and convergence is also examined for several realistic configurations and conditions typical of large-scale computational fluid dynamic analysis.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).

Computation of the Fluid and Optical Fields About the Stratospheric Observatory for Infrared Astronomy (SOFIA) and the Coupling of Fluids, Dynamics, and Control Laws on Parallel Computers

NASA Technical Reports Server (NTRS)

Atwood, Christopher A.

1993-01-01

The June 1992 to May 1993 grant NCC-2-677 provided for the continued demonstration of Computational Fluid Dynamics (CFD) as applied to the Stratospheric Observatory for Infrared Astronomy (SOFIA). While earlier grant years allowed validation of CFD through comparison against experiments, this year a new design proposal was evaluated. The new configuration would place the cavity aft of the wing, as opposed to the earlier baseline which was located immediately aft of the cockpit. This aft cavity placement allows for simplified structural and aircraft modification requirements, thus lowering the program cost of this national astronomy resource. Three appendices concerning this subject are presented.
Hall effects on unsteady MHD reactive flow of second grade fluid through porous medium in a rotating parallel plate channel

NASA Astrophysics Data System (ADS)

Krishna, M. Veera; Swarnalathamma, B. V.

2017-07-01

We considered the transient MHD flow of a reactive second grade fluid through porous medium between two infinitely long horizontal parallel plates when one of the plate is set into uniform accelerated motion in the presence of a uniform transverse magnetic field under Arrhenius reaction rate. The governing equations are solved by Laplace transform technique. The effects of the pertinent parameters on the velocity, temperature are discussed in detail. The shear stress and Nusselt number at the plates are also obtained analytically and computationally discussed with reference to governing parameters.
Visualization and Tracking of Parallel CFD Simulations

NASA Technical Reports Server (NTRS)

Vaziri, Arsi; Kremenetsky, Mark

1995-01-01

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
A heterogeneous computing environment for simulating astrophysical fluid flows

NASA Technical Reports Server (NTRS)

Cazes, J.

1994-01-01

In the Concurrent Computing Laboratory in the Department of Physics and Astronomy at Louisiana State University we have constructed a heterogeneous computing environment that permits us to routinely simulate complicated three-dimensional fluid flows and to readily visualize the results of each simulation via three-dimensional animation sequences. An 8192-node MasPar MP-1 computer with 0.5 GBytes of RAM provides 250 MFlops of execution speed for our fluid flow simulations. Utilizing the parallel virtual machine (PVM) language, at periodic intervals data is automatically transferred from the MP-1 to a cluster of workstations where individual three-dimensional images are rendered for inclusion in a single animation sequence. Work is underway to replace executions on the MP-1 with simulations performed on the 512-node CM-5 at NCSA and to simultaneously gain access to more potent volume rendering workstations.
Parallel Simulation of Three-Dimensional Free-Surface Fluid Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

BAER,THOMAS A.; SUBIA,SAMUEL R.; SACKINGER,PHILIP A.

2000-01-18

We describe parallel simulations of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerlin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of problem unknowns. Issues concerning the proper constraints along the solid-fluid dynamic contact line inmore » three dimensions are discussed. Parallel computations are carried out for an example taken from the coating flow industry, flow in the vicinity of a slot coater edge. This is a three-dimensional free-surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another part of the flow domain. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.« less
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1996-01-01

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1997-01-01

We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Automotive applications of superconductors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ginsberg, M.

1987-01-01

These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
Exploiting Data Similarity to Reduce Memory Footprints

DTIC Science & Technology

2011-01-01

leslie3d Fortran Computational Fluid Dynamics (CFD) application 122. tachyon C Parallel Ray Tracing application 128.GAPgeofem C and Fortran Simulates...benefits most from SBLLmalloc; LAMMPS, which shows moderate similarity from primarily zero pages; and 122. tachyon , a parallel ray- tracing application...similarity across MPI tasks. They primarily are zero- pages although a small fraction (≈10%) are non-zero pages. 122. tachyon is an image rendering
Research in applied mathematics, numerical analysis, and computer science

NASA Technical Reports Server (NTRS)

1984-01-01

Research conducted at the Institute for Computer Applications in Science and Engineering (ICASE) in applied mathematics, numerical analysis, and computer science is summarized and abstracts of published reports are presented. The major categories of the ICASE research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software, especially vector and parallel computers.
Semiannual report, 1 April - 30 September 1991

NASA Technical Reports Server (NTRS)

1991-01-01

The major categories of the current Institute for Computer Applications in Science and Engineering (ICASE) research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification problems, with emphasis on effective numerical methods; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software for parallel computers. Research in these areas is discussed.
LB3D: A parallel implementation of the Lattice-Boltzmann method for simulation of interacting amphiphilic fluids

NASA Astrophysics Data System (ADS)

Schmieschek, S.; Shamardin, L.; Frijters, S.; Krüger, T.; Schiller, U. D.; Harting, J.; Coveney, P. V.

2017-08-01

We introduce the lattice-Boltzmann code LB3D, version 7.1. Building on a parallel program and supporting tools which have enabled research utilising high performance computing resources for nearly two decades, LB3D version 7 provides a subset of the research code functionality as an open source project. Here, we describe the theoretical basis of the algorithm as well as computational aspects of the implementation. The software package is validated against simulations of meso-phases resulting from self-assembly in ternary fluid mixtures comprising immiscible and amphiphilic components such as water-oil-surfactant systems. The impact of the surfactant species on the dynamics of spinodal decomposition are tested and quantitative measurement of the permeability of a body centred cubic (BCC) model porous medium for a simple binary mixture is described. Single-core performance and scaling behaviour of the code are reported for simulations on current supercomputer architectures.
Numerical Modeling of Internal Flow Aerodynamics. Part 2: Unsteady Flows

DTIC Science & Technology

2004-01-01

fluid- structure coupling, ...). • • • • • Prediction: in this simulation, we want to assess the effect of a change in SRM geometry, propellant...surface reaches the structure ). The third characteristic time describes the slow evolution of the internal geometry. The last characteristic time...incorporates fluid- structure coupling facility, and is parallel. MOPTI® manages exchanges between two principal computational modules: • • A varying
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

NASA Technical Reports Server (NTRS)

Morgan, Philip E.

2004-01-01

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Performance and Scalability of the NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

2002-01-01

Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1997-01-01

Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
SediFoam: A general-purpose, open-source CFD-DEM solver for particle-laden flow with emphasis on sediment transport

NASA Astrophysics Data System (ADS)

Sun, Rui; Xiao, Heng

2016-04-01

With the growth of available computational resource, CFD-DEM (computational fluid dynamics-discrete element method) becomes an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(107) particles.
NRMC - A GPU code for N-Reverse Monte Carlo modeling of fluids in confined media

NASA Astrophysics Data System (ADS)

Sánchez-Gil, Vicente; Noya, Eva G.; Lomba, Enrique

2017-08-01

NRMC is a parallel code for performing N-Reverse Monte Carlo modeling of fluids in confined media [V. Sánchez-Gil, E.G. Noya, E. Lomba, J. Chem. Phys. 140 (2014) 024504]. This method is an extension of the usual Reverse Monte Carlo method to obtain structural models of confined fluids compatible with experimental diffraction patterns, specifically designed to overcome the problem of slow diffusion that can appear under conditions of tight confinement. Most of the computational time in N-Reverse Monte Carlo modeling is spent in the evaluation of the structure factor for each trial configuration, a calculation that can be easily parallelized. Implementation of the structure factor evaluation in NVIDIA® CUDA so that the code can be run on GPUs leads to a speed up of up to two orders of magnitude.
Hall effects on MHD flow of heat generating/absorbing fluid through porous medium in a rotating parallel plate channel

NASA Astrophysics Data System (ADS)

Swarnalathamma, B. V.; Krishna, M. Veera

2017-07-01

We studied heat transfer on MHD convective flow of viscous electrically conducting heat generating/absorbing fluid through porous medium in a rotating channel under uniform transverse magnetic field normal to the channel and taking Hall current. The flow is governed by the Brinkman's model. The diagnostic solutions for the velocity and temperature are obtained by perturbation technique and computationally discussed with respect to flow parameters through the graphs. The skin friction and Nusselt number are also evaluated and computationally discussed with reference to pertinent parameters in detail.

Special purpose computer system with highly parallel pipelines for flow visualization using holography technology

NASA Astrophysics Data System (ADS)

Masuda, Nobuyuki; Sugie, Takashige; Ito, Tomoyoshi; Tanaka, Shinjiro; Hamada, Yu; Satake, Shin-ichi; Kunugi, Tomoaki; Sato, Kazuho

2010-12-01

We have designed a PC cluster system with special purpose computer boards for visualization of fluid flow using digital holographic particle tracking velocimetry (DHPTV). In this board, there is a Field Programmable Gate Array (FPGA) chip in which is installed a pipeline for calculating the intensity of an object from a hologram by fast Fourier transform (FFT). This cluster system can create 1024 reconstructed images from a 1024×1024-grid hologram in 0.77 s. It is expected that this system will contribute to the analysis of fluid flow using DHPTV.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Molecular dynamics studies of transport properties and equation of state of supercritical fluids

NASA Astrophysics Data System (ADS)

Nwobi, Obika C.

Many chemical propulsion systems operate with one or more of the reactants above the critical point in order to enhance their performance. Most of the computational fluid dynamics (CFD) methods used to predict these flows require accurate information on the transport properties and equation of state at these supercritical conditions. This work involves the determination of transport coefficients and equation of state of supercritical fluids by equilibrium molecular dynamics (MD) simulations on parallel computers using the Green-Kubo formulae and the virial equation of state, respectively. MD involves the solution of equations of motion of a system of molecules that interact with each other through an intermolecular potential. Provided that an accurate potential can be found for the system of interest, MD can be used regardless of the phase and thermodynamic conditions of the substances involved. The MD program uses the effective Lennard-Jones potential, with system sizes of 1000-1200 molecules and, simulations of 2,000,000 time-steps for computing transport coefficients and 200,000 time-steps for pressures. The computer code also uses linked cell lists for efficient sorting of molecules, periodic boundary conditions, and a modified velocity Verlet algorithm for particle displacement. Particle decomposition is used for distributing the molecules to different processors of a parallel computer. Simulations have been carried out on pure argon, nitrogen, oxygen and ethylene at various supercritical conditions, with self-diffusion coefficients, shear viscosity coefficients, thermal conductivity coefficients and pressures computed for most of the conditions. Results compare well with experimental and the National Institute of Standards and Technology (NIST) values. The results show that the number of molecules and the potential cut-off radius have no significant effect on the computed coefficients, while long-time integration is necessary for accurate determination of the coefficients.
NASA Exhibits

NASA Technical Reports Server (NTRS)

Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick;

2001-01-01

A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) Debakey Heart Assist Device: (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.

Coordinate Systems, Numerical Objects and Algorithmic Operations of Computational Experiment in Fluid Mechanics

NASA Astrophysics Data System (ADS)

Degtyarev, Alexander; Khramushin, Vasily

2016-02-01

The paper deals with the computer implementation of direct computational experiments in fluid mechanics, constructed on the basis of the approach developed by the authors. The proposed approach allows the use of explicit numerical scheme, which is an important condition for increasing the effciency of the algorithms developed by numerical procedures with natural parallelism. The paper examines the main objects and operations that let you manage computational experiments and monitor the status of the computation process. Special attention is given to a) realization of tensor representations of numerical schemes for direct simulation; b) realization of representation of large particles of a continuous medium motion in two coordinate systems (global and mobile); c) computing operations in the projections of coordinate systems, direct and inverse transformation in these systems. Particular attention is paid to the use of hardware and software of modern computer systems.
New cellular automaton model for magnetohydrodynamics

NASA Technical Reports Server (NTRS)

Chen, Hudong; Matthaeus, William H.

1987-01-01

A new type of two-dimensional cellular automation method is introduced for computation of magnetohydrodynamic fluid systems. Particle population is described by a 36-component tensor referred to a hexagonal lattice. By appropriate choice of the coefficients that control the modified streaming algorithm and the definition of the macroscopic fields, it is possible to compute both Lorentz-force and magnetic-induction effects. The method is local in the microscopic space and therefore suited to massively parallel computations.
New Computational Methods for the Prediction and Analysis of Helicopter Noise

NASA Technical Reports Server (NTRS)

Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

1996-01-01

This paper describes several new methods to predict and analyze rotorcraft noise. These methods are: 1) a combined computational fluid dynamics and Kirchhoff scheme for far-field noise predictions, 2) parallel computer implementation of the Kirchhoff integrations, 3) audio and visual rendering of the computed acoustic predictions over large far-field regions, and 4) acoustic tracebacks to the Kirchhoff surface to pinpoint the sources of the rotor noise. The paper describes each method and presents sample results for three test cases. The first case consists of in-plane high-speed impulsive noise and the other two cases show idealized parallel and oblique blade-vortex interactions. The computed results show good agreement with available experimental data but convey much more information about the far-field noise propagation. When taken together, these new analysis methods exploit the power of new computer technologies and offer the potential to significantly improve our prediction and understanding of rotorcraft noise.
Modeling the nanoscale viscoelasticity of fluids by bridging non-Markovian fluctuating hydrodynamics and molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Voulgarakis, Nikolaos K.; Satish, Siddarth; Chu, Jhih-Wei

2009-12-01

A multiscale computational method is developed to model the nanoscale viscoelasticity of fluids by bridging non-Markovian fluctuating hydrodynamics (FHD) and molecular dynamics (MD) simulations. To capture the elastic responses that emerge at small length scales, we attach an additional rheological model parallel to the macroscopic constitutive equation of a fluid. The widely used linear Maxwell model is employed as a working choice; other models can be used as well. For a fluid that is Newtonian in the macroscopic limit, this approach results in a parallel Newtonian-Maxwell model. For water, argon, and an ionic liquid, the power spectrum of momentum field autocorrelation functions of the parallel Newtonian-Maxwell model agrees very well with those calculated from all-atom MD simulations. To incorporate thermal fluctuations, we generalize the equations of FHD to work with non-Markovian rheological models and colored noise. The fluctuating stress tensor (white noise) is integrated in time in the same manner as its dissipative counterpart and numerical simulations indicate that this approach accurately preserves the set temperature in a FHD simulation. By mapping position and velocity vectors in the molecular representation onto field variables, we bridge the non-Markovian FHD with atomistic MD simulations. Through this mapping, we quantitatively determine the transport coefficients of the parallel Newtonian-Maxwell model for water and argon from all-atom MD simulations. For both fluids, a significant enhancement in elastic responses is observed as the wave number of hydrodynamic modes is reduced to a few nanometers. The mapping from particle to field representations and the perturbative strategy of developing constitutive equations provide a useful framework for modeling the nanoscale viscoelasticity of fluids.
Tomographic methods in flow diagnostics

NASA Technical Reports Server (NTRS)

Decker, Arthur J.

1993-01-01

This report presents a viewpoint of tomography that should be well adapted to currently available optical measurement technology as well as the needs of computational and experimental fluid dynamists. The goals in mind are to record data with the fastest optical array sensors; process the data with the fastest parallel processing technology available for small computers; and generate results for both experimental and theoretical data. An in-depth example treats interferometric data as it might be recorded in an aeronautics test facility, but the results are applicable whenever fluid properties are to be measured or applied from projections of those properties. The paper discusses both computed and neural net calibration tomography. The report also contains an overview of key definitions and computational methods, key references, computational problems such as ill-posedness, artifacts, missing data, and some possible and current research topics.
Predicting Flows of Rarefied Gases

NASA Technical Reports Server (NTRS)

LeBeau, Gerald J.; Wilmoth, Richard G.

2005-01-01

DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation

NASA Technical Reports Server (NTRS)

Bischof, c. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.

1994-01-01

Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
A parallelization method for time periodic steady state in simulation of radio frequency sheath dynamics

NASA Astrophysics Data System (ADS)

Kwon, Deuk-Chul; Shin, Sung-Sik; Yu, Dong-Hun

2017-10-01

In order to reduce the computing time in simulation of radio frequency (rf) plasma sources, various numerical schemes were developed. It is well known that the upwind, exponential, and power-law schemes can efficiently overcome the limitation on the grid size for fluid transport simulations of high density plasma discharges. Also, the semi-implicit method is a well-known numerical scheme to overcome on the simulation time step. However, despite remarkable advances in numerical techniques and computing power over the last few decades, efficient multi-dimensional modeling of low temperature plasma discharges has remained a considerable challenge. In particular, there was a difficulty on parallelization in time for the time periodic steady state problems such as capacitively coupled plasma discharges and rf sheath dynamics because values of plasma parameters in previous time step are used to calculate new values each time step. Therefore, we present a parallelization method for the time periodic steady state problems by using period-slices. In order to evaluate the efficiency of the developed method, one-dimensional fluid simulations are conducted for describing rf sheath dynamics. The result shows that speedup can be achieved by using a multithreading method.
Optimization of a new flow design for solid oxide cells using computational fluid dynamics modelling

NASA Astrophysics Data System (ADS)

Duhn, Jakob Dragsbæk; Jensen, Anker Degn; Wedel, Stig; Wix, Christian

2016-12-01

Design of a gas distributor to distribute gas flow into parallel channels for Solid Oxide Cells (SOC) is optimized, with respect to flow distribution, using Computational Fluid Dynamics (CFD) modelling. The CFD model is based on a 3d geometric model and the optimized structural parameters include the width of the channels in the gas distributor and the area in front of the parallel channels. The flow of the optimized design is found to have a flow uniformity index value of 0.978. The effects of deviations from the assumptions used in the modelling (isothermal and non-reacting flow) are evaluated and it is found that a temperature gradient along the parallel channels does not affect the flow uniformity, whereas a temperature difference between the channels does. The impact of the flow distribution on the maximum obtainable conversion during operation is also investigated and the obtainable overall conversion is found to be directly proportional to the flow uniformity. Finally the effect of manufacturing errors is investigated. The design is shown to be robust towards deviations from design dimensions of at least ±0.1 mm which is well within obtainable tolerances.
Micro-Macro Simulation of Viscoelastic Fluids in Three Dimensions

NASA Astrophysics Data System (ADS)

Rüttgers, Alexander; Griebel, Michael

2012-11-01

The development of the chemical industry resulted in various complex fluids that cannot be correctly described by classical fluid mechanics. For instance, this includes paint, engine oils with polymeric additives and toothpaste. We currently perform multiscale viscoelastic flow simulations for which we have coupled our three-dimensional Navier-Stokes solver NaSt3dGPF with the stochastic Brownian configuration field method on the micro-scale. In this method, we represent a viscoelastic fluid as a dumbbell system immersed in a three-dimensional Newtonian liquid which leads to a six-dimensional problem in space. The approach requires large computational resources and therefore depends on an efficient parallelisation strategy. Our flow solver is parallelised with a domain decomposition approach using MPI. It shows excellent scale-up results for up to 128 processors. In this talk, we present simulation results for viscoelastic fluids in square-square contractions due to their relevance for many engineering applications such as extrusion. Another aspect of the talk is the parallel implementation in NaSt3dGPF and the parallel scale-up and speed-up behaviour.
High Order Semi-Lagrangian Advection Scheme

NASA Astrophysics Data System (ADS)

Malaga, Carlos; Mandujano, Francisco; Becerra, Julian

2014-11-01

In most fluid phenomena, advection plays an important roll. A numerical scheme capable of making quantitative predictions and simulations must compute correctly the advection terms appearing in the equations governing fluid flow. Here we present a high order forward semi-Lagrangian numerical scheme specifically tailored to compute material derivatives. The scheme relies on the geometrical interpretation of material derivatives to compute the time evolution of fields on grids that deform with the material fluid domain, an interpolating procedure of arbitrary order that preserves the moments of the interpolated distributions, and a nonlinear mapping strategy to perform interpolations between undeformed and deformed grids. Additionally, a discontinuity criterion was implemented to deal with discontinuous fields and shocks. Tests of pure advection, shock formation and nonlinear phenomena are presented to show performance and convergence of the scheme. The high computational cost is considerably reduced when implemented on massively parallel architectures found in graphic cards. The authors acknowledge funding from Fondo Sectorial CONACYT-SENER Grant Number 42536 (DGAJ-SPI-34-170412-217).
A study of the compatibility of an existing CFD package with a broader class of material constitutions

NASA Technical Reports Server (NTRS)

French, K. W., Jr.

1985-01-01

The flexibility of the PHOENICS computational fluid dynamics package was assessed along two general avenues; parallel modeling and analog modeling. In parallel modeling the dependent and independent variables retain their identity within some scaling factors, even though the boundary conditions and especially the constitutive relations do not correspond to any realistic fluid dynamic situation. PHOENICS was used to generate a CFD model that should exhibit the physical anomalies of a granular medium and permit reasonable similarity with boundary conditions typical to membrane or porous piston loading. A considerable portion of the study was spent prying into the existing code with a prejudice toward rate type and disarming any inherent fluid behavior. The final stages of the study were directed at the more specific problem of multiaxis loading of cylindrical geometry with a concern for the appearance of bulging, cross slab shear failure modes.
Parallel computation of GA search for the artery shape determinants with CFD

NASA Astrophysics Data System (ADS)

Himeno, M.; Noda, S.; Fukasaku, K.; Himeno, R.

2010-06-01

We studied which factors play important role to determine the shape of arteries at the carotid artery bifurcation by performing multi-objective optimization with computation fluid dynamics (CFD) and the genetic algorithm (GA). To perform it, the most difficult problem is how to reduce turn-around time of the GA optimization with 3D unsteady computation of blood flow. We devised two levels of parallel computation method with the following features: level 1: parallel CFD computation with appropriate number of cores; level 2: parallel jobs generated by "master", which finds quickly available job cue and dispatches jobs, to reduce turn-around time. As a result, the turn-around time of one GA trial, which would have taken 462 days with one core, was reduced to less than two days on RIKEN supercomputer system, RICC, with 8192 cores. We performed a multi-objective optimization to minimize the maximum mean WSS and to minimize the sum of circumference for four different shapes and obtained a set of trade-off solutions for each shape. In addition, we found that the carotid bulb has the feature of the minimum local mean WSS and minimum local radius. We confirmed that our method is effective for examining determinants of artery shapes.
Application of a distributed network in computational fluid dynamic simulations

NASA Technical Reports Server (NTRS)

Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish

1994-01-01

A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CDF), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.
Parallel-plate heat pipe apparatus having a shaped wick structure

DOEpatents

Rightley, Michael J.; Adkins, Douglas R.; Mulhall, James J.; Robino, Charles V.; Reece, Mark; Smith, Paul M.; Tigges, Chris P.

2004-12-07

A parallel-plate heat pipe is disclosed that utilizes a plurality of evaporator regions at locations where heat sources (e.g. semiconductor chips) are to be provided. A plurality of curvilinear capillary grooves are formed on one or both major inner surfaces of the heat pipe to provide an independent flow of a liquid working fluid to the evaporator regions to optimize heat removal from different-size heat sources and to mitigate the possibility of heat-source shadowing. The parallel-plate heat pipe has applications for heat removal from high-density microelectronics and laptop computers.
Drekar v.2.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seefeldt, Ben; Sondak, David; Hensinger, David M.

Drekar is an application code that solves partial differential equations for fluids that can be optionally coupled to electromagnetics. Drekar solves low-mach compressible and incompressible computational fluid dynamics (CFD), compressible and incompressible resistive magnetohydrodynamics (MHD), and multiple species plasmas interacting with electromagnetic fields. Drekar discretization technology includes continuous and discontinuous finite element formulations, stabilized finite element formulations, mixed integration finite element bases (nodal, edge, face, volume) and an initial arbitrary Lagrangian Eulerian (ALE) capability. Drekar contains the implementation of the discretized physics and leverages the open source Trilinos project for both parallel solver capabilities and general finite element discretization tools.more » The code will be released open source under a BSD license. The code is used for fundamental research for simulation of fluids and plasmas on high performance computing environments.« less

Parallel Multiscale Algorithms for Astrophysical Fluid Dynamics Simulations

NASA Technical Reports Server (NTRS)

Norman, Michael L.

1997-01-01

Our goal is to develop software libraries and applications for astrophysical fluid dynamics simulations in multidimensions that will enable us to resolve the large spatial and temporal variations that inevitably arise due to gravity, fronts and microphysical phenomena. The software must run efficiently on parallel computers and be general enough to allow the incorporation of a wide variety of physics. Cosmological structure formation with realistic gas physics is the primary application driver in this work. Accurate simulations of e.g. galaxy formation require a spatial dynamic range (i.e., ratio of system scale to smallest resolved feature) of 104 or more in three dimensions in arbitrary topologies. We take this as our technical requirement. We have achieved, and in fact, surpassed these goals.
A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation

NASA Astrophysics Data System (ADS)

da Silva Junior, José Ricardo; Gonzalez Clua, Esteban W.; Montenegro, Anselmo; Lage, Marcos; Dreux, Marcelo de Andrade; Joselli, Mark; Pagliosa, Paulo A.; Kuryla, Christine Lucille

2012-03-01

Computational fluid dynamics in simulation has become an important field not only for physics and engineering areas but also for simulation, computer graphics, virtual reality and even video game development. Many efficient models have been developed over the years, but when many contact interactions must be processed, most models present difficulties or cannot achieve real-time results when executed. The advent of parallel computing has enabled the development of many strategies for accelerating the simulations. Our work proposes a new system which uses some successful algorithms already proposed, as well as a data structure organisation based on a heterogeneous architecture using CPUs and GPUs, in order to process the simulation of the interaction of fluids and rigid bodies. This successfully results in a two-way interaction between them and their surrounding objects. As far as we know, this is the first work that presents a computational collaborative environment which makes use of two different paradigms of hardware architecture for this specific kind of problem. Since our method achieves real-time results, it is suitable for virtual reality, simulation and video game fluid simulation problems.
Experimental Program to Stimulate Competitive Research (EPSCoR)

NASA Technical Reports Server (NTRS)

Dingerson, Michael R.

1997-01-01

Report includes: (1) CLUSTER: "Studies in Macromolecular Behavior in Microgravity Environment": The Role of Protein Oligomers in Protein Crystallization; Phase Separation Phenomena in Microgravity; Traveling Front Polymerizations; Investigating Mechanisms Affecting Phase Transition Response and Changes in Thermal Transport Properties in ER-Fluids under Normal and Microgravity Conditions. (2) CLUSTER: "Computational/Parallel Processing Studies": Flows in Local Chemical Equilibrium; A Computational Method for Solving Very Large Problems; Modeling of Cavitating Flows.
Quasi-one-dimensional compressible flow across face seals and narrow slots. 2: Computer program

NASA Technical Reports Server (NTRS)

Zuk, J.; Smith, P. J.

1972-01-01

A computer program is presented for compressible fluid flow with friction across face seals and through narrow slots. The computer program carries out a quasi-one-dimensional flow analysis which is valid for laminar and turbulent flows under both subsonic and choked flow conditions for parallel surfaces. The program is written in FORTRAN IV. The input and output variables are in either the International System of Units (SI) or the U.S. customary system.
Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture

NASA Technical Reports Server (NTRS)

Eidson, T. M.; Erlebacher, G.

1994-01-01

While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand-sides (RHS) of the system of equations. Each RHS solutions is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm allow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm will be shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
Fluid Structure Interaction Techniques For Extrusion And Mixing Processes

NASA Astrophysics Data System (ADS)

Valette, Rudy; Vergnes, Bruno; Coupez, Thierry

2007-05-01

This work focuses on the development of numerical techniques devoted to the simulation of mixing processes of complex fluids such as twin-screw extrusion or batch mixing. In mixing process simulation, the absence of symmetry of the moving boundaries (the screws or the rotors) implies that their rigid body motion has to be taken into account by using a special treatment We therefore use a mesh immersion technique (MIT), which consists in using a P1+/P1-based (MINI-element) mixed finite element method for solving the velocity-pressure problem and then solving the problem in the whole barrel cavity by imposing a rigid motion (rotation) to nodes found located inside the so called immersed domain, each sub-domain (screw, rotor) being represented by a surface CAD mesh (or its mathematical equation in simple cases). The independent meshes are immersed into a unique background computational mesh by computing the distance function to their boundaries. Intersections of meshes are accounted for, allowing to compute a fill factor usable as for the VOF methodology. This technique, combined with the use of parallel computing, allows to compute the time-dependent flow of generalized Newtonian fluids including yield stress fluids in a complex system such as a twin screw extruder, including moving free surfaces, which are treated by a "level set" and Hamilton-Jacobi method.
First-order finite-Larmor-radius fluid modeling of tearing and relaxation in a plasma pincha)

NASA Astrophysics Data System (ADS)

King, J. R.; Sovinec, C. R.; Mirnov, V. V.

2012-05-01

Drift and Hall effects on magnetic tearing, island evolution, and relaxation in pinch configurations are investigated using a non-reduced first-order finite-Larmor-radius (FLR) fluid model with the nonideal magnetohydrodynamics (MHD) with rotation, open discussion (NIMROD) code [C.R. Sovinec and J. R. King, J. Comput. Phys. 229, 5803 (2010)]. An unexpected result with a uniform pressure profile is a drift effect that reduces the growth rate when the ion sound gyroradius (ρs) is smaller than the tearing-layer width. This drift is present only with warm-ion FLR modeling, and analytics show that it arises from ∇B and poloidal curvature represented in the Braginskii gyroviscous stress. Nonlinear single-helicity computations with experimentally relevant ρs values show that the warm-ion gyroviscous effects reduce saturated-island widths. Computations with multiple nonlinearly interacting tearing fluctuations find that m = 1 core-resonant-fluctuation amplitudes are reduced by a factor of two relative to single-fluid modeling by the warm-ion effects. These reduced core-resonant-fluctuation amplitudes compare favorably to edge coil measurements in the Madison Symmetric Torus (MST) reversed-field pinch [R. N. Dexter et al., Fusion Technol. 19, 131 (1991)]. The computations demonstrate that fluctuations induce both MHD- and Hall-dynamo emfs during relaxation events. The presence of a Hall-dynamo emf implies a fluctuation-induced Maxwell stress, and the simulation results show net transport of parallel momentum. The computed magnitude of force densities from the Maxwell and competing Reynolds stresses, and changes in the parallel flow profile, are qualitatively and semi-quantitatively similar to measurements during relaxation in MST.
First-order finite-Larmor-radius fluid modeling of tearing and relaxation in a plasma pinch

DOE Office of Scientific and Technical Information (OSTI.GOV)

King, J. R.; Tech-X Corporation, 5621 Arapahoe Ave., Suite A Boulder, Colorado 80303; Sovinec, C. R.

Drift and Hall effects on magnetic tearing, island evolution, and relaxation in pinch configurations are investigated using a non-reduced first-order finite-Larmor-radius (FLR) fluid model with the nonideal magnetohydrodynamics (MHD) with rotation, open discussion (NIMROD) code [C.R. Sovinec and J. R. King, J. Comput. Phys. 229, 5803 (2010)]. An unexpected result with a uniform pressure profile is a drift effect that reduces the growth rate when the ion sound gyroradius ({rho}{sub s}) is smaller than the tearing-layer width. This drift is present only with warm-ion FLR modeling, and analytics show that it arises from {nabla}B and poloidal curvature represented in themore » Braginskii gyroviscous stress. Nonlinear single-helicity computations with experimentally relevant {rho}{sub s} values show that the warm-ion gyroviscous effects reduce saturated-island widths. Computations with multiple nonlinearly interacting tearing fluctuations find that m = 1 core-resonant-fluctuation amplitudes are reduced by a factor of two relative to single-fluid modeling by the warm-ion effects. These reduced core-resonant-fluctuation amplitudes compare favorably to edge coil measurements in the Madison Symmetric Torus (MST) reversed-field pinch [R. N. Dexter et al., Fusion Technol. 19, 131 (1991)]. The computations demonstrate that fluctuations induce both MHD- and Hall-dynamo emfs during relaxation events. The presence of a Hall-dynamo emf implies a fluctuation-induced Maxwell stress, and the simulation results show net transport of parallel momentum. The computed magnitude of force densities from the Maxwell and competing Reynolds stresses, and changes in the parallel flow profile, are qualitatively and semi-quantitatively similar to measurements during relaxation in MST.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Song

CFD (Computational Fluid Dynamics) is a widely used technique in engineering design field. It uses mathematical methods to simulate and predict flow characteristics in a certain physical space. Since the numerical result of CFD computation is very hard to understand, VR (virtual reality) and data visualization techniques are introduced into CFD post-processing to improve the understandability and functionality of CFD computation. In many cases CFD datasets are very large (multi-gigabytes), and more and more interactions between user and the datasets are required. For the traditional VR application, the limitation of computing power is a major factor to prevent visualizing largemore » dataset effectively. This thesis presents a new system designing to speed up the traditional VR application by using parallel computing and distributed computing, and the idea of using hand held device to enhance the interaction between a user and VR CFD application as well. Techniques in different research areas including scientific visualization, parallel computing, distributed computing and graphical user interface designing are used in the development of the final system. As the result, the new system can flexibly be built on heterogeneous computing environment, dramatically shorten the computation time.« less
Large-Scale Distributed Computational Fluid Dynamics on the Information Power Grid Using Globus

NASA Technical Reports Server (NTRS)

Barnard, Stephen; Biswas, Rupak; Saini, Subhash; VanderWijngaart, Robertus; Yarrow, Maurice; Zechtzer, Lou; Foster, Ian; Larsson, Olle

1999-01-01

This paper describes an experiment in which a large-scale scientific application development for tightly-coupled parallel machines is adapted to the distributed execution environment of the Information Power Grid (IPG). A brief overview of the IPG and a description of the computational fluid dynamics (CFD) algorithm are given. The Globus metacomputing toolkit is used as the enabling device for the geographically-distributed computation. Modifications related to latency hiding and Load balancing were required for an efficient implementation of the CFD application in the IPG environment. Performance results on a pair of SGI Origin 2000 machines indicate that real scientific applications can be effectively implemented on the IPG; however, a significant amount of continued effort is required to make such an environment useful and accessible to scientists and engineers.
Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kirkpatrick, R. James

This document serves as the final report for United States Department of Energy Basic Energy Sciences Grant DE-FG02-08ER15929, “Computational and Spectroscopic Investigations of the Molecular Scale Structure and Dynamics of Geologically Important Fluids and Mineral-Fluid Interfaces” (R. James Kirkpatrick, P.I., A. O. Yazaydin, co-P.I.). The research under this grant was intimately tied to that supported by the parallel the grant of the same title at Alfred (DOE DE-FG02-10ER16128; Geoffrey M. Bowers, P.I.).
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Performance of a plasma fluid code on the Intel parallel computers

NASA Technical Reports Server (NTRS)

Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.

1992-01-01

One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.
Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA

NASA Astrophysics Data System (ADS)

Bazow, Dennis; Heinz, Ulrich; Strickland, Michael

2018-04-01

Relativistic fluid dynamics is a major component in dynamical simulations of the quark-gluon plasma created in relativistic heavy-ion collisions. Simulations of the full three-dimensional dissipative dynamics of the quark-gluon plasma with fluctuating initial conditions are computationally expensive and typically require some degree of parallelization. In this paper, we present a GPU implementation of the Kurganov-Tadmor algorithm which solves the 3 + 1d relativistic viscous hydrodynamics equations including the effects of both bulk and shear viscosities. We demonstrate that the resulting CUDA-based GPU code is approximately two orders of magnitude faster than the corresponding serial implementation of the Kurganov-Tadmor algorithm. We validate the code using (semi-)analytic tests such as the relativistic shock-tube and Gubser flow.
Domain decomposition methods in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Gropp, William D.; Keyes, David E.

1991-01-01

The divide-and-conquer paradigm of iterative domain decomposition, or substructuring, has become a practical tool in computational fluid dynamic applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. These features are illustrated on the classic model problem of flow over a backstep using Newton's method as the nonlinear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately, and they can be combined synergistically. Sample performance results are included from an Intel iPSC/860 hypercube implementation.
Computational Modeling of the Dolphin Kick in Competitive Swimming

NASA Astrophysics Data System (ADS)

Loebbeck, A.; Mark, R.; Bhanot, G.

2005-11-01

Numerical simulations are being used to study the fluid dynamics of the dolphin kick in competitive swimming. This stroke is performed underwater after starts and turns and involves an undulatory motion of the body. Highly detailed laser body scans of elite swimmers are used and the kinematics of the dolphin kick is recreated from videos of Olympic level swimmers. We employ a parallelized immersed boundary method to simulate the flow associated with this stroke in all its complexity. The simulations provide a first of its kind glimpse of the fluid and vortex dynamics associated with this stroke and hydrodynamic force computations allow us to gain a better understanding of the thrust producing mechanisms.
A Parallel-Plate Flow Chamber for Mechanical Characterization of Endothelial Cells Exposed to Laminar Shear Stress

PubMed Central

Wong, Andrew K.; LLanos, Pierre; Boroda, Nickolas; Rosenberg, Seth R.; Rabbany, Sina Y.

2017-01-01

Shear stresses induced by laminar fluid flow are essential to properly recapitulate the physiological microenvironment experienced by endothelial cells (ECs). ECs respond to these stresses via mechanotransduction by modulating their phenotype and biomechanical characteristics, which can be characterized by Atomic Force Microscopy (AFM). Parallel Plate Flow Chambers (PPFCs) apply unidirectional laminar fluid flow to EC monolayers in vitro. Since ECs in sealed PPFCs are inaccessible to AFM probes, cone-and-plate viscometers (CPs) are commonly used to apply shear stress. This paper presents a comparison of the efficacies of both methods. Computational Fluid Dynamic simulation and validation testing using EC responses as a metric have indicated limitations in the use of CPs to apply laminar shear stress. Monolayers subjected to laminar fluid flow in a PPFC respond by increasing cortical stiffness, elongating, and aligning filamentous actin in the direction of fluid flow to a greater extent than CP devices. Limitations using CP devices to provide laminar flow across an EC monolayer suggest they are better suited when studying EC response for disturbed flow conditions. PPFC platforms allow for exposure of ECs to laminar fluid flow conditions, recapitulating cellular biomechanical behaviors, whereas CP platforms allow for mechanical characterization of ECs under secondary flow. PMID:28989541
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Turbulence modeling of free shear layers for high performance aircraft

NASA Technical Reports Server (NTRS)

Sondak, Douglas

1993-01-01

In many flowfield computations, accuracy of the turbulence model employed is frequently a limiting factor in the overall accuracy of the computation. This is particularly true for complex flowfields such as those around full aircraft configurations. Free shear layers such as wakes, impinging jets (in V/STOL applications), and mixing layers over cavities are often part of these flowfields. Although flowfields have been computed for full aircraft, the memory and CPU requirements for these computations are often excessive. Additional computer power is required for multidisciplinary computations such as coupled fluid dynamics and conduction heat transfer analysis. Massively parallel computers show promise in alleviating this situation, and the purpose of this effort was to adapt and optimize CFD codes to these new machines. The objective of this research effort was to compute the flowfield and heat transfer for a two-dimensional jet impinging normally on a cool plate. The results of this research effort were summarized in an AIAA paper titled 'Parallel Implementation of the k-epsilon Turbulence Model'. Appendix A contains the full paper.
Design and Analysis of Turbomachinery for Space Applications

NASA Technical Reports Server (NTRS)

Dorney, D.; Garcia, Roberto (Technical Monitor)

2002-01-01

This presentation provides an overview of CORSAIR, a three dimensional computational fluid dynamics software code for the analysis of turbomachinery components available from NASA, and discusses its potential use in the design of these parts. Topics covered include: time-dependent equations of motion, grid topology, turbulence models, boundary conditions, parallel simulations and miscellaneous capabilities.

Review Of Applied Mathematical Models For Describing The Behaviour Of Aqueous Humor In Eye Structures

NASA Astrophysics Data System (ADS)

Dzierka, M.; Jurczak, P.

2015-12-01

In the paper, currently used methods for modeling the flow of the aqueous humor through eye structures are presented. Then a computational model based on rheological models of Newtonian and non-Newtonian fluids is proposed. The proposed model may be used for modeling the flow of the aqueous humor through the trabecular meshwork. The trabecular meshwork is modeled as an array of rectilinear parallel capillary tubes. The flow of Newtonian and non-Newtonian fluids is considered. As a results of discussion mathematical equations of permeability of porous media and velocity of fluid flow through porous media have been received.
Operation of the Institute for Computer Applications in Science and Engineering

NASA Technical Reports Server (NTRS)

1975-01-01

The ICASE research program is described in detail; it consists of four major categories: (1) efficient use of vector and parallel computers, with particular emphasis on the CDC STAR-100; (2) numerical analysis, with particular emphasis on the development and analysis of basic numerical algorithms; (3) analysis and planning of large-scale software systems; and (4) computational research in engineering and the natural sciences, with particular emphasis on fluid dynamics. The work in each of these areas is described in detail; other activities are discussed, a prognosis of future activities are included.
Load Balancing Strategies for Multi-Block Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)

2002-01-01

The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
The Role of Multiphysics Simulation in Multidisciplinary Analysis

NASA Technical Reports Server (NTRS)

Rifai, Steven M.; Ferencz, Robert M.; Wang, Wen-Ping; Spyropoulos, Evangelos T.; Lawrence, Charles; Melis, Matthew E.

1998-01-01

This article describes the applications of the Spectrum(Tm) Solver in Multidisciplinary Analysis (MDA). Spectrum, a multiphysics simulation software based on the finite element method, addresses compressible and incompressible fluid flow, structural, and thermal modeling as well as the interaction between these disciplines. Multiphysics simulation is based on a single computational framework for the modeling of multiple interacting physical phenomena. Interaction constraints are enforced in a fully-coupled manner using the augmented-Lagrangian method. Within the multiphysics framework, the finite element treatment of fluids is based on Galerkin-Least-Squares (GLS) method with discontinuity capturing operators. The arbitrary-Lagrangian-Eulerian method is utilized to account for deformable fluid domains. The finite element treatment of solids and structures is based on the Hu-Washizu variational principle. The multiphysics architecture lends itself naturally to high-performance parallel computing. Aeroelastic, propulsion, thermal management and manufacturing applications are presented.
Application of electron closures in extended MHD

NASA Astrophysics Data System (ADS)

Held, Eric; Adair, Brett; Taylor, Trevor

2017-10-01

Rigorous closure of the extended MHD equations in plasma fluid codes includes the effects of electron heat conduction along perturbed magnetic fields and contributions of the electron collisional friction and stress to the extended Ohms law. In this work we discuss application of a continuum numerical solution to the Chapman-Enskog-like electron drift kinetic equation using the NIMROD code. The implementation is a tightly-coupled fluid/kinetic system that carefully addresses time-centering in the advance of the fluid variables with their kinetically-computed closures. Comparisons of spatial accuracy, computational efficiency and required velocity space resolution are presented for applications involving growing magnetic islands in cylindrical and toroidal geometry. The reduction in parallel heat conduction due to particle trapping in toroidal geometry is emphasized. Work supported by DOE under Grant Nos. DE-FC02-08ER54973 and DE-FG02-04ER54746.
Advanced computational multi-fluid dynamics: a new model for understanding electrokinetic phenomena in porous media

NASA Astrophysics Data System (ADS)

Gulamali, M. Y.; Saunders, J. H.; Jackson, M. D.; Pain, C. C.

2009-04-01

We present results from a new computational multi-fluid dynamics code, designed to model the transport of heat, mass and chemical species during flow of single or multiple immiscible fluid phases through porous media, including gravitational effects and compressibility. The model also captures the electrical phenomena which may arise through electrokinetic, electrochemical and electrothermal coupling. Building on the advanced computational technology of the Imperial College Ocean Model, this new development leads the way towards a complex multiphase code using arbitrary unstructured and adaptive meshes, and domains decomposed to run in parallel over a cluster of workstations or a dedicated parallel computer. These facilities will allow efficient and accurate modelling of multiphase flows which capture large- and small-scale transport phenomena, while preserving the important geology and/or surface topology to make the results physically meaningful and realistic. Applications include modelling of contaminant transport in aquifers, multiphase flow during hydrocarbon production, migration of carbon dioxide during sequestration, and evaluation of the design and safety of nuclear reactors. Simulations of the streaming potential resulting from multiphase flow in laboratory- and field-scale models demonstrate that streaming potential signals originate at fluid fronts, and at geologic boundaries where fluid saturation changes. This suggests that downhole measurements of streaming potential may be used to inform production strategies in oil and gas reservoirs. As water encroaches on an oil production well, the streaming-potential signal associated with the water front encompasses the well even when the front is up to 100 m away, so the potential measured at the well starts to change significantly relative to a distant reference electrode. Variations in the geometry of the encroaching water front could be characterized using an array of electrodes positioned along the well, but a good understanding of the local reservoir geology will be required to identify signals caused by the front. The streaming potential measured at a well will be maximized in low-permeability reservoirs produced at a high rate, and in thick reservoirs with low shale content.
TMVOC-MP: a parallel numerical simulator for Three-PhaseNon-isothermal Flows of Multicomponent Hydrocarbon Mixtures inporous/fractured media

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten

2008-02-15

TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted sourcemore » remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O{sub 2}, N{sub 2}, CO{sub 2}, CH{sub 4}, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N{sub 2} and O{sub 2}). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids. The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses the MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using partitioning algorithm from the METIS software package (Karypsis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77. This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical description of the physical processes modeled, and the mathematical and numerical methods in the user's guide for TMVOC (Pruess and Battistelli, 2002).« less
High Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

1994-01-01

In order to predict the dynamic response of a flexible structure in a fluid flow, the equations of motion of the structure and the fluid must be solved simultaneously. In this paper, we present several partitioned procedures for time-integrating this focus coupled problem and discuss their merits in terms of accuracy, stability, heterogeneous computing, I/O transfers, subcycling, and parallel processing. All theoretical results are derived for a one-dimensional piston model problem with a compressible flow, because the complete three-dimensional aeroelastic problem is difficult to analyze mathematically. However, the insight gained from the analysis of the coupled piston problem and the conclusions drawn from its numerical investigation are confirmed with the numerical simulation of the two-dimensional transient aeroelastic response of a flexible panel in a transonic nonlinear Euler flow regime.
Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1

NASA Technical Reports Server (NTRS)

Wright, Michael J.; White, Todd; Mangini, Nancy

2009-01-01

Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.
Parallelized modelling and solution scheme for hierarchically scaled simulations

NASA Technical Reports Server (NTRS)

Padovan, Joe

1995-01-01

This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Multiple grid problems on concurrent-processing computers

NASA Technical Reports Server (NTRS)

Eberhardt, D. S.; Baganoff, D.

1986-01-01

Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam and Warming, implicit, approximate factored algorithm. This study demonstrates the parallelism found in a well-known scheme and it achieved speedups exceeding 1.9 on the two processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.
Visualization of Unsteady Computational Fluid Dynamics

NASA Technical Reports Server (NTRS)

Haimes, Robert

1997-01-01

The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a super-computer class machine. The Massively Parallel Processors (MPP's) such as the 160 node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snap-shot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state-variables. (2) Disk speed vs. Computational speeds. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending, on the hardware and solver an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements in the last decade or two, it is easy to see that depending on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and Parallel Machine I/O problems. Disk access time is much worse within current parallel machines and cluster of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data, but are running the solver and the solver outputs the solution. These traditional network interfaces must be used for the file system. (4) Numerics of particle traces. Most visualization tools can work upon a single snap shot of the data but some visualization tools for transient problems require dealing with time.
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Parallelization of sequential Gaussian, indicator and direct simulation algorithms

NASA Astrophysics Data System (ADS)

Nunes, Ruben; Almeida, José A.

2010-08-01

Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amount of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
State-of-the-art review of computational fluid dynamics modeling for fluid-solids systems

NASA Astrophysics Data System (ADS)

Lyczkowski, R. W.; Bouillard, J. X.; Ding, J.; Chang, S. L.; Burge, S. W.

1994-05-01

As the result of 15 years of research (50 staff years of effort) Argonne National Laboratory (ANL), through its involvement in fluidized-bed combustion, magnetohydrodynamics, and a variety of environmental programs, has produced extensive computational fluid dynamics (CFD) software and models to predict the multiphase hydrodynamic and reactive behavior of fluid-solids motions and interactions in complex fluidized-bed reactors (FBR's) and slurry systems. This has resulted in the FLUFIX, IRF, and SLUFIX computer programs. These programs are based on fluid-solids hydrodynamic models and can predict information important to the designer of atmospheric or pressurized bubbling and circulating FBR, fluid catalytic cracking (FCC) and slurry units to guarantee optimum efficiency with minimum release of pollutants into the environment. This latter issue will become of paramount importance with the enactment of the Clean Air Act Amendment (CAAA) of 1995. Solids motion is also the key to understanding erosion processes. Erosion rates in FBR's and pneumatic and slurry components are computed by ANL's EROSION code to predict the potential metal wastage of FBR walls, intervals, feed distributors, and cyclones. Only the FLUFIX and IRF codes will be reviewed in the paper together with highlights of the validations because of length limitations. It is envisioned that one day, these codes with user-friendly pre- and post-processor software and tailored for massively parallel multiprocessor shared memory computational platforms will be used by industry and researchers to assist in reducing and/or eliminating the environmental and economic barriers which limit full consideration of coal, shale, and biomass as energy sources; to retain energy security; and to remediate waste and ecological problems.
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processors speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP), employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
Lattice Boltzmann multi-phase simulations in porous media using Multiple GPUs

NASA Astrophysics Data System (ADS)

Toelke, J.; De Prisco, G.; Mu, Y.

2011-12-01

Ingrain's digital rock physics lab computes the physical properties and fluid flow characteristics of oil and gas reservoir rocks including shales, carbonates and sandstones. Ingrain uses advanced lattice Boltzmann methods (LBM) to simulate multiphase flow in the rocks (porous media). We present a very efficient implementation of these methods based on CUDA. Because LBM operates on a finite difference grid, is explicit in nature, and requires only next-neighbor interactions, it is suitable for implementation on GPUs. Since GPU hardware allows for very fine grain parallelism, every lattice site can be handled by a different core. Data has to be loaded from and stored to the device memory in such a way that dense access to the memory is ensured. This can be achieved by accessing the lattice nodes with respect to their contiguous memory locations [1,2]. The simulation engine uses a sparse data structure to represent the grid and advanced algorithms to handle the moving fluid-fluid interface. The simulations are accelerated on one GPU by one order of magnitude compared to a state of the art multicore desktop computer. The engine is parallelized using MPI and runs on multiple GPUs in the same node or across the Infiniband network. Simulations with up to 50 GPUs in parallel are presented. With this simulator using it is possible to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media in a commercial manner and to predict important rock properties like absolute permeability, relative permeabilites and capillary pressure [3,4]. Results and videos of these simulations in complex real world porous media and rocks are presented and discussed.
Charon Message-Passing Toolkit for Scientific Computations

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Yan, Jerry (Technical Monitor)

2000-01-01

Charon is a library, callable from C and Fortran, that aids the conversion of structured-grid legacy codes-such as those used in the numerical computation of fluid flows-into parallel, high- performance codes. Key are functions that define distributed arrays, that map between distributed and non-distributed arrays, and that allow easy specification of common communications on structured grids. The library is based on the widely accepted MPI message passing standard. We present an overview of the functionality of Charon, and some representative results.
The Computational Fluid Dynamics Rupture Challenge 2013—Phase I: prediction of rupture status in intracranial aneurysms.

PubMed

Janiga, G; Berg, P; Sugiyama, S; Kono, K; Steinman, D A

2015-03-01

Rupture risk assessment for intracranial aneurysms remains challenging, and risk factors, including wall shear stress, are discussed controversially. The primary purpose of the presented challenge was to determine how consistently aneurysm rupture status and rupture site could be identified on the basis of computational fluid dynamics. Two geometrically similar MCA aneurysms were selected, 1 ruptured, 1 unruptured. Participating computational fluid dynamics groups were blinded as to which case was ruptured. Participants were provided with digitally segmented lumen geometries and, for this phase of the challenge, were free to choose their own flow rates, blood rheologies, and so forth. Participants were asked to report which case had ruptured and the likely site of rupture. In parallel, lumen geometries were provided to a group of neurosurgeons for their predictions of rupture status and site. Of 26 participating computational fluid dynamics groups, 21 (81%) correctly identified the ruptured case. Although the known rupture site was associated with low and oscillatory wall shear stress, most groups identified other sites, some of which also experienced low and oscillatory shear. Of the 43 participating neurosurgeons, 39 (91%) identified the ruptured case. None correctly identified the rupture site. Geometric or hemodynamic considerations favor identification of rupture status; however, retrospective identification of the rupture site remains a challenge for both engineers and clinicians. A more precise understanding of the hemodynamic factors involved in aneurysm wall pathology is likely required for computational fluid dynamics to add value to current clinical decision-making regarding rupture risk. © 2015 by American Journal of Neuroradiology.
User's Guide for ENSAERO_FE Parallel Finite Element Solver

NASA Technical Reports Server (NTRS)

Eldred, Lloyd B.; Guruswamy, Guru P.

1999-01-01

A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.

Recycling isoelectric focusing with computer controlled data acquisition system. [for high resolution electrophoretic separation and purification of biomolecules

NASA Technical Reports Server (NTRS)

Egen, N. B.; Twitty, G. E.; Bier, M.

1979-01-01

Isoelectric focusing is a high-resolution technique for separating and purifying large peptides, proteins, and other biomolecules. The apparatus described in the present paper constitutes a new approach to fluid stabilization and increased throughput. Stabilization is achieved by flowing the process fluid uniformly through an array of closely spaced filter elements oriented parallel both to the electrodes and the direction of the flow. This seems to overcome the major difficulties of parabolic flow and electroosmosis at the walls, while limiting the convection to chamber compartments defined by adjacent spacers. Increased throughput is achieved by recirculating the process fluid through external heat exchange reservoirs, where the Joule heat is dissipated.
Pressure gradients fail to predict diffusio-osmosis

NASA Astrophysics Data System (ADS)

Liu, Yawei; Ganti, Raman; Frenkel, Daan

2018-05-01

We present numerical simulations of diffusio-osmotic flow, i.e. the fluid flow generated by a concentration gradient along a solid-fluid interface. In our study, we compare a number of distinct approaches that have been proposed for computing such flows and compare them with a reference calculation based on direct, non-equilibrium molecular dynamics simulations. As alternatives, we consider schemes that compute diffusio-osmotic flow from the gradient of the chemical potentials of the constituent species and from the gradient of the component of the pressure tensor parallel to the interface. We find that the approach based on treating chemical potential gradients as external forces acting on various species agrees with the direct simulations, thereby supporting the approach of Marbach et al (2017 J. Chem. Phys. 146 194701). In contrast, an approach based on computing the gradients of the microscopic pressure tensor does not reproduce the direct non-equilibrium results.
ISCFD Nagoya 1989 - International Symposium on Computational Fluid Dynamics, 3rd, Nagoya, Japan, Aug. 28-31, 1989, Technical Papers

NASA Astrophysics Data System (ADS)

Recent advances in computational fluid dynamics are discussed in reviews and reports. Topics addressed include large-scale LESs for turbulent pipe and channel flows, numerical solutions of the Euler and Navier-Stokes equations on parallel computers, multigrid methods for steady high-Reynolds-number flow past sudden expansions, finite-volume methods on unstructured grids, supersonic wake flow on a blunt body, a grid-characteristic method for multidimensional gas dynamics, and CIC numerical simulation of a wave boundary layer. Consideration is given to vortex simulations of confined two-dimensional jets, supersonic viscous shear layers, spectral methods for compressible flows, shock-wave refraction at air/water interfaces, oscillatory flow in a two-dimensional collapsible channel, the growth of randomness in a spatially developing wake, and an efficient simplex algorithm for the finite-difference and dynamic linear-programming method in optimal potential control.
Temporal parallelization of edge plasma simulations using the parareal algorithm and the SOLPS code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Samaddar, Debasmita; Coster, D. P.; Bonnin, X.

We show that numerical modelling of edge plasma physics may be successfully parallelized in time. The parareal algorithm has been employed for this purpose and the SOLPS code package coupling the B2.5 finite-volume fluid plasma solver with the kinetic Monte-Carlo neutral code Eirene has been used as a test bed. The complex dynamics of the plasma and neutrals in the scrape-off layer (SOL) region makes this a unique application. It is demonstrated that a significant computational gain (more than an order of magnitude) may be obtained with this technique. The use of the IPS framework for event-based parareal implementation optimizesmore » resource utilization and has been shown to significantly contribute to the computational gain.« less
Temporal parallelization of edge plasma simulations using the parareal algorithm and the SOLPS code

DOE PAGES

Samaddar, Debasmita; Coster, D. P.; Bonnin, X.; ...

2017-07-31

We show that numerical modelling of edge plasma physics may be successfully parallelized in time. The parareal algorithm has been employed for this purpose and the SOLPS code package coupling the B2.5 finite-volume fluid plasma solver with the kinetic Monte-Carlo neutral code Eirene has been used as a test bed. The complex dynamics of the plasma and neutrals in the scrape-off layer (SOL) region makes this a unique application. It is demonstrated that a significant computational gain (more than an order of magnitude) may be obtained with this technique. The use of the IPS framework for event-based parareal implementation optimizesmore » resource utilization and has been shown to significantly contribute to the computational gain.« less
Cumulative reports and publications through December 31, 1989

NASA Technical Reports Server (NTRS)

1990-01-01

A complete list of reports from the Institute for Computer Applications in Science and Engineering (ICASE) is presented. The major categories of the current ICASE research program are: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effectual numerical methods; computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, structural analysis, and chemistry; computer systems and software, especially vector and parallel computers, microcomputers, and data management. Since ICASE reports are intended to be preprints of articles that will appear in journals or conference proceedings, the published reference is included when it is available.
Finite element analysis in fluids; Proceedings of the Seventh International Conference on Finite Element Methods in Flow Problems, University of Alabama, Huntsville, Apr. 3-7, 1989

NASA Technical Reports Server (NTRS)

Chung, T. J. (Editor); Karr, Gerald R. (Editor)

1989-01-01

Recent advances in computational fluid dynamics are examined in reviews and reports, with an emphasis on finite-element methods. Sections are devoted to adaptive meshes, atmospheric dynamics, combustion, compressible flows, control-volume finite elements, crystal growth, domain decomposition, EM-field problems, FDM/FEM, and fluid-structure interactions. Consideration is given to free-boundary problems with heat transfer, free surface flow, geophysical flow problems, heat and mass transfer, high-speed flow, incompressible flow, inverse design methods, MHD problems, the mathematics of finite elements, and mesh generation. Also discussed are mixed finite elements, multigrid methods, non-Newtonian fluids, numerical dissipation, parallel vector processing, reservoir simulation, seepage, shallow-water problems, spectral methods, supercomputer architectures, three-dimensional problems, and turbulent flows.
Perspectives on the Future of CFD

NASA Technical Reports Server (NTRS)

Kwak, Dochan

2000-01-01

This viewgraph presentation gives an overview of the future of computational fluid dynamics (CFD), which in the past has pioneered the field of flow simulation. Over time CFD has progressed as computing power. Numerical methods have been advanced as CPU and memory capacity increases. Complex configurations are routinely computed now and direct numerical simulations (DNS) and large eddy simulations (LES) are used to study turbulence. As the computing resources changed to parallel and distributed platforms, computer science aspects such as scalability (algorithmic and implementation) and portability and transparent codings have advanced. Examples of potential future (or current) challenges include risk assessment, limitations of the heuristic model, and the development of CFD and information technology (IT) tools.
Time-Dependent Simulations of Turbopump Flows

NASA Technical Reports Server (NTRS)

Kiris, Cetin; Kwak, Dochan; Chan, William; Williams, Robert

2002-01-01

Unsteady flow simulations for RLV (Reusable Launch Vehicles) 2nd Generation baseline turbopump for one and half impeller rotations have been completed by using a 34.3 Million grid points model. MLP (Multi-Level Parallelism) shared memory parallelism has been implemented in INS3D, and benchmarked. Code optimization for cash based platforms will be completed by the end of September 2001. Moving boundary capability is obtained by using DCF module. Scripting capability from CAD (computer aided design) geometry to solution has been developed. Data compression is applied to reduce data size in post processing. Fluid/Structure coupling has been initiated.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

NASA Technical Reports Server (NTRS)

Fricker, David M.

1997-01-01

The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

NASA Astrophysics Data System (ADS)

Zaghi, S.

2014-07-01

OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git has been adopted in order to facilitate the collaborative maintenance and improvement of the code; CopyrightsOFF is a free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one and two dimensional examples and a three dimensional real application.
Domain Immersion Technique And Free Surface Computations Applied To Extrusion And Mixing Processes

NASA Astrophysics Data System (ADS)

Valette, Rudy; Vergnes, Bruno; Basset, Olivier; Coupez, Thierry

2007-04-01

This work focuses on the development of numerical techniques devoted to the simulation of mixing processes of complex fluids such as twin-screw extrusion or batch mixing. In mixing process simulation, the absence of symmetry of the moving boundaries (the screws or the rotors) implies that their rigid body motion has to be taken into account by using a special treatment. We therefore use a mesh immersion technique (MIT), which consists in using a P1+/P1-based (MINI-element) mixed finite element method for solving the velocity-pressure problem and then solving the problem in the whole barrel cavity by imposing a rigid motion (rotation) to nodes found located inside the so called immersed domain, each subdomain (screw, rotor) being represented by a surface CAD mesh (or its mathematical equation in simple cases). The independent meshes are immersed into a unique backgound computational mesh by computing the distance function to their boundaries. Intersections of meshes are accounted for, allowing to compute a fill factor usable as for the VOF methodology. This technique, combined with the use of parallel computing, allows to compute the time-dependent flow of generalized Newtonian fluids including yield stress fluids in a complex system such as a twin screw extruder, including moving free surfaces, which are treated by a "level set" and Hamilton-Jacobi method.
Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

NASA Astrophysics Data System (ADS)

Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

2014-12-01

The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
Aerodynamic Shape Optimization of Supersonic Aircraft Configurations via an Adjoint Formulation on Parallel Computers

NASA Technical Reports Server (NTRS)

Reuther, James; Alonso, Juan Jose; Rimlinger, Mark J.; Jameson, Antony

1996-01-01

This work describes the application of a control theory-based aerodynamic shape optimization method to the problem of supersonic aircraft design. The design process is greatly accelerated through the use of both control theory and a parallel implementation on distributed memory computers. Control theory is employed to derive the adjoint differential equations whose solution allows for the evaluation of design gradient information at a fraction of the computational cost required by previous design methods. The resulting problem is then implemented on parallel distributed memory architectures using a domain decomposition approach, an optimized communication schedule, and the MPI (Message Passing Interface) Standard for portability and efficiency. The final result achieves very rapid aerodynamic design based on higher order computational fluid dynamics methods (CFD). In our earlier studies, the serial implementation of this design method was shown to be effective for the optimization of airfoils, wings, wing-bodies, and complex aircraft configurations using both the potential equation and the Euler equations. In our most recent paper, the Euler method was extended to treat complete aircraft configurations via a new multiblock implementation. Furthermore, during the same conference, we also presented preliminary results demonstrating that this basic methodology could be ported to distributed memory parallel computing architectures. In this paper, our concern will be to demonstrate that the combined power of these new technologies can be used routinely in an industrial design environment by applying it to the case study of the design of typical supersonic transport configurations. A particular difficulty of this test case is posed by the propulsion/airframe integration.
A 3D, fully Eulerian, VOF-based solver to study the interaction between two fluids and moving rigid bodies using the fictitious domain method

NASA Astrophysics Data System (ADS)

Pathak, Ashish; Raessi, Mehdi

2016-04-01

We present a three-dimensional (3D) and fully Eulerian approach to capturing the interaction between two fluids and moving rigid structures by using the fictitious domain and volume-of-fluid (VOF) methods. The solid bodies can have arbitrarily complex geometry and can pierce the fluid-fluid interface, forming contact lines. The three-phase interfaces are resolved and reconstructed by using a VOF-based methodology. Then, a consistent scheme is employed for transporting mass and momentum, allowing for simulations of three-phase flows of large density ratios. The Eulerian approach significantly simplifies numerical resolution of the kinematics of rigid bodies of complex geometry and with six degrees of freedom. The fluid-structure interaction (FSI) is computed using the fictitious domain method. The methodology was developed in a message passing interface (MPI) parallel framework accelerated with graphics processing units (GPUs). The computationally intensive solution of the pressure Poisson equation is ported to GPUs, while the remaining calculations are performed on CPUs. The performance and accuracy of the methodology are assessed using an array of test cases, focusing individually on the flow solver and the FSI in surface-piercing configurations. Finally, an application of the proposed methodology in simulations of the ocean wave energy converters is presented.
Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

NASA Astrophysics Data System (ADS)

Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

2016-06-01

CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.
Multi-component fluid flow through porous media by interacting lattice gas computer simulation

NASA Astrophysics Data System (ADS)

Cueva-Parra, Luis Alberto

In this work we study structural and transport properties such as power-law behavior of trajectory of each constituent and their center of mass, density profile, mass flux, permeability, velocity profile, phase separation, segregation, and mixing of miscible and immiscible multicomponent fluid flow through rigid and non-consolidated porous media. The considered parameters are the mass ratio of the components, temperature, external pressure, and porosity. Due to its solid theoretical foundation and computational simplicity, the selected approaches are the Interacting Lattice Gas with Monte Carlo Method (Metropolis Algorithm) and direct sampling, combined with particular collision rules. The percolation mechanism is used for modeling initial random porous media. The introduced collision rules allow to model non-consolidated porous media, because part of the kinetic energy of the fluid particles is transfered to barrier particles, which are the components of the porous medium. Having gained kinetic energy, the barrier particles can move. A number of interesting results are observed. Some findings include, (i) phase separation in immiscible fluid flow through a medium with no barrier particles (porosity p P = 1). (ii) For the flow of miscible fluids through rigid porous medium with porosity close to percolation threshold (p C), the flux density (measure of permeability) shows a power law increase ∝ (pC - p) mu with mu = 2.0, and the density profile is found to decay with height ∝ exp(-mA/Bh), consistent with the barometric height law. (iii) Sedimentation and driving of barrier particles in fluid flow through non-consolidated porous medium. This study involves developing computer simulation models with efficient serial and parallel codes, extensive data analysis via graphical utilities, and computer visualization techniques.
Computational modeling of magnetic particle margination within blood flow through LAMMPS

NASA Astrophysics Data System (ADS)

Ye, Huilin; Shen, Zhiqiang; Li, Ying

2017-11-01

We develop a multiscale and multiphysics computational method to investigate the transport of magnetic particles as drug carriers in blood flow under influence of hydrodynamic interaction and external magnetic field. A hybrid coupling method is proposed to handle red blood cell (RBC)-fluid interface (CFI) and magnetic particle-fluid interface (PFI), respectively. Immersed boundary method (IBM)-based velocity coupling is used to account for CFI, which is validated by tank-treading and tumbling behaviors of a single RBC in simple shear flow. While PFI is captured by IBM-based force coupling, which is verified through movement of a single magnetic particle under non-uniform external magnetic field and breakup of a magnetic chain in rotating magnetic field. These two components are seamlessly integrated within the LAMMPS framework, which is a highly parallelized molecular dynamics solver. In addition, we also implement a parallelized lattice Boltzmann simulator within LAMMPS to handle the fluid flow simulation. Based on the proposed method, we explore the margination behaviors of magnetic particles and magnetic chains within blood flow. We find that the external magnetic field can be used to guide the motion of these magnetic materials and promote their margination to the vascular wall region. Moreover, the scaling performance and speedup test further confirm the high efficiency and robustness of proposed computational method. Therefore, it provides an efficient way to simulate the transport of nanoparticle-based drug carriers within blood flow in a large scale. The simulation results can be applied in the design of efficient drug delivery vehicles that optimally accumulate within diseased tissue, thus providing better imaging sensitivity, therapeutic efficacy and lower toxicity.
An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

NASA Technical Reports Server (NTRS)

Lemke, Max; Witsch, Kristian; Quinlan, Daniel

1993-01-01

Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.
An Analysis of Performance Enhancement Techniques for Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

2002-01-01

The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.

Parallel computation of fluid-structural interactions using high resolution upwind schemes

NASA Astrophysics Data System (ADS)

Hu, Zongjun

An efficient and accurate solver is developed to simulate the non-linear fluid-structural interactions in turbomachinery flutter flows. A new low diffusion E-CUSP scheme, Zha CUSP scheme, is developed to improve the efficiency and accuracy of the inviscid flux computation. The 3D unsteady Navier-Stokes equations with the Baldwin-Lomax turbulence model are solved using the finite volume method with the dual-time stepping scheme. The linearized equations are solved with Gauss-Seidel line iterations. The parallel computation is implemented using MPI protocol. The solver is validated with 2D cases for its turbulence modeling, parallel computation and unsteady calculation. The Zha CUSP scheme is validated with 2D cases, including a supersonic flat plate boundary layer, a transonic converging-diverging nozzle and a transonic inlet diffuser. The Zha CUSP2 scheme is tested with 3D cases, including a circular-to-rectangular nozzle, a subsonic compressor cascade and a transonic channel. The Zha CUSP schemes are proved to be accurate, robust and efficient in these tests. The steady and unsteady separation flows in a 3D stationary cascade under high incidence and three inlet Mach numbers are calculated to study the steady state separation flow patterns and their unsteady oscillation characteristics. The leading edge vortex shedding is the mechanism behind the unsteady characteristics of the high incidence separated flows. The separation flow characteristics is affected by the inlet Mach number. The blade aeroelasticity of a linear cascade with forced oscillating blades is studied using parallel computation. A simplified two-passage cascade with periodic boundary condition is first calculated under a medium frequency and a low incidence. The full scale cascade with 9 blades and two end walls is then studied more extensively under three oscillation frequencies and two incidence angles. The end wall influence and the blade stability are studied and compared under different frequencies and incidence angles. The Zha CUSP schemes are the first time to be applied in moving grid systems and 2D and 3D calculations. The implicit Gauss-Seidel iteration with dual time stepping is the first time to be used for moving grid systems. The NASA flutter cascade is the first time to be calculated in full scale.
The Navier-Stokes computer

NASA Technical Reports Server (NTRS)

Nosenchuck, D. M.; Littman, M. G.

1986-01-01

The Navier-Stokes computer (NSC) has been developed for solving problems in fluid mechanics involving complex flow simulations that require more speed and capacity than provided by current and proposed Class VI supercomputers. The machine is a parallel processing supercomputer with several new architectural elements which can be programmed to address a wide range of problems meeting the following criteria: (1) the problem is numerically intensive, and (2) the code makes use of long vectors. A simulation of two-dimensional nonsteady viscous flows is presented to illustrate the architecture, programming, and some of the capabilities of the NSC.
Product selectivity control induced by using liquid-liquid parallel laminar flow in a microreactor.

PubMed

Amemiya, Fumihiro; Matsumoto, Hideyuki; Fuse, Keishi; Kashiwagi, Tsuneo; Kuroda, Chiaki; Fuchigami, Toshio; Atobe, Mahito

2011-06-07

Product selectivity control based on a liquid-liquid parallel laminar flow has been successfully demonstrated by using a microreactor. Our electrochemical microreactor system enables regioselective cross-coupling reaction of aldehyde with allylic chloride via chemoselective cathodic reduction of substrate by the combined use of suitable flow mode and corresponding cathode material. The formation of liquid-liquid parallel laminar flow in the microreactor was supported by the estimation of benzaldehyde diffusion coefficient and computational fluid dynamics simulation. The diffusion coefficient for benzaldehyde in Bu(4)NClO(4)-HMPA medium was determined to be 1.32 × 10(-7) cm(2) s(-1) by electrochemical measurements, and the flow simulation using this value revealed the formation of clear concentration gradient of benzaldehyde in the microreactor channel over a specific channel length. In addition, the necessity of the liquid-liquid parallel laminar flow was confirmed by flow mode experiments.
Global MHD simulation of magnetosphere using HPF

NASA Astrophysics Data System (ADS)

Ogino, T.

We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of Fujitsu VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Multidimensional bioseparation with modular microfluidics

DOEpatents

Chirica, Gabriela S.; Renzi, Ronald F.

2013-08-27

A multidimensional chemical separation and analysis system is described including a prototyping platform and modular microfluidic components capable of rapid and convenient assembly, alteration and disassembly of numerous candidate separation systems. Partial or total computer control of the separation system is possible. Single or multiple alternative processing trains can be tested, optimized and/or run in parallel. Examples related to the separation and analysis of human bodily fluids are given.
Acoustic resonances in cylinder bundles oscillating in a compressibile fluid

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, W.H.; Raptis, A.C.

1984-12-01

This paper deals with an analytical study on acoustic resonances of elastic oscillations of a group of parallel, circular, thin cylinders in an unbounded volume of barotropic, compressible, inviscid fluid. The perturbed motion of the fluid is assumed due entirely to the flexural oscillations of the cylinders. The motion of the fluid disturbances is first formulated in a three-dimensional wave form and then casted into a two-dimensional Helmholtz equation for the harmonic motion in time and in axial space. The acoustic motion in the fluid and the elastic motion in the cylinders are solved simultaneously. Acoustic resonances were approximately determinedmore » from the secular (eigenvalue) equation by the method of successive iteration with the use of digital computers for a given set of the fluid properties and the cylinders' geometry and properties. Effects of the flexural wavenumber and the configuration of and the spacing between the cylinders on the acoustic resonances were thoroughly investigated.« less
Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

NASA Technical Reports Server (NTRS)

Eberhardt, D. S.; Baganoff, D.; Stevens, K.

1984-01-01

Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
Cumulative reports and publications through 31 December 1983

NASA Technical Reports Server (NTRS)

1983-01-01

All reports for the calendar years 1975 through December 1983 are listed by author. Since ICASE reports are intended to be preprints of articles for journals and conference proceedings, the published reference is included when available. Thirteen older journal and conference proceedings references are included as well as five additional reports by ICASE personnel. Major categories of research covered include: (1) numerical methods, with particular emphasis on the development and analysis of basic algorithms; (2) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, structural analysis, and chemistry; and (3) computer systems and software, especially vector and parallel computers, microcomputers, and data management.
A message passing kernel for the hypercluster parallel processing test bed

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Quealy, Angela; Cole, Gary L.

1989-01-01

A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
Domain decomposition algorithms and computation fluid dynamics

NASA Technical Reports Server (NTRS)

Chan, Tony F.

1988-01-01

In the past several years, domain decomposition was a very popular topic, partly motivated by the potential of parallelization. While a large body of theory and algorithms were developed for model elliptic problems, they are only recently starting to be tested on realistic applications. The application of some of these methods to two model problems in computational fluid dynamics are investigated. Some examples are two dimensional convection-diffusion problems and the incompressible driven cavity flow problem. The construction and analysis of efficient preconditioners for the interface operator to be used in the iterative solution of the interface solution is described. For the convection-diffusion problems, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven cavity problem, the effectiveness of a class of boundary probe preconditioners is discussed.
National Combustion Code: A Multidisciplinary Combustor Design System

NASA Technical Reports Server (NTRS)

Stubbs, Robert M.; Liu, Nan-Suey

1997-01-01

The Internal Fluid Mechanics Division conducts both basic research and technology, and system technology research for aerospace propulsion systems components. The research within the division, which is both computational and experimental, is aimed at improving fundamental understanding of flow physics in inlets, ducts, nozzles, turbomachinery, and combustors. This article and the following three articles highlight some of the work accomplished in 1996. A multidisciplinary combustor design system is critical for optimizing the combustor design process. Such a system should include sophisticated computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. The goal of the present effort is to develop some of the enabling technologies and to demonstrate their overall performance in an integrated system called the National Combustion Code.
Tough2{_}MP: A parallel version of TOUGH2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris

2003-04-09

TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
A method for data handling numerical results in parallel OpenFOAM simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anton, Alin; Muntean, Sebastian

Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit{sup ®}[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
Overview of ICE Project: Integration of Computational Fluid Dynamics and Experiments

NASA Technical Reports Server (NTRS)

Stegeman, James D.; Blech, Richard A.; Babrauckas, Theresa L.; Jones, William H.

2001-01-01

Researchers at the NASA Glenn Research Center have developed a prototype integrated environment for interactively exploring, analyzing, and validating information from computational fluid dynamics (CFD) computations and experiments. The Integrated CFD and Experiments (ICE) project is a first attempt at providing a researcher with a common user interface for control, manipulation, analysis, and data storage for both experiments and simulation. ICE can be used as a live, on-tine system that displays and archives data as they are gathered; as a postprocessing system for dataset manipulation and analysis; and as a control interface or "steering mechanism" for simulation codes while visualizing the results. Although the full capabilities of ICE have not been completely demonstrated, this report documents the current system. Various applications of ICE are discussed: a low-speed compressor, a supersonic inlet, real-time data visualization, and a parallel-processing simulation code interface. A detailed data model for the compressor application is included in the appendix.
Aerodynamic Shape Optimization of Supersonic Aircraft Configurations via an Adjoint Formulation on Parallel Computers

NASA Technical Reports Server (NTRS)

Reuther, James; Alonso, Juan Jose; Rimlinger, Mark J.; Jameson, Antony

1996-01-01

This work describes the application of a control theory-based aerodynamic shape optimization method to the problem of supersonic aircraft design. The design process is greatly accelerated through the use of both control theory and a parallel implementation on distributed memory computers. Control theory is employed to derive the adjoint differential equations whose solution allows for the evaluation of design gradient information at a fraction of the computational cost required by previous design methods (13, 12, 44, 38). The resulting problem is then implemented on parallel distributed memory architectures using a domain decomposition approach, an optimized communication schedule, and the MPI (Message Passing Interface) Standard for portability and efficiency. The final result achieves very rapid aerodynamic design based on higher order computational fluid dynamics methods (CFD). In our earlier studies, the serial implementation of this design method (19, 20, 21, 23, 39, 25, 40, 41, 42, 43, 9) was shown to be effective for the optimization of airfoils, wings, wing-bodies, and complex aircraft configurations using both the potential equation and the Euler equations (39, 25). In our most recent paper, the Euler method was extended to treat complete aircraft configurations via a new multiblock implementation. Furthermore, during the same conference, we also presented preliminary results demonstrating that the basic methodology could be ported to distributed memory parallel computing architectures [241. In this paper, our concem will be to demonstrate that the combined power of these new technologies can be used routinely in an industrial design environment by applying it to the case study of the design of typical supersonic transport configurations. A particular difficulty of this test case is posed by the propulsion/airframe integration.
Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

NASA Astrophysics Data System (ADS)

Newman, Gregory A.; Commer, Michael

2009-07-01

Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Application of a single-fluid model for the steam condensing flow prediction

NASA Astrophysics Data System (ADS)

Smołka, K.; Dykas, S.; Majkut, M.; Strozik, M.

2016-10-01

One of the results of many years of research conducted in the Institute of Power Engineering and Turbomachinery of the Silesian University of Technology are computational algorithms for modelling steam flows with a non-equilibrium condensation process. In parallel with theoretical and numerical research, works were also started on experimental testing of the steam condensing flow. This paper presents a comparison of calculations of a flow field modelled by means of a single-fluid model using both an in-house CFD code and the commercial Ansys CFX v16.2 software package. The calculation results are compared to inhouse experimental testing.
A scalable nonlinear fluid-structure interaction solver based on a Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D

NASA Astrophysics Data System (ADS)

Kong, Fande; Cai, Xiao-Chuan

2017-07-01

Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

NASA Astrophysics Data System (ADS)

Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

2000-04-01

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright
AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

NASA Astrophysics Data System (ADS)

Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

2017-05-01

We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.

Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak

2003-01-01

The overset grid methodology has significantly reduced time-to-solution of highfidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machinc. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids

NASA Technical Reports Server (NTRS)

Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)

2001-01-01

This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
CFD research, parallel computation and aerodynamic optimization

NASA Technical Reports Server (NTRS)

Ryan, James S.

1995-01-01

Over five years of research in Computational Fluid Dynamics and its applications are covered in this report. Using CFD as an established tool, aerodynamic optimization on parallel architectures is explored. The objective of this work is to provide better tools to vehicle designers. Submarine design requires accurate force and moment calculations in flow with thick boundary layers and large separated vortices. Low noise production is critical, so flow into the propulsor region must be predicted accurately. The High Speed Civil Transport (HSCT) has been the subject of recent work. This vehicle is to be a passenger vehicle with the capability of cutting overseas flight times by more than half. A successful design must surpass the performance of comparable planes. Fuel economy, other operational costs, environmental impact, and range must all be improved substantially. For all these reasons, improved design tools are required, and these tools must eventually integrate optimization, external aerodynamics, propulsion, structures, heat transfer and other disciplines.
Simulation of dilute polymeric fluids in a three-dimensional contraction using a multiscale FENE model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Griebel, M., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de; Rüttgers, A., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de

The multiscale FENE model is applied to a 3D square-square contraction flow problem. For this purpose, the stochastic Brownian configuration field method (BCF) has been coupled with our fully parallelized three-dimensional Navier-Stokes solver NaSt3DGPF. The robustness of the BCF method enables the numerical simulation of high Deborah number flows for which most macroscopic methods suffer from stability issues. The results of our simulations are compared with that of experimental measurements from literature and show a very good agreement. In particular, flow phenomena such as a strong vortex enhancement, streamline divergence and a flow inversion for highly elastic flows are reproduced.more » Due to their computational complexity, our simulations require massively parallel computations. Using a domain decomposition approach with MPI, the implementation achieves excellent scale-up results for up to 128 processors.« less
Programming the Navier-Stokes computer: An abstract machine model and a visual editor

NASA Technical Reports Server (NTRS)

Middleton, David; Crockett, Tom; Tomboulian, Sherry

1988-01-01

The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs.
Particle-mesh techniques

NASA Technical Reports Server (NTRS)

Macneice, Peter

1995-01-01

This is an introduction to numerical Particle-Mesh techniques, which are commonly used to model plasmas, gravitational N-body systems, and both compressible and incompressible fluids. The theory behind this approach is presented, and its practical implementation, both for serial and parallel machines, is discussed. This document is based on a four-hour lecture course presented by the author at the NASA Summer School for High Performance Computational Physics, held at Goddard Space Flight Center.
A fast non-Fourier method for Landau-fluid operators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dimits, A. M., E-mail: dimits1@llnl.gov; Joseph, I.; Umansky, M. V.

An efficient and versatile non-Fourier method for the computation of Landau-fluid (LF) closure operators [Hammett and Perkins, Phys. Rev. Lett. 64, 3019 (1990)] is presented, based on an approximation by a sum of modified-Helmholtz-equation solves (SMHS) in configuration space. This method can yield fast-Fourier-like scaling of the computational time requirements and also provides a very compact data representation of these operators, even for plasmas with large spatial nonuniformity. As a result, the method can give significant savings compared with direct application of “delocalization kernels” [e.g., Schurtz et al., Phys. Plasmas 7, 4238 (2000)], both in terms of computational cost andmore » memory requirements. The method is of interest for the implementation of Landau-fluid models in situations where the spatial nonuniformity, particular geometry, or boundary conditions render a Fourier implementation difficult or impossible. Systematic procedures have been developed to optimize the resulting operators for accuracy and computational cost. The four-moment Landau-fluid model of Hammett and Perkins has been implemented in the BOUT++ code using the SMHS method for LF closure. Excellent agreement has been obtained for the one-dimensional plasma density response function between driven initial-value calculations using this BOUT++ implementation and matrix eigenvalue calculations using both Fourier and SMHS non-Fourier implementations of the LF closures. The SMHS method also forms the basis for the implementation, which has been carried out in the BOUT++ code, of the parallel and toroidal drift-resonance LF closures. The method is a key enabling tool for the extension of gyro-Landau-fluid models [e.g., Beer and Hammett, Phys. Plasmas 3, 4046 (1996)] to codes that treat regions with strong profile variation, such as the tokamak edge and scrapeoff-layer.« less
A fast non-Fourier method for Landau-fluid operatorsa)

NASA Astrophysics Data System (ADS)

Dimits, A. M.; Joseph, I.; Umansky, M. V.

2014-05-01

An efficient and versatile non-Fourier method for the computation of Landau-fluid (LF) closure operators [Hammett and Perkins, Phys. Rev. Lett. 64, 3019 (1990)] is presented, based on an approximation by a sum of modified-Helmholtz-equation solves (SMHS) in configuration space. This method can yield fast-Fourier-like scaling of the computational time requirements and also provides a very compact data representation of these operators, even for plasmas with large spatial nonuniformity. As a result, the method can give significant savings compared with direct application of "delocalization kernels" [e.g., Schurtz et al., Phys. Plasmas 7, 4238 (2000)], both in terms of computational cost and memory requirements. The method is of interest for the implementation of Landau-fluid models in situations where the spatial nonuniformity, particular geometry, or boundary conditions render a Fourier implementation difficult or impossible. Systematic procedures have been developed to optimize the resulting operators for accuracy and computational cost. The four-moment Landau-fluid model of Hammett and Perkins has been implemented in the BOUT++ code using the SMHS method for LF closure. Excellent agreement has been obtained for the one-dimensional plasma density response function between driven initial-value calculations using this BOUT++ implementation and matrix eigenvalue calculations using both Fourier and SMHS non-Fourier implementations of the LF closures. The SMHS method also forms the basis for the implementation, which has been carried out in the BOUT++ code, of the parallel and toroidal drift-resonance LF closures. The method is a key enabling tool for the extension of gyro-Landau-fluid models [e.g., Beer and Hammett, Phys. Plasmas 3, 4046 (1996)] to codes that treat regions with strong profile variation, such as the tokamak edge and scrapeoff-layer.
A bioconvection model for a squeezing flow of nanofluid between parallel plates in the presence of gyrotactic microorganisms

NASA Astrophysics Data System (ADS)

Bin-Mohsin, Bandar; Ahmed, Naveed; Adnan; Khan, Umar; Tauseef Mohyud-Din, Syed

2017-04-01

This article deals with the bioconvection flow in a parallel-plate channel. The plates are parallel and the flowing fluid is saturated with nanoparticles, and water is considered as a base fluid because microorganisms can survive only in water. A highly nonlinear and coupled system of partial differential equations presenting the model of bioconvection flow between parallel plates is reduced to a nonlinear and coupled system (nondimensional bioconvection flow model) of ordinary differential equations with the help of feasible nondimensional variables. In order to find the convergent solution of the system, a semi-analytical technique is utilized called variation of parameters method (VPM). Numerical solution is also computed and the Runge-Kutta scheme of fourth order is employed for this purpose. Comparison between these solutions has been made on the domain of interest and found to be in excellent agreement. Also, influence of various parameters has been discussed for the nondimensional velocity, temperature, concentration and density of the motile microorganisms both for suction and injection cases. Almost inconsequential influence of thermophoretic and Brownian motion parameters on the temperature field is observed. An interesting variation are inspected for the density of the motile microorganisms due to the varying bioconvection parameter in suction and injection cases. At the end, we make some concluding remarks in the light of this article.
Multiscale modeling and simulation for polymer melt flows between parallel plates

NASA Astrophysics Data System (ADS)

Yasuda, Shugo; Yamamoto, Ryoichi

2010-03-01

The flow behaviors of polymer melt composed of short chains with ten beads between parallel plates are simulated by using a hybrid method of molecular dynamics and computational fluid dynamics. Three problems are solved: creep motion under a constant shear stress and its recovery motion after removing the stress, pressure-driven flows, and the flows in rapidly oscillating plates. In the creep/recovery problem, the delayed elastic deformation in the creep motion and evident elastic behavior in the recovery motion are demonstrated. The velocity profiles of the melt in pressure-driven flows are quite different from those of Newtonian fluid due to shear thinning. Velocity gradients of the melt become steeper near the plates and flatter at the middle between the plates as the pressure gradient increases and the temperature decreases. In the rapidly oscillating plates, the viscous boundary layer of the melt is much thinner than that of Newtonian fluid due to the shear thinning of the melt. Three different rheological regimes, i.e., the viscous fluid, viscoelastic liquid, and viscoelastic solid regimes, form over the oscillating plate according to the local Deborah numbers. The melt behaves as a viscous fluid in a region for ωτR≲1 , and the crossover between the liquidlike and solidlike regime takes place around ωτα≃1 (where ω is the angular frequency of the plate and τR and τα are Rouse and α relaxation time, respectively).
Multiscale modeling and simulation for polymer melt flows between parallel plates.

PubMed

Yasuda, Shugo; Yamamoto, Ryoichi

2010-03-01

The flow behaviors of polymer melt composed of short chains with ten beads between parallel plates are simulated by using a hybrid method of molecular dynamics and computational fluid dynamics. Three problems are solved: creep motion under a constant shear stress and its recovery motion after removing the stress, pressure-driven flows, and the flows in rapidly oscillating plates. In the creep/recovery problem, the delayed elastic deformation in the creep motion and evident elastic behavior in the recovery motion are demonstrated. The velocity profiles of the melt in pressure-driven flows are quite different from those of Newtonian fluid due to shear thinning. Velocity gradients of the melt become steeper near the plates and flatter at the middle between the plates as the pressure gradient increases and the temperature decreases. In the rapidly oscillating plates, the viscous boundary layer of the melt is much thinner than that of Newtonian fluid due to the shear thinning of the melt. Three different rheological regimes, i.e., the viscous fluid, viscoelastic liquid, and viscoelastic solid regimes, form over the oscillating plate according to the local Deborah numbers. The melt behaves as a viscous fluid in a region for omegatauR < approximately 1 , and the crossover between the liquidlike and solidlike regime takes place around omegataualpha approximately equal 1 (where omega is the angular frequency of the plate and tauR and taualpha are Rouse and alpha relaxation time, respectively).
Numerical Simulation of Flow Field Within Parallel Plate Plastometer

NASA Technical Reports Server (NTRS)

Antar, Basil N.

2002-01-01

Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP is usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

NASA Astrophysics Data System (ADS)

Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

2013-03-01

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 40963 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
Computational Fluid Dynamics of Whole-Body Aircraft

NASA Astrophysics Data System (ADS)

Agarwal, Ramesh

1999-01-01

The current state of the art in computational aerodynamics for whole-body aircraft flowfield simulations is described. Recent advances in geometry modeling, surface and volume grid generation, and flow simulation algorithms have led to accurate flowfield predictions for increasingly complex and realistic configurations. As a result, computational aerodynamics has emerged as a crucial enabling technology for the design and development of flight vehicles. Examples illustrating the current capability for the prediction of transport and fighter aircraft flowfields are presented. Unfortunately, accurate modeling of turbulence remains a major difficulty in the analysis of viscosity-dominated flows. In the future, inverse design methods, multidisciplinary design optimization methods, artificial intelligence technology, and massively parallel computer technology will be incorporated into computational aerodynamics, opening up greater opportunities for improved product design at substantially reduced costs.
ICASE semiannual report, April 1 - September 30, 1989

NASA Technical Reports Server (NTRS)

1990-01-01

The Institute conducts unclassified basic research in applied mathematics, numerical analysis, and computer science in order to extend and improve problem-solving capabilities in science and engineering, particularly in aeronautics and space. The major categories of the current Institute for Computer Applications in Science and Engineering (ICASE) research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification problems, with emphasis on effective numerical methods; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software, especially vector and parallel computers. ICASE reports are considered to be primarily preprints of manuscripts that have been submitted to appropriate research journals or that are to appear in conference proceedings.
IMACS 󈨟: Proceedings of the IMACS World Congress on Computation and Applied Mathematics (13th) Held in Dublin, Ireland on July 22-26, 1991. Volume 2. Computational Fluid Dynamics and Wave Propagation, Parallel Computing, Concurrent and Supercomputing, Computational Physics/Computational Chemistry and Evolutionary Systems

DTIC Science & Technology

1991-01-01

Investigations of uniform convergence Sorption gel5ster Stoffe in porisen Medien (in German). Ver- are-in progress. lag P -Lang, Frankfurt/M., 1991 (in press...ad- sorption terms. Numerical results for solute transport with instantaneous, /it + 01(p): - q(p)#= 0, X > 0, t > 0. (5) nonlinear adsorption-are...13 ] A, S113S = (PfIJl) 8Iax, S1136= m A, y H137=- pf A, v 138 -f A, IIso l r opic-visco° c ousic- Y 139 im A, z-pliole (C’ilz) y 1141 = [ PfA 𔃻 + P
Heterogeneous scalable framework for multiphase flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, Karla Vanessa

2013-09-01

Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computermore » platforms. This work created an application based on a software architecture where the physics and software concerns are separated in a way that adds flexibility to both. The develop spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other application. The API allow object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects in central processing units (CPU) and graphic processing units (GPU) nodes for applications using C++.« less
Numerical Simulation of Transit-Time Ultrasonic Flowmeters by a Direct Approach.

PubMed

Luca, Adrian; Marchiano, Regis; Chassaing, Jean-Camille

2016-06-01

This paper deals with the development of a computational code for the numerical simulation of wave propagation through domains with a complex geometry consisting in both solids and moving fluids. The emphasis is on the numerical simulation of ultrasonic flowmeters (UFMs) by modeling the wave propagation in solids with the equations of linear elasticity (ELE) and in fluids with the linearized Euler equations (LEEs). This approach requires high performance computing because of the high number of degrees of freedom and the long propagation distances. Therefore, the numerical method should be chosen with care. In order to minimize the numerical dissipation which may occur in this kind of configuration, the numerical method employed here is the nodal discontinuous Galerkin (DG) method. Also, this method is well suited for parallel computing. To speed up the code, almost all the computational stages have been implemented to run on graphical processing unit (GPU) by using the compute unified device architecture (CUDA) programming model from NVIDIA. This approach has been validated and then used for the two-dimensional simulation of gas UFMs. The large contrast of acoustic impedance characteristic to gas UFMs makes their simulation a real challenge.
Application of CHAD hydrodynamics to shock-wave problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Trease, H.E.; O`Rourke, P.J.; Sahota, M.S.

1997-12-31

CHAD is the latest in a sequence of continually evolving computer codes written to effectively utilize massively parallel computer architectures and the latest grid generators for unstructured meshes. Its applications range from automotive design issues such as in-cylinder and manifold flows of internal combustion engines, vehicle aerodynamics, underhood cooling and passenger compartment heating, ventilation, and air conditioning to shock hydrodynamics and materials modeling. CHAD solves the full unsteady Navier-Stoke equations with the k-epsilon turbulence model in three space dimensions. The code has four major features that distinguish it from the earlier KIVA code, also developed at Los Alamos. First, itmore » is based on a node-centered, finite-volume method in which, like finite element methods, all fluid variables are located at computational nodes. The computational mesh efficiently and accurately handles all element shapes ranging from tetrahedra to hexahedra. Second, it is written in standard Fortran 90 and relies on automatic domain decomposition and a universal communication library written in standard C and MPI for unstructured grids to effectively exploit distributed-memory parallel architectures. Thus the code is fully portable to a variety of computing platforms such as uniprocessor workstations, symmetric multiprocessors, clusters of workstations, and massively parallel platforms. Third, CHAD utilizes a variable explicit/implicit upwind method for convection that improves computational efficiency in flows that have large velocity Courant number variations due to velocity of mesh size variations. Fourth, CHAD is designed to also simulate shock hydrodynamics involving multimaterial anisotropic behavior under high shear. The authors will discuss CHAD capabilities and show several sample calculations showing the strengths and weaknesses of CHAD.« less
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.

Lattice Boltzmann modeling of transport phenomena in fuel cells and flow batteries

NASA Astrophysics Data System (ADS)

Xu, Ao; Shyy, Wei; Zhao, Tianshou

2017-06-01

Fuel cells and flow batteries are promising technologies to address climate change and air pollution problems. An understanding of the complex multiscale and multiphysics transport phenomena occurring in these electrochemical systems requires powerful numerical tools. Over the past decades, the lattice Boltzmann (LB) method has attracted broad interest in the computational fluid dynamics and the numerical heat transfer communities, primarily due to its kinetic nature making it appropriate for modeling complex multiphase transport phenomena. More importantly, the LB method fits well with parallel computing due to its locality feature, which is required for large-scale engineering applications. In this article, we review the LB method for gas-liquid two-phase flows, coupled fluid flow and mass transport in porous media, and particulate flows. Examples of applications are provided in fuel cells and flow batteries. Further developments of the LB method are also outlined.
CFD analysis of hypersonic, chemically reacting flow fields

NASA Technical Reports Server (NTRS)

Edwards, T. A.

1993-01-01

Design studies are underway for a variety of hypersonic flight vehicles. The National Aero-Space Plane will provide a reusable, single-stage-to-orbit capability for routine access to low earth orbit. Flight-capable satellites will dip into the atmosphere to maneuver to new orbits, while planetary probes will decelerate at their destination by atmospheric aerobraking. To supplement limited experimental capabilities in the hypersonic regime, computational fluid dynamics (CFD) is being used to analyze the flow about these configurations. The governing equations include fluid dynamic as well as chemical species equations, which are being solved with new, robust numerical algorithms. Examples of CFD applications to hypersonic vehicles suggest an important role this technology will play in the development of future aerospace systems. The computational resources needed to obtain solutions are large, but solution adaptive grids, convergence acceleration, and parallel processing may make run times manageable.
Optimal dynamic remapping of parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.; Reynolds, Paul F., Jr.

1987-01-01

A large class of computations are characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem was considered regarding the remapping of workload to processors in a parallel computation when the utility of remapping and the future behavior of the workload is uncertain, and phases exhibit stable execution requirements during a given phase, but requirements may change radically between phases. For these problems a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The fundamental problem of balancing the expected remapping performance gain against the delay cost was addressed. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimnation of the gain is examined. The heuristic method's feasibility is demonstrated by its use on an adaptive fluid dynamics code on a multiprocessor. The results suggest that except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
HIGH-FIDELITY SIMULATION-DRIVEN MODEL DEVELOPMENT FOR COARSE-GRAINED COMPUTATIONAL FLUID DYNAMICS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hanna, Botros N.; Dinh, Nam T.; Bolotnov, Igor A.

Nuclear reactor safety analysis requires identifying various credible accident scenarios and determining their consequences. For a full-scale nuclear power plant system behavior, it is impossible to obtain sufficient experimental data for a broad range of risk-significant accident scenarios. In single-phase flow convective problems, Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) can provide us with high fidelity results when physical data are unavailable. However, these methods are computationally expensive and cannot be afforded for simulation of long transient scenarios in nuclear accidents despite extraordinary advances in high performance scientific computing over the past decades. The major issue is themore » inability to make the transient computation parallel, thus making number of time steps required in high-fidelity methods unaffordable for long transients. In this work, we propose to apply a high fidelity simulation-driven approach to model sub-grid scale (SGS) effect in Coarse Grained Computational Fluid Dynamics CG-CFD. This approach aims to develop a statistical surrogate model instead of the deterministic SGS model. We chose to start with a turbulent natural convection case with volumetric heating in a horizontal fluid layer with a rigid, insulated lower boundary and isothermal (cold) upper boundary. This scenario of unstable stratification is relevant to turbulent natural convection in a molten corium pool during a severe nuclear reactor accident, as well as in containment mixing and passive cooling. The presented approach demonstrates how to create a correction for the CG-CFD solution by modifying the energy balance equation. A global correction for the temperature equation proves to achieve a significant improvement to the prediction of steady state temperature distribution through the fluid layer.« less
Generalized Fluid System Simulation Program, Version 6.0

NASA Technical Reports Server (NTRS)

Majumdar, A. K.; LeClair, A. C.; Moore, R.; Schallhorn, P. A.

2016-01-01

The Generalized Fluid System Simulation Program (GFSSP) is a general purpose computer program for analyzing steady state and time-dependent flow rates, pressures, temperatures, and concentrations in a complex flow network. The program is capable of modeling real fluids with phase changes, compressibility, mixture thermodynamics, conjugate heat transfer between solid and fluid, fluid transients, pumps, compressors, and external body forces such as gravity and centrifugal. The thermofluid system to be analyzed is discretized into nodes, branches, and conductors. The scalar properties such as pressure, temperature, and concentrations are calculated at nodes. Mass flow rates and heat transfer rates are computed in branches and conductors. The graphical user interface allows users to build their models using the 'point, drag, and click' method; the users can also run their models and post-process the results in the same environment. Two thermodynamic property programs (GASP/WASP and GASPAK) provide required thermodynamic and thermophysical properties for 36 fluids: helium, methane, neon, nitrogen, carbon monoxide, oxygen, argon, carbon dioxide, fluorine, hydrogen, parahydrogen, water, kerosene (RP-1), isobutene, butane, deuterium, ethane, ethylene, hydrogen sulfide, krypton, propane, xenon, R-11, R-12, R-22, R-32, R-123, R-124, R-125, R-134A, R-152A, nitrogen trifluoride, ammonia, hydrogen peroxide, and air. The program also provides the options of using any incompressible fluid with constant density and viscosity or ideal gas. The users can also supply property tables for fluids that are not in the library. Twenty-four different resistance/source options are provided for modeling momentum sources or sinks in the branches. These options include pipe flow, flow through a restriction, noncircular duct, pipe flow with entrance and/or exit losses, thin sharp orifice, thick orifice, square edge reduction, square edge expansion, rotating annular duct, rotating radial duct, labyrinth seal, parallel plates, common fittings and valves, pump characteristics, pump power, valve with a given loss coefficient, Joule-Thompson device, control valve, heat exchanger core, parallel tube, and compressible orifice. The program has the provision of including additional resistance options through User Subroutines. GFSSP employs a finite volume formulation of mass, momentum, and energy conservation equations in conjunction with the thermodynamic equations of state for real fluids as well as energy conservation equations for the solid. The system of equations describing the fluid network is solved by a hybrid numerical method that is a combination of the Newton-Raphson and successive substitution methods. The application and verification of the code has been demonstrated through 30 example problems.
Computational Fluid Dynamic Solutions of Optimized Heat Shields Designed for Earth Entry

DTIC Science & Technology

2010-01-01

domain ρ = Density (kg/m3) σ = Stefan Boltzmann constant τ = Shear stress tensor τT−V = T-V relaxation time τe−V = e-V relaxation time xi φ = Sweep angle...Vehicle DES = Differential evolutionary Scheme DOR = Design Optimization Tools DPLR = Data Parallel Line Relaxation GSLR = Gauss- Seidel Line... Stefan - Boltzmann constant. This model provides accurate heating predictions, especially for the non-ablating heat-shields explored in this work. Various
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)

2002-01-01

This report describes a two level parallelization of a Computational Fluid Dynamic (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine is reported using medium and large scale test problems.
Fractal Viscous Fingering in Fracture Networks

NASA Astrophysics Data System (ADS)

Boyle, E.; Sams, W.; Ferer, M.; Smith, D. H.

2007-12-01

We have used two very different physical models and computer codes to study miscible injection of a low- viscosity fluid into a simple fracture network, where it displaces a much-more viscous "defending" fluid through "rock" that is otherwise impermeable. The one code (NETfLow) is a standard pore level model, originally intended to treat laboratory-scale experiments; it assumes negligible mixing of the two fluids. The other code (NFFLOW) was written to treat reservoir-scale engineering problems; It explicitly treats the flow through the fractures and allows for significant mixing of the fluids at the interface. Both codes treat the fractures as parallel plates, of different effective apertures. Results are presented for the composition profiles from both codes. Independent of the degree of fluid-mixing, the profiles from both models have a functional form identical to that for fractal viscous fingering (i.e., diffusion limited aggregation, DLA). The two codes that solve the equations for different models gave similar results; together they suggest that the injection of a low-viscosity fluid into large- scale fracture networks may be much more significantly affected by fractal fingering than previously illustrated.
Collectively loading an application in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
Numerical study of the vortex tube reconnection using vortex particle method on many graphics cards

NASA Astrophysics Data System (ADS)

Kudela, Henryk; Kosior, Andrzej

2014-08-01

Vortex Particle Methods are one of the most convenient ways of tracking the vorticity evolution. In the article we presented numerical recreation of the real life experiment concerning head-on collision of two vortex rings. In the experiment the evolution and reconnection of the vortex structures is tracked with passive markers (paint particles) which in viscous fluid does not follow the evolution of vorticity field. In numerical computations we showed the difference between vorticity evolution and movement of passive markers. The agreement with the experiment was very good. Due to problems with very long time of computations on a single processor the Vortex-in-Cell method was implemented on the multicore architecture of the graphics cards (GPUs). Vortex Particle Methods are very well suited for parallel computations. As there are myriads of particles in the flow and for each of them the same equations of motion have to be solved the SIMD architecture used in GPUs seems to be perfect. The main disadvantage in this case is the small amount of the RAM memory. To overcome this problem we created a multiGPU implementation of the VIC method. Some remarks on parallel computing are given in the article.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

NASA Astrophysics Data System (ADS)

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten

2017-11-01

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

NASA Technical Reports Server (NTRS)

Follen, Gregory J.; Lytle, John K. (Technical Monitor)

2002-01-01

Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.
Magnetophoretic circuits for digital control of single particles and cells

NASA Astrophysics Data System (ADS)

Lim, Byeonghwa; Reddy, Venu; Hu, Xinghao; Kim, Kunwoo; Jadhav, Mital; Abedini-Nassab, Roozbeh; Noh, Young-Woock; Lim, Yong Taik; Yellen, Benjamin B.; Kim, Cheolgi

2014-05-01

The ability to manipulate small fluid droplets, colloidal particles and single cells with the precision and parallelization of modern-day computer hardware has profound applications for biochemical detection, gene sequencing, chemical synthesis and highly parallel analysis of single cells. Drawing inspiration from general circuit theory and magnetic bubble technology, here we demonstrate a class of integrated circuits for executing sequential and parallel, timed operations on an ensemble of single particles and cells. The integrated circuits are constructed from lithographically defined, overlaid patterns of magnetic film and current lines. The magnetic patterns passively control particles similar to electrical conductors, diodes and capacitors. The current lines actively switch particles between different tracks similar to gated electrical transistors. When combined into arrays and driven by a rotating magnetic field clock, these integrated circuits have general multiplexing properties and enable the precise control of magnetizable objects.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Menon, Suresh

1996-01-01

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Multidisciplinary High-Fidelity Analysis and Optimization of Aerospace Vehicles. Part 1; Formulation

NASA Technical Reports Server (NTRS)

Walsh, J. L.; Townsend, J. C.; Salas, A. O.; Samareh, J. A.; Mukhopadhyay, V.; Barthelemy, J.-F.

2000-01-01

An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a highspeed civil transport configuration. The paper describes the engineering aspects of formulating the optimization by integrating these analysis codes and associated interface codes into the system. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture (CORBA) compliant software product. A companion paper presents currently available results.
Multidisciplinary High-Fidelity Analysis and Optimization of Aerospace Vehicles. Part 2; Preliminary Results

NASA Technical Reports Server (NTRS)

Walsh, J. L.; Weston, R. P.; Samareh, J. A.; Mason, B. H.; Green, L. L.; Biedron, R. T.

2000-01-01

An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity finite-element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes both the preliminary results from implementing and validating the multidisciplinary analysis and the results from an aerodynamic optimization. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture compliant software product. A companion paper describes the formulation of the multidisciplinary analysis and optimization system.
Effect of asynchrony on numerical simulations of fluid flow phenomena

NASA Astrophysics Data System (ADS)

Konduri, Aditya; Mahoney, Bryan; Donzis, Diego

2015-11-01

Designing scalable CFD codes on massively parallel computers is a challenge. This is mainly due to the large number of communications between processing elements (PEs) and their synchronization, leading to idling of PEs. Indeed, communication will likely be the bottleneck in the scalability of codes on Exascale machines. Our recent work on asynchronous computing for PDEs based on finite-differences has shown that it is possible to relax synchronization between PEs at a mathematical level. Computations then proceed regardless of the status of communication, reducing the idle time of PEs and improving the scalability. However, accuracy of the schemes is greatly affected. We have proposed asynchrony-tolerant (AT) schemes to address this issue. In this work, we study the effect of asynchrony on the solution of fluid flow problems using standard and AT schemes. We show that asynchrony creates additional scales with low energy content. The specific wavenumbers affected can be shown to be due to two distinct effects: the randomness in the arrival of messages and the corresponding switching between schemes. Understanding these errors allow us to effectively control them, rendering the method's feasibility in solving turbulent flows at realistic conditions on future computing systems.
On the computational modeling of the viscosity of colloidal dispersions and its relation with basic molecular interactions

NASA Astrophysics Data System (ADS)

Gama Goicochea, A.; Balderas Altamirano, M. A.; Lopez-Esparza, R.; Waldo-Mendoza, Miguel A.; Perez, E.

2015-09-01

The connection between fundamental interactions acting in molecules in a fluid and macroscopically measured properties, such as the viscosity between colloidal particles coated with polymers, is studied here. The role that hydrodynamic and Brownian forces play in colloidal dispersions is also discussed. It is argued that many-body systems in which all these interactions take place can be accurately solved using computational simulation tools. One of those modern tools is the technique known as dissipative particle dynamics, which incorporates Brownian and hydrodynamic forces, as well as basic conservative interactions. A case study is reported, as an example of the applications of this technique, which consists of the prediction of the viscosity and friction between two opposing parallel surfaces covered with polymer chains, under the influence of a steady flow. This work is intended to serve as an introduction to the subject of colloidal dispersions and computer simulations, for final-year undergraduate students and beginning graduate students who are interested in beginning research in soft matter systems. To that end, a computational code is included that students can use right away to study complex fluids in equilibrium.

Acoustic streaming, fluid mixing, and particle transport by a Gaussian ultrasound beam in a cylindrical container

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marshall, Jeffrey S., E-mail: jeffm@cems.uvm.edu; Wu, Junru

A computational study is reported of the acoustic streaming flow field generated by a Gaussian ultrasound beam propagating normally toward the end wall of a cylindrical container. Particular focus is given to examining the effectiveness of the acoustic streaming flow for fluid mixing within the container, for deposition of particles in suspension onto the bottom surface, and for particle suspension from the bottom surface back into the flow field. The flow field is assumed to be axisymmetric with the ultrasound transducer oriented parallel to the cylinder axis and normal to the bottom surface of the container, which we refer tomore » as the impingement surface. Reflection of the sound from the impingement surface and sound absorption within the material at the container bottom are both accounted for in the computation. The computation also accounts for thermal buoyancy force due to ultrasonic heating of the impingement surface, but over the time period considered in the current simulations, the flow is found to be dominated by the acoustic streaming force, with only moderate effect of buoyancy force.« less
Acoustic streaming, fluid mixing, and particle transport by a Gaussian ultrasound beam in a cylindrical container

NASA Astrophysics Data System (ADS)

Marshall, Jeffrey S.; Wu, Junru

2015-10-01

A computational study is reported of the acoustic streaming flow field generated by a Gaussian ultrasound beam propagating normally toward the end wall of a cylindrical container. Particular focus is given to examining the effectiveness of the acoustic streaming flow for fluid mixing within the container, for deposition of particles in suspension onto the bottom surface, and for particle suspension from the bottom surface back into the flow field. The flow field is assumed to be axisymmetric with the ultrasound transducer oriented parallel to the cylinder axis and normal to the bottom surface of the container, which we refer to as the impingement surface. Reflection of the sound from the impingement surface and sound absorption within the material at the container bottom are both accounted for in the computation. The computation also accounts for thermal buoyancy force due to ultrasonic heating of the impingement surface, but over the time period considered in the current simulations, the flow is found to be dominated by the acoustic streaming force, with only moderate effect of buoyancy force.
Cumulative Reports and Publications through December 31, 1992

DTIC Science & Technology

1993-04-01

periodls of time , and by consultants. Members of NASA’s research staff also may be residenit at l( ASE for limited pieriodls. The major categories of the...To appear in Theoretical and Computational Fluid Dynamics. Nicol, David M.: Optimistic Barricr Synchronization . I(CASE Report No. 92-34, July 27, 1992...continuous time Markov chains. ICASE Report No. 92-60, November 18, 1992, 23 pages. Submitted to the 7th Annual Workshop on Parallel and Distributed
National Combustion Code, a Multidisciplinary Combustor Design System, Will Be Transferred to the Commercial Sector

NASA Technical Reports Server (NTRS)

Steele, Gynelle C.

1999-01-01

The NASA Lewis Research Center and Flow Parametrics will enter into an agreement to commercialize the National Combustion Code (NCC). This multidisciplinary combustor design system utilizes computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. This integrated system can facilitate and enhance various phases of the design and analysis process.
Investigation of obstacle effect to improve conjugate heat transfer in backward facing step channel using fast simulation of incompressible flow

NASA Astrophysics Data System (ADS)

Nouri-Borujerdi, Ali; Moazezi, Arash

2018-01-01

The current study investigates the conjugate heat transfer characteristics for laminar flow in backward facing step channel. All of the channel walls are insulated except the lower thick wall under a constant temperature. The upper wall includes a insulated obstacle perpendicular to flow direction. The effect of obstacle height and location on the fluid flow and heat transfer are numerically explored for the Reynolds number in the range of 10 ≤ Re ≤ 300. Incompressible Navier-Stokes and thermal energy equations are solved simultaneously in fluid region by the upwind compact finite difference scheme based on flux-difference splitting in conjunction with artificial compressibility method. In the thick wall, the energy equation is obtained by Laplace equation. A multi-block approach is used to perform parallel computing to reduce the CPU time. Each block is modeled separately by sharing boundary conditions with neighbors. The developed program for modeling was written in FORTRAN language with OpenMP API. The obtained results showed that using of the multi-block parallel computing method is a simple robust scheme with high performance and high-order accurate. Moreover, the obtained results demonstrated that the increment of Reynolds number and obstacle height as well as decrement of horizontal distance between the obstacle and the step improve the heat transfer.
Operationally Efficient Propulsion System Study (OEPSS) Data Book. Volume 8; Integrated Booster Propulsion Module (BPM) Engine Start Dynamics

NASA Technical Reports Server (NTRS)

Kemp, Victoria R.

1992-01-01

A fluid-dynamic, digital-transient computer model of an integrated, parallel propulsion system was developed for the CDC mainframe and the SUN workstation computers. Since all STME component designs were used for the integrated system, computer subroutines were written characterizing the performance and geometry of all the components used in the system, including the manifolds. Three transient analysis reports were completed. The first report evaluated the feasibility of integrated engine systems in regards to the start and cutoff transient behavior. The second report evaluated turbopump out and combined thrust chamber/turbopump out conditions. The third report presented sensitivity study results in staggered gas generator spin start and in pump performance characteristics.
Neural networks for calibration tomography

NASA Technical Reports Server (NTRS)

Decker, Arthur

1993-01-01

Artificial neural networks are suitable for performing pattern-to-pattern calibrations. These calibrations are potentially useful for facilities operations in aeronautics, the control of optical alignment, and the like. Computed tomography is compared with neural net calibration tomography for estimating density from its x-ray transform. X-ray transforms are measured, for example, in diffuse-illumination, holographic interferometry of fluids. Computed tomography and neural net calibration tomography are shown to have comparable performance for a 10 degree viewing cone and 29 interferograms within that cone. The system of tomography discussed is proposed as a relevant test of neural networks and other parallel processors intended for using flow visualization data.
Methods for compressible fluid simulation on GPUs using high-order finite differences

NASA Astrophysics Data System (ADS)

Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer

2017-08-01

We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3 . 6 × speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.
A Generalized Fluid System Simulation Program to Model Flow Distribution in Fluid Networks

NASA Technical Reports Server (NTRS)

Majumdar, Alok; Bailey, John W.; Schallhorn, Paul; Steadman, Todd

1998-01-01

This paper describes a general purpose computer program for analyzing steady state and transient flow in a complex network. The program is capable of modeling phase changes, compressibility, mixture thermodynamics and external body forces such as gravity and centrifugal. The program's preprocessor allows the user to interactively develop a fluid network simulation consisting of nodes and branches. Mass, energy and specie conservation equations are solved at the nodes; the momentum conservation equations are solved in the branches. The program contains subroutines for computing "real fluid" thermodynamic and thermophysical properties for 33 fluids. The fluids are: helium, methane, neon, nitrogen, carbon monoxide, oxygen, argon, carbon dioxide, fluorine, hydrogen, parahydrogen, water, kerosene (RP-1), isobutane, butane, deuterium, ethane, ethylene, hydrogen sulfide, krypton, propane, xenon, R-11, R-12, R-22, R-32, R-123, R-124, R-125, R-134A, R-152A, nitrogen trifluoride and ammonia. The program also provides the options of using any incompressible fluid with constant density and viscosity or ideal gas. Seventeen different resistance/source options are provided for modeling momentum sources or sinks in the branches. These options include: pipe flow, flow through a restriction, non-circular duct, pipe flow with entrance and/or exit losses, thin sharp orifice, thick orifice, square edge reduction, square edge expansion, rotating annular duct, rotating radial duct, labyrinth seal, parallel plates, common fittings and valves, pump characteristics, pump power, valve with a given loss coefficient, and a Joule-Thompson device. The system of equations describing the fluid network is solved by a hybrid numerical method that is a combination of the Newton-Raphson and successive substitution methods. This paper also illustrates the application and verification of the code by comparison with Hardy Cross method for steady state flow and analytical solution for unsteady flow.
Hierarchical calibration and validation for modeling bench-scale solvent-based carbon capture. Part 1: Non-reactive physical mass transfer across the wetted wall column: Original Research Article: Hierarchical calibration and validation for modeling bench-scale solvent-based carbon capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Chao; Xu, Zhijie; Lai, Canhai

A hierarchical model calibration and validation is proposed for quantifying the confidence level of mass transfer prediction using a computational fluid dynamics (CFD) model, where the solvent-based carbon dioxide (CO2) capture is simulated and simulation results are compared to the parallel bench-scale experimental data. Two unit problems with increasing level of complexity are proposed to breakdown the complex physical/chemical processes of solvent-based CO2 capture into relatively simpler problems to separate the effects of physical transport and chemical reaction. This paper focuses on the calibration and validation of the first unit problem, i.e. the CO2 mass transfer across a falling ethanolaminemore » (MEA) film in absence of chemical reaction. This problem is investigated both experimentally and numerically using nitrous oxide (N2O) as a surrogate for CO2. To capture the motion of gas-liquid interface, a volume of fluid method is employed together with a one-fluid formulation to compute the mass transfer between the two phases. Bench-scale parallel experiments are designed and conducted to validate and calibrate the CFD models using a general Bayesian calibration. Two important transport parameters, e.g. Henry’s constant and gas diffusivity, are calibrated to produce the posterior distributions, which will be used as the input for the second unit problem to address the chemical adsorption of CO2 across the MEA falling film, where both mass transfer and chemical reaction are involved.« less
Branching points in the low-temperature dipolar hard sphere fluid

NASA Astrophysics Data System (ADS)

Rovigatti, Lorenzo; Kantorovich, Sofia; Ivanov, Alexey O.; Tavares, José Maria; Sciortino, Francesco

2013-10-01

In this contribution, we investigate the low-temperature, low-density behaviour of dipolar hard-sphere (DHS) particles, i.e., hard spheres with dipoles embedded in their centre. We aim at describing the DHS fluid in terms of a network of chains and rings (the fundamental clusters) held together by branching points (defects) of different nature. We first introduce a systematic way of classifying inter-cluster connections according to their topology, and then employ this classification to analyse the geometric and thermodynamic properties of each class of defects, as extracted from state-of-the-art equilibrium Monte Carlo simulations. By computing the average density and energetic cost of each defect class, we find that the relevant contribution to inter-cluster interactions is indeed provided by (rare) three-way junctions and by four-way junctions arising from parallel or anti-parallel locally linear aggregates. All other (numerous) defects are either intra-cluster or associated to low cluster-cluster interaction energies, suggesting that these defects do not play a significant part in the thermodynamic description of the self-assembly processes of dipolar hard spheres.
Reviving the shear-free perfect fluid conjecture in general relativity

NASA Astrophysics Data System (ADS)

Sikhonde, Muzikayise E.; Dunsby, Peter K. S.

2017-12-01

Employing a Mathematica symbolic computer algebra package called xTensor, we present (1+3) -covariant special case proofs of the shear-free perfect fluid conjecture in general relativity. We first present the case where the pressure is constant, and where the acceleration is parallel to the vorticity vector. These cases were first presented in their covariant form by Senovilla et al. We then provide a covariant proof for the case where the acceleration and vorticity vectors are orthogonal, which leads to the existence of a Killing vector along the vorticity. This Killing vector satisfies the new constraint equations resulting from the vanishing of the shear. Furthermore, it is shown that in order for the conjecture to be true, this Killing vector must have a vanishing spatially projected directional covariant derivative along the velocity vector field. This in turn implies the existence of another basic vector field along the direction of the vorticity for the conjecture to hold. Finally, we show that in general, there exists a basic vector field parallel to the acceleration for which the conjecture is true.
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregatingmore » each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Newman, G.A.; Commer, M.

Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/Lmore » supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.« less
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

NASA Astrophysics Data System (ADS)

Lee, D.; Gopal, S.; Mohapatra, P.

2012-07-01

We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

NASA Technical Reports Server (NTRS)

Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.

2012-01-01

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
Notes on the KIVA-2 software and chemically reactive fluid mechanics

NASA Astrophysics Data System (ADS)

Holst, M. J.

1992-09-01

Working notes regarding the mechanics of chemically reactive fluids with sprays, and their numerical simulation with the KIVA-2 software are presented. KIVA-2 is a large FORTRAN program developed at Los Alamos National Laboratory for internal combustion engine simulation. It is our hope that these notes summarize some of the necessary background material in fluid mechanics and combustion, explain the numerical methods currently used in KIVA-2 and similar combustion codes, and provide an outline of the overall structure of KIVA-2 as a representative combustion program, in order to aid the researcher in the task of implementing KIVA-2 or a similar combustion code on a massively parallel computer. The notes are organized into three parts as follows. In Part 1, a brief introduction to continuum mechanics, to fluid mechanics, and to the mechanics of chemically reactive fluids with sprays is presented. In Part 2, a close look at the governing equations of KIVA-2 is taken, and the methods employed in the numerical solution of these equations is discussed. Some conclusions are drawn and some observations are made in Part 3.
Shape optimization of pulsatile ventricular assist devices using FSI to minimize thrombotic risk

NASA Astrophysics Data System (ADS)

Long, C. C.; Marsden, A. L.; Bazilevs, Y.

2014-10-01

In this paper we perform shape optimization of a pediatric pulsatile ventricular assist device (PVAD). The device simulation is carried out using fluid-structure interaction (FSI) modeling techniques within a computational framework that combines FEM for fluid mechanics and isogeometric analysis for structural mechanics modeling. The PVAD FSI simulations are performed under realistic conditions (i.e., flow speeds, pressure levels, boundary conditions, etc.), and account for the interaction of air, blood, and a thin structural membrane separating the two fluid subdomains. The shape optimization study is designed to reduce thrombotic risk, a major clinical problem in PVADs. Thrombotic risk is quantified in terms of particle residence time in the device blood chamber. Methods to compute particle residence time in the context of moving spatial domains are presented in a companion paper published in the same issue (Comput Mech, doi: 10.1007/s00466-013-0931-y, 2013). The surrogate management framework, a derivative-free pattern search optimization method that relies on surrogates for increased efficiency, is employed in this work. For the optimization study shown here, particle residence time is used to define a suitable cost or objective function, while four adjustable design optimization parameters are used to define the device geometry. The FSI-based optimization framework is implemented in a parallel computing environment, and deployed with minimal user intervention. Using five SEARCH/ POLL steps the optimization scheme identifies a PVAD design with significantly better throughput efficiency than the original device.
Alternative modeling methods for plasma-based Rf ion sources

DOE Office of Scientific and Technical Information (OSTI.GOV)

Veitzer, Seth A., E-mail: veitzer@txcorp.com; Kundrapu, Madhusudhan, E-mail: madhusnk@txcorp.com; Stoltz, Peter H., E-mail: phstoltz@txcorp.com

Rf-driven ion sources for accelerators and many industrial applications benefit from detailed numerical modeling and simulation of plasma characteristics. For instance, modeling of the Spallation Neutron Source (SNS) internal antenna H{sup −} source has indicated that a large plasma velocity is induced near bends in the antenna where structural failures are often observed. This could lead to improved designs and ion source performance based on simulation and modeling. However, there are significant separations of time and spatial scales inherent to Rf-driven plasma ion sources, which makes it difficult to model ion sources with explicit, kinetic Particle-In-Cell (PIC) simulation codes. Inmore » particular, if both electron and ion motions are to be explicitly modeled, then the simulation time step must be very small, and total simulation times must be large enough to capture the evolution of the plasma ions, as well as extending over many Rf periods. Additional physics processes such as plasma chemistry and surface effects such as secondary electron emission increase the computational requirements in such a way that even fully parallel explicit PIC models cannot be used. One alternative method is to develop fluid-based codes coupled with electromagnetics in order to model ion sources. Time-domain fluid models can simulate plasma evolution, plasma chemistry, and surface physics models with reasonable computational resources by not explicitly resolving electron motions, which thereby leads to an increase in the time step. This is achieved by solving fluid motions coupled with electromagnetics using reduced-physics models, such as single-temperature magnetohydrodynamics (MHD), extended, gas dynamic, and Hall MHD, and two-fluid MHD models. We show recent results on modeling the internal antenna H{sup −} ion source for the SNS at Oak Ridge National Laboratory using the fluid plasma modeling code USim. We compare demonstrate plasma temperature equilibration in two-temperature MHD models for the SNS source and present simulation results demonstrating plasma evolution over many Rf periods for different plasma temperatures. We perform the calculations in parallel, on unstructured meshes, using finite-volume solvers in order to obtain results in reasonable time.« less
Alternative modeling methods for plasma-based Rf ion sources.

PubMed

Veitzer, Seth A; Kundrapu, Madhusudhan; Stoltz, Peter H; Beckwith, Kristian R C

2016-02-01

Rf-driven ion sources for accelerators and many industrial applications benefit from detailed numerical modeling and simulation of plasma characteristics. For instance, modeling of the Spallation Neutron Source (SNS) internal antenna H(-) source has indicated that a large plasma velocity is induced near bends in the antenna where structural failures are often observed. This could lead to improved designs and ion source performance based on simulation and modeling. However, there are significant separations of time and spatial scales inherent to Rf-driven plasma ion sources, which makes it difficult to model ion sources with explicit, kinetic Particle-In-Cell (PIC) simulation codes. In particular, if both electron and ion motions are to be explicitly modeled, then the simulation time step must be very small, and total simulation times must be large enough to capture the evolution of the plasma ions, as well as extending over many Rf periods. Additional physics processes such as plasma chemistry and surface effects such as secondary electron emission increase the computational requirements in such a way that even fully parallel explicit PIC models cannot be used. One alternative method is to develop fluid-based codes coupled with electromagnetics in order to model ion sources. Time-domain fluid models can simulate plasma evolution, plasma chemistry, and surface physics models with reasonable computational resources by not explicitly resolving electron motions, which thereby leads to an increase in the time step. This is achieved by solving fluid motions coupled with electromagnetics using reduced-physics models, such as single-temperature magnetohydrodynamics (MHD), extended, gas dynamic, and Hall MHD, and two-fluid MHD models. We show recent results on modeling the internal antenna H(-) ion source for the SNS at Oak Ridge National Laboratory using the fluid plasma modeling code USim. We compare demonstrate plasma temperature equilibration in two-temperature MHD models for the SNS source and present simulation results demonstrating plasma evolution over many Rf periods for different plasma temperatures. We perform the calculations in parallel, on unstructured meshes, using finite-volume solvers in order to obtain results in reasonable time.

Finite Element Analysis of Magnetic Damping Effects on G-Jitter Induced Fluid Flow

NASA Technical Reports Server (NTRS)

Pan, Bo; Li, Ben Q.; deGroh, Henry C., III

1997-01-01

This paper reports some interim results on numerical modeling and analyses of magnetic damping of g-jitter driven fluid flow in microgravity. A finite element model is developed to represent the fluid flow, thermal and solute transport phenomena in a 2-D cavity under g-jitter conditions with and without an applied magnetic field. The numerical model is checked by comparing with analytical solutions obtained for a simple parallel plate channel flow driven by g-jitter in a transverse magnetic field. The model is then applied to study the effect of steady state g-jitter induced oscillation and on the solute redistribution in the liquid that bears direct relevance to the Bridgman-Stockbarger single crystal growth processes. A selection of computed results is presented and the results indicate that an applied magnetic field can effectively damp the velocity caused by g-jitter and help to reduce the time variation of solute redistribution.
Nonaxisymmetric modelling in BOUT++; toward global edge fluid turbulence in stellarators

NASA Astrophysics Data System (ADS)

Shanahan, Brendan; Hill, Peter; Dudson, Ben

2016-10-01

As Wendelstein 7-X has been optimized for neoclassical transport, turbulent transport could potentially become comparable to neoclassical losses. Furthermore, the imminent installation of an island divertor merits global edge modelling to determine heat flux profiles and the efficacy of the system. Currently, however, nonaxisymmetric edge plasma modelling is limited to either steady state (non-turbulent) transport modelling, or computationally expensive gyrokinetics. The implementation of the Flux Coordinate Independent (FCI) approach to parallel derivatives has allowed the extension of the BOUT++ edge fluid turbulence framework to nonaxisymmetric geometries. Here we first investigate the implementation of the FCI method in BOUT++ by modelling diffusion equations in nonaxisymmetric geometries with and without boundary interaction, and quantify the inherent error. We then present the results of non-turbulent transport modelling and compare with analytical theory. The ongoing extension of BOUT++ to nonaxisymmetric configurations, and the prospects of stellarator edge fluid turbulence simulations will be discussed.
A matrix-free implicit unstructured multigrid finite volume method for simulating structural dynamics and fluid structure interaction

NASA Astrophysics Data System (ADS)

Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.

2007-07-01

A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.
A supportive architecture for CFD-based design optimisation

NASA Astrophysics Data System (ADS)

Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong

2014-03-01

Multi-disciplinary design optimisation (MDO) is one of critical methodologies to the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamic (CFD) technique has caused a rise of its applications in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating with CFD analysis in an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which is usually with high dimensions, and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review on existing works has found that very few researchers have studied on the assistive tools to facilitate CFD-based design optimisation. In the paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing technique and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, the case studies on aerodynamic shape design of a hypersonic cruising vehicle are provided, and the result has shown that the proposed architecture and developed algorithms have performed successfully and efficiently in dealing with the design optimisation with over 200 design variables.
A scalable nonlinear fluid–structure interaction solver based on a Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D

DOE PAGES

Kong, Fande; Cai, Xiao-Chuan

2017-03-24

Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexactmore » Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here ''geometry'' includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.« less
Re-radiation of acoustic waves from the A0 wave on a submerged elastic shell

NASA Astrophysics Data System (ADS)

Ahyi, A. C.; Cao, Hui; Raju, P. K.; Überall, Herbert

2005-07-01

We consider evacuated thin semi-infinite shells immersed in a fluid, which may be either of cylindrical shape with a hemispherical shell endcap, or formed two-dimensionally by semi-infinite parallel plates joined together by a semi-cylinder. The connected shell portions are joined in a manner to satisfy continuity but with a discontinuous radius of curvature. Acoustic waves are considered incident along the axis of symmetry (say the z axis) onto the curved portion of the shell, where they, at the critical angle of coincidence, generate Lamb and Stoneley-type waves in the shell. Computations were carried out using a code developed by Cao et al. [Chinese J. Acoust. 14, 317 (1995)] and was used in order to computationally visualize the waves in the fluid that have been re-radiated by the shell waves a the critical angle. The frequency range was below that of the lowest Lamb wave, and only the A0 wave (and partly the S0 wave) was observed to re-radiate into the fluid under our assumptions. The results will be compared to experimental results in which the re-radiated waves are optically visualized by the Schardin-Cranz schlieren method. .
3D Global Fluid Simulations of Turbulence in LAPD

NASA Astrophysics Data System (ADS)

Rogers, Barrett; Ricci, Paolo; Li, Bo

2009-05-01

We present 3D global fluid simulations of the UCLA upgraded Large Plasma Device (LAPD). This device confines an 18-m-long, cylindrically symmetric plasma with a uniform magnetic field. The plasma in the simulations is generated by density and temperature sources inside the computational domain, and sheath boundary conditions are applied at the ends of the plasma column. In 3D simulations of the entire plasma, we observe strong, rotating intermittent density and temperature fluctuations driven by resistive driftwave turbulence with finite parallel wavenumbers. Analogous simulations carried out in the 2D limit (that is, assuming that the motions are purely interchange-like) display much weaker mode activity driven a Kelvin-Helmholtz instability. The properties and scaling of the turbulence and transport will be discussed.
GPU-based Parallel Application Design for Emerging Mobile Devices

NASA Astrophysics Data System (ADS)

Gupta, Kshitij

A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).
SIMULATION AND VISUALIZATION OF FLOW PATTERN IN MICROARRAYS FOR LIQUID PHASE OLIGONUCLEOTIDE AND PEPTIDE SYNTHESIS

PubMed Central

O-Charoen, Sirimon; Srivannavit, Onnop; Gulari, Erdogan

2008-01-01

Microfluidic microarrays have been developed for economical and rapid parallel synthesis of oligonucleotide and peptide libraries. For a synthesis system to be reproducible and uniform, it is crucial to have a uniform reagent delivery throughout the system. Computational fluid dynamics (CFD) is used to model and simulate the microfluidic microarrays to study geometrical effects on flow patterns. By proper design geometry, flow uniformity could be obtained in every microreactor in the microarrays. PMID:17480053
A Comparison of Lifting-Line and CFD Methods with Flight Test Data from a Research Puma Helicopter

NASA Technical Reports Server (NTRS)

Bousman, William G.; Young, Colin; Toulmay, Francois; Gilbert, Neil E.; Strawn, Roger C.; Miller, Judith V.; Maier, Thomas H.; Costes, Michel; Beaumier, Philippe

1996-01-01

Four lifting-line methods were compared with flight test data from a research Puma helicopter and the accuracy assessed over a wide range of flight speeds. Hybrid Computational Fluid Dynamics (CFD) methods were also examined for two high-speed conditions. A parallel analytical effort was performed with the lifting-line methods to assess the effects of modeling assumptions and this provided insight into the adequacy of these methods for load predictions.
Recent Improvements in the FDNS CFD Code and its Associated Process

NASA Technical Reports Server (NTRS)

West, Jeff S.; Dorney, Suzanne M.; Turner, Jim (Technical Monitor)

2002-01-01

This viewgraph presentation gives an overview on recent improvements in the Finite Difference Navier Stokes (FDNS) computational fluid dynamics (CFD) code and its associated process. The development of a utility, PreViewer, has essentially eliminated the creeping of simple human error into the FDNS Solution process. Extension of PreViewer to encapsulate the Domain Decompression process has made practical the routine use of parallel processing. The combination of CVS source control and ATS consistency validation significantly increases the efficiency of the CFD process.
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less
Apparatus for and method of simulating turbulence

DOEpatents

Dimas, Athanassios; Lottati, Isaac; Bernard, Peter; Collins, James; Geiger, James C.

2003-01-01

In accordance with a preferred embodiment of the invention, a novel apparatus for and method of simulating physical processes such as fluid flow is provided. Fluid flow near a boundary or wall of an object is represented by a collection of vortex sheet layers. The layers are composed of a grid or mesh of one or more geometrically shaped space filling elements. In the preferred embodiment, the space filling elements take on a triangular shape. An Eulerian approach is employed for the vortex sheets, where a finite-volume scheme is used on the prismatic grid formed by the vortex sheet layers. A Lagrangian approach is employed for the vortical elements (e.g., vortex tubes or filaments) found in the remainder of the flow domain. To reduce the computational time, a hairpin removal scheme is employed to reduce the number of vortex filaments, and a Fast Multipole Method (FMM), preferably implemented using parallel processing techniques, reduces the computation of the velocity field.
Finite-difference method Stokes solver (FDMSS) for 3D pore geometries: Software development, validation and case studies

NASA Astrophysics Data System (ADS)

Gerke, Kirill M.; Vasilyev, Roman V.; Khirevich, Siarhei; Collins, Daniel; Karsanina, Marina V.; Sizonenko, Timofey O.; Korost, Dmitry V.; Lamontagne, Sébastien; Mallants, Dirk

2018-05-01

Permeability is one of the fundamental properties of porous media and is required for large-scale Darcian fluid flow and mass transport models. Whilst permeability can be measured directly at a range of scales, there are increasing opportunities to evaluate permeability from pore-scale fluid flow simulations. We introduce the free software Finite-Difference Method Stokes Solver (FDMSS) that solves Stokes equation using a finite-difference method (FDM) directly on voxelized 3D pore geometries (i.e. without meshing). Based on explicit convergence studies, validation on sphere packings with analytically known permeabilities, and comparison against lattice-Boltzmann and other published FDM studies, we conclude that FDMSS provides a computationally efficient and accurate basis for single-phase pore-scale flow simulations. By implementing an efficient parallelization and code optimization scheme, permeability inferences can now be made from 3D images of up to 109 voxels using modern desktop computers. Case studies demonstrate the broad applicability of the FDMSS software for both natural and artificial porous media.
Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application

NASA Technical Reports Server (NTRS)

Tennille, Geoffrey M.; Overman, Andrea L.; Lambiotte, Jules J.; Streett, Craig L.

1991-01-01

The methodology is described for converting a large, long-running applications code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hrs of CPU time on CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.
Numerical modeling of fluid flow in a fault zone: a case of study from Majella Mountain (Italy).

NASA Astrophysics Data System (ADS)

Romano, Valentina; Battaglia, Maurizio; Bigi, Sabina; De'Haven Hyman, Jeffrey; Valocchi, Albert J.

2017-04-01

The study of fluid flow in fractured rocks plays a key role in reservoir management, including CO2 sequestration and waste isolation. We present a numerical model of fluid flow in a fault zone, based on field data acquired in Majella Mountain, in the Central Apennines (Italy). This fault zone is considered a good analogue for the massive presence of fluid migration in the form of tar. Faults are mechanical features and cause permeability heterogeneities in the upper crust, so they strongly influence fluid flow. The distribution of the main components (core, damage zone) can lead the fault zone to act as a conduit, a barrier, or a combined conduit-barrier system. We integrated existing information and our own structural surveys of the area to better identify the major fault features (e.g., type of fractures, statistical properties, geometrical and petro-physical characteristics). In our model the damage zones of the fault are described as discretely fractured medium, while the core of the fault as a porous one. Our model utilizes the dfnWorks code, a parallelized computational suite, developed at Los Alamos National Laboratory (LANL), that generates three dimensional Discrete Fracture Network (DFN) of the damage zones of the fault and characterizes its hydraulic parameters. The challenge of the study is the coupling between the discrete domain of the damage zones and the continuum one of the core. The field investigations and the basic computational workflow will be described, along with preliminary results of fluid flow simulation at the scale of the fault.
Visual Computing Environment

NASA Technical Reports Server (NTRS)

Lawrence, Charles; Putt, Charles W.

1997-01-01

The Visual Computing Environment (VCE) is a NASA Lewis Research Center project to develop a framework for intercomponent and multidisciplinary computational simulations. Many current engineering analysis codes simulate various aspects of aircraft engine operation. For example, existing computational fluid dynamics (CFD) codes can model the airflow through individual engine components such as the inlet, compressor, combustor, turbine, or nozzle. Currently, these codes are run in isolation, making intercomponent and complete system simulations very difficult to perform. In addition, management and utilization of these engineering codes for coupled component simulations is a complex, laborious task, requiring substantial experience and effort. To facilitate multicomponent aircraft engine analysis, the CFD Research Corporation (CFDRC) is developing the VCE system. This system, which is part of NASA's Numerical Propulsion Simulation System (NPSS) program, can couple various engineering disciplines, such as CFD, structural analysis, and thermal analysis. The objectives of VCE are to (1) develop a visual computing environment for controlling the execution of individual simulation codes that are running in parallel and are distributed on heterogeneous host machines in a networked environment, (2) develop numerical coupling algorithms for interchanging boundary conditions between codes with arbitrary grid matching and different levels of dimensionality, (3) provide a graphical interface for simulation setup and control, and (4) provide tools for online visualization and plotting. VCE was designed to provide a distributed, object-oriented environment. Mechanisms are provided for creating and manipulating objects, such as grids, boundary conditions, and solution data. This environment includes parallel virtual machine (PVM) for distributed processing. Users can interactively select and couple any set of codes that have been modified to run in a parallel distributed fashion on a cluster of heterogeneous workstations. A scripting facility allows users to dictate the sequence of events that make up the particular simulation.
Optimal cube-connected cube multiprocessors

NASA Technical Reports Server (NTRS)

Sun, Xian-He; Wu, Jie

1993-01-01

Many CFD (computational fluid dynamics) and other scientific applications can be partitioned into subproblems. However, in general the partitioned subproblems are very large. They demand high performance computing power themselves, and the solutions of the subproblems have to be combined at each time step. The cube-connect cube (CCCube) architecture is studied. The CCCube architecture is an extended hypercube structure with each node represented as a cube. It requires fewer physical links between nodes than the hypercube, and provides the same communication support as the hypercube does on many applications. The reduced physical links can be used to enhance the bandwidth of the remaining links and, therefore, enhance the overall performance. The concept and the method to obtain optimal CCCubes, which are the CCCubes with a minimum number of links under a given total number of nodes, are proposed. The superiority of optimal CCCubes over standard hypercubes was also shown in terms of the link usage in the embedding of a binomial tree. A useful computation structure based on a semi-binomial tree for divide-and-conquer type of parallel algorithms was identified. It was shown that this structure can be implemented in optimal CCCubes without performance degradation compared with regular hypercubes. The result presented should provide a useful approach to design of scientific parallel computers.
Parallelization and checkpointing of GPU applications through program transformation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solano-Quinde, Lizandro Damian

2012-01-01

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less
Modeling cardiovascular hemodynamics using the lattice Boltzmann method on massively parallel supercomputers

NASA Astrophysics Data System (ADS)

Randles, Amanda Elizabeth

Accurate and reliable modeling of cardiovascular hemodynamics has the potential to improve understanding of the localization and progression of heart diseases, which are currently the most common cause of death in Western countries. However, building a detailed, realistic model of human blood flow is a formidable mathematical and computational challenge. The simulation must combine the motion of the fluid, the intricate geometry of the blood vessels, continual changes in flow and pressure driven by the heartbeat, and the behavior of suspended bodies such as red blood cells. Such simulations can provide insight into factors like endothelial shear stress that act as triggers for the complex biomechanical events that can lead to atherosclerotic pathologies. Currently, it is not possible to measure endothelial shear stress in vivo, making these simulations a crucial component to understanding and potentially predicting the progression of cardiovascular disease. In this thesis, an approach for efficiently modeling the fluid movement coupled to the cell dynamics in real-patient geometries while accounting for the additional force from the expansion and contraction of the heart will be presented and examined. First, a novel method to couple a mesoscopic lattice Boltzmann fluid model to the microscopic molecular dynamics model of cell movement is elucidated. A treatment of red blood cells as extended structures, a method to handle highly irregular geometries through topology driven graph partitioning, and an efficient molecular dynamics load balancing scheme are introduced. These result in a large-scale simulation of the cardiovascular system, with a realistic description of the complex human arterial geometry, from centimeters down to the spatial resolution of red-blood cells. The computational methods developed to enable scaling of the application to 294,912 processors are discussed, thus empowering the simulation of a full heartbeat. Second, further extensions to enable the modeling of fluids in vessels with smaller diameters and a method for introducing the deformational forces exerted on the arterial flows from the movement of the heart by borrowing concepts from cosmodynamics are presented. These additional forces have a great impact on the endothelial shear stress. Third, the fluid model is extended to not only recover Navier-Stokes hydrodynamics, but also a wider range of Knudsen numbers, which is especially important in micro- and nano-scale flows. The tradeoffs of many optimizations methods such as the use of deep halo level ghost cells that, alongside hybrid programming models, reduce the impact of such higher-order models and enable efficient modeling of extreme regimes of computational fluid dynamics are discussed. Fourth, the extension of these models to other research questions like clogging in microfluidic devices and determining the severity of co-arctation of the aorta is presented. Through this work, a validation of these methods by taking real patient data and the measured pressure value before the narrowing of the aorta and predicting the pressure drop across the co-arctation is shown. Comparison with the measured pressure drop in vivo highlights the accuracy and potential impact of such patient specific simulations. Finally, a method to enable the simulation of longer trajectories in time by discretizing both spatially and temporally is presented. In this method, a serial coarse iterator is used to initialize data at discrete time steps for a fine model that runs in parallel. This coarse solver is based on a larger time step and typically a coarser discretization in space. Iterative refinement enables the compute-intensive fine iterator to be modeled with temporal parallelization. The algorithm consists of a series of prediction-corrector iterations completing when the results have converged within a certain tolerance. Combined, these developments allow large fluid models to be simulated for longer time durations than previously possible.

An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

NASA Technical Reports Server (NTRS)

Follen, Gregory J.

2003-01-01

Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT).
An Application-Based Performance Characterization of the Columbia Supercluster

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Hoaqiang; Kiris, Cetin; Saini, Subhash

2005-01-01

Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.
Heat exchange assembly

DOEpatents

Lowenstein, Andrew; Sibilia, Marc; Miller, Jeffrey; Tonon, Thomas S.

2004-06-08

A heat exchange assembly comprises a plurality of plates disposed in a spaced-apart arrangement, each of the plurality of plates includes a plurality of passages extending internally from a first end to a second end for directing flow of a heat transfer fluid in a first plane, a plurality of first end-piece members equaling the number of plates and a plurality of second end-piece members also equaling the number of plates, each of the first and second end-piece members including a recessed region adapted to fluidly connect and couple with the first and second ends of the plate, respectively, and further adapted to be affixed to respective adjacent first and second end-piece members in a stacked formation, and each of the first and second end-piece members further including at least one cavity for enabling entry of the heat transfer fluid into the plate, exit of the heat transfer fluid from the plate, or 180.degree. turning of the fluid within the plate to create a serpentine-like fluid flow path between points of entry and exit of the fluid, and at least two fluid conduits extending through the stacked plurality of first and second end-piece members for providing first fluid connections between the parallel fluid entry points of adjacent plates and a fluid supply inlet, and second fluid connections between the parallel fluid exit points of adjacent plates and a fluid discharge outlet so that the heat transfer fluid travels in parallel paths through each respective plate.
Hear Exchange Assembly

DOEpatents

Lowenstein, Andrew; Sibilia, Marc; Miller, Jeffrey; Tonon, Thomas S.

2003-05-27

A heat exchange assembly comprises a plurality of plates disposed in a spaced-apart arrangement, each of the plurality of plates includes a plurality of passages extending internally from a first end to a second end for directing flow of a heat transfer fluid in a first plane, a plurality of first end-piece members equaling the number of plates and a plurality of second end-piece members also equaling the number of plates, each of the first and second end-piece members including a recessed region adapted to fluidly connect and couple with the first and second ends of the plate, respectively, and further adapted to be affixed to respective adjacent first and second end-piece members in a stacked formation, and each of the first and second end-piece members further including at least one cavity for enabling entry of the heat transfer fluid into the plate, exit of the heat transfer fluid from the plate, or 180.degree. turning of the fluid within the plate to create a serpentine-like fluid flow path between points of entry and exit of the fluid, and at least two fluid conduits extending through the stacked plurality of first and second end-piece members for providing first fluid connections between the parallel fluid entry points of adjacent plates and a fluid supply inlet, and second fluid connections between the parallel fluid exit points of adjacent plates and a fluid discharge outlet so that the heat transfer fluid travels in parallel paths through each respective plate.
Imaging Polarimetry in Central Serous Chorioretinopathy

PubMed Central

MIURA, MASAHIRO; ELSNER, ANN E.; WEBER, ANKE; CHENEY, MICHAEL C.; OSAKO, MASAHIRO; USUI, MASAHIKO; IWASAKI, TAKUYA

2006-01-01

PURPOSE To evaluate a noninvasive technique to detect the leakage point of central serous chorioretinopathy (CSR), using a polarimetry method. DESIGN Prospective cohort study. METHODS SETTING Institutional practice. PATIENTS We examined 30 eyes of 30 patients with CSR. MAIN OUTCOME MEASURES Polarimetry images were recorded using the GDx-N (Laser Diagnostic Technologies). We computed four images that differed in their polarization content: a depolarized light image, an average reflectance image, a parallel polarized light image, and a birefringence image. Each polarimetry image was compared with abnormalities seen on fluorescein angiography. RESULTS In all eyes, leakage area could be clearly visualized as a bright area in the depolarized light images. Michelson contrasts for the leakage areas were 0.58 ± 0.28 in the depolarized light images, 0.17 ± 0.11 in the average reflectance images, 0.09 ± 0.09 in the parallel polarized light images, and 0.11 ± 0.21 in the birefringence images from the same raw data. Michelson contrasts in depolarized light images were significantly higher than for the other three images (P < .0001, for all tests, paired t test). The fluid accumulated in the retina was well-visualized in the average and parallel polarized light images. CONCLUSIONS Polarization-sensitive imaging could readily localize the leakage point and area of fluid in CSR. This may assist with the rapid, noninvasive assessment of CSR. PMID:16376644
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scalingmore » laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.« less
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

DOE PAGES

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...

2017-11-27

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scalingmore » laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.« less
Applying Parallel Adaptive Methods with GeoFEST/PYRAMID to Simulate Earth Surface Crustal Dynamics

NASA Technical Reports Server (NTRS)

Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Glasscoe, Margaret; Donnellan, Andrea; Li, Peggy

2006-01-01

This viewgraph presentation reviews the use Adaptive Mesh Refinement (AMR) in simulating the Crustal Dynamics of Earth's Surface. AMR simultaneously improves solution quality, time to solution, and computer memory requirements when compared to generating/running on a globally fine mesh. The use of AMR in simulating the dynamics of the Earth's Surface is spurred by future proposed NASA missions, such as InSAR for Earth surface deformation and other measurements. These missions will require support for large-scale adaptive numerical methods using AMR to model observations. AMR was chosen because it has been successful in computation fluid dynamics for predictive simulation of complex flows around complex structures.
Turbulent shear layers in confining channels

NASA Astrophysics Data System (ADS)

Benham, Graham P.; Castrejon-Pita, Alfonso A.; Hewitt, Ian J.; Please, Colin P.; Style, Rob W.; Bird, Paul A. D.

2018-06-01

We present a simple model for the development of shear layers between parallel flows in confining channels. Such flows are important across a wide range of topics from diffusers, nozzles and ducts to urban air flow and geophysical fluid dynamics. The model approximates the flow in the shear layer as a linear profile separating uniform-velocity streams. Both the channel geometry and wall drag affect the development of the flow. The model shows good agreement with both particle image velocimetry experiments and computational turbulence modelling. The simplicity and low computational cost of the model allows it to be used for benchmark predictions and design purposes, which we demonstrate by investigating optimal pressure recovery in diffusers with non-uniform inflow.
NAS Technical Summaries, March 1993 - February 1994

NASA Technical Reports Server (NTRS)

1995-01-01

NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1993-94 operational year concluded with 448 high-speed processor projects and 95 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
NAS technical summaries. Numerical aerodynamic simulation program, March 1992 - February 1993

NASA Technical Reports Server (NTRS)

1994-01-01

NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1992-93 operational year concluded with 399 high-speed processor projects and 91 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
Towards a new method for modeling multicomponent, multiphase flow and transport in porous media

NASA Astrophysics Data System (ADS)

Kong, X. Z.; Schaedle, P.; Leal, A. M. M.; Saar, M. O.

2016-12-01

The ability to computationally simulate multiphase-multicomponent fluid flow, coupled with geochemical reactions between fluid species and rock minerals, in porous and/or fractured subsurface systems is of major importance to a vast number of applications. These include (1) carbon dioxide storage in geologic formations, (2) geothermal energy extraction, (3) combinations of the latter two applications during CO2-Plume Geothermal energy extraction, (4) waste fluid and waste storage, as well as (5) groundwater and contaminant transport. Modeling these systems with such a wide variety of coupled physical and chemical processes is both challenging and computationally expensive. In this work we present a new approach to develop a simulator for multicomponent-multiphase flow and reactive transport in porous media by using state of the art numerical tools, namely FEniCS (fenicsproject.org) and Reaktoro (reaktoro.org). The governing partial differential equations for fluid flow and transport are solved using FEniCS, which enables fast and efficient implementation of computer codes for the simulation of complex physical phenomena using finite element methods on unstructured meshes. FEniCS supports a wide range of finite element schemes of special interest to porous media flow. In addition, FEniCS interfaces with many sparse linear solvers and provides convenient tools for adaptive mesh refinement and the capability of massively parallel calculations. A fundamental component of our contribution is the coupling of our FEniCS based flow and transport solver with our chemical reaction simulator, Reaktoro, which implements efficient, robust, and accurate methods for chemical equilibrium and kinetics calculations at every node of the mesh, at every time step. These numerical methods for reaction modeling have been especially developed for performance-critical applications such as reactive transport modeling. Furthermore, Reaktoro is also used for the calculation of thermodynamic properties of rock minerals and fluids. The proposed simulator can, however, be coupled with other back-ends for the calculation of both thermodynamic and thermophysical properties of rock minerals and fluids. We present several example applications of our new approach, demonstrating its capabilities and computation speed.
Segmented heat exchanger

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baldwin, Darryl Dean; Willi, Martin Leo; Fiveland, Scott Byron

2010-12-14

A segmented heat exchanger system for transferring heat energy from an exhaust fluid to a working fluid. The heat exchanger system may include a first heat exchanger for receiving incoming working fluid and the exhaust fluid. The working fluid and exhaust fluid may travel through at least a portion of the first heat exchanger in a parallel flow configuration. In addition, the heat exchanger system may include a second heat exchanger for receiving working fluid from the first heat exchanger and exhaust fluid from a third heat exchanger. The working fluid and exhaust fluid may travel through at least amore » portion of the second heat exchanger in a counter flow configuration. Furthermore, the heat exchanger system may include a third heat exchanger for receiving working fluid from the second heat exchanger and exhaust fluid from the first heat exchanger. The working fluid and exhaust fluid may travel through at least a portion of the third heat exchanger in a parallel flow configuration.« less
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

NASA Astrophysics Data System (ADS)

Schneider, Evan E.; Robertson, Brant E.

2015-04-01

We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
MPSalsa Version 1.5: A Finite Element Computer Program for Reacting Flow Problems: Part 1 - Theoretical Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devine, K.D.; Hennigan, G.L.; Hutchinson, S.A.

1999-01-01

The theoretical background for the finite element computer program, MPSalsa Version 1.5, is presented in detail. MPSalsa is designed to solve laminar or turbulent low Mach number, two- or three-dimensional incompressible and variable density reacting fluid flows on massively parallel computers, using a Petrov-Galerkin finite element formulation. The code has the capability to solve coupled fluid flow (with auxiliary turbulence equations), heat transport, multicomponent species transport, and finite-rate chemical reactions, and to solve coupled multiple Poisson or advection-diffusion-reaction equations. The program employs the CHEMKIN library to provide a rigorous treatment of multicomponent ideal gas kinetics and transport. Chemical reactions occurringmore » in the gas phase and on surfaces are treated by calls to CHEMKIN and SURFACE CHEMK3N, respectively. The code employs unstructured meshes, using the EXODUS II finite element database suite of programs for its input and output files. MPSalsa solves both transient and steady flows by using fully implicit time integration, an inexact Newton method and iterative solvers based on preconditioned Krylov methods as implemented in the Aztec. solver library.« less
Developing Chemistry and Kinetic Modeling Tools for Low-Temperature Plasma Simulations

NASA Astrophysics Data System (ADS)

Jenkins, Thomas; Beckwith, Kris; Davidson, Bradley; Kruger, Scott; Pankin, Alexei; Roark, Christine; Stoltz, Peter

2015-09-01

We discuss the use of proper orthogonal decomposition (POD) methods in VSim, a FDTD plasma simulation code capable of both PIC/MCC and fluid modeling. POD methods efficiently generate smooth representations of noisy self-consistent or test-particle PIC data, and are thus advantageous in computing macroscopic fluid quantities from large PIC datasets (e.g. for particle-based closure computations) and in constructing optimal visual representations of the underlying physics. They may also confer performance advantages for massively parallel simulations, due to the significant reduction in dataset sizes conferred by truncated singular-value decompositions of the PIC data. We also demonstrate how complex LTP chemistry scenarios can be modeled in VSim via an interface with MUNCHKIN, a developing standalone python/C++/SQL code that identifies reaction paths for given input species, solves 1D rate equations for the time-dependent chemical evolution of the system, and generates corresponding VSim input blocks with appropriate cross-sections/reaction rates. MUNCHKIN also computes reaction rates from user-specified distribution functions, and conducts principal path analyses to reduce the number of simulated chemical reactions. Supported by U.S. Department of Energy SBIR program, Award DE-SC0009501.
A fluid–structure interaction model to characterize bone cell stimulation in parallel-plate flow chamber systems

PubMed Central

Vaughan, T. J.; Haugh, M. G.; McNamara, L. M.

2013-01-01

Bone continuously adapts its internal structure to accommodate the functional demands of its mechanical environment and strain-induced flow of interstitial fluid is believed to be the primary mediator of mechanical stimuli to bone cells in vivo. In vitro investigations have shown that bone cells produce important biochemical signals in response to fluid flow applied using parallel-plate flow chamber (PPFC) systems. However, the exact mechanical stimulus experienced by the cells within these systems remains unclear. To fully understand this behaviour represents a most challenging multi-physics problem involving the interaction between deformable cellular structures and adjacent fluid flows. In this study, we use a fluid–structure interaction computational approach to investigate the nature of the mechanical stimulus being applied to a single osteoblast cell under fluid flow within a PPFC system. The analysis decouples the contribution of pressure and shear stress on cellular deformation and for the first time highlights that cell strain under flow is dominated by the pressure in the PPFC system rather than the applied shear stress. Furthermore, it was found that strains imparted on the cell membrane were relatively low whereas significant strain amplification occurred at the cell–substrate interface. These results suggest that strain transfer through focal attachments at the base of the cell are the primary mediators of mechanical signals to the cell under flow in a PPFC system. Such information is vital in order to correctly interpret biological responses of bone cells under in vitro stimulation and elucidate the mechanisms associated with mechanotransduction in vivo. PMID:23365189
Gpu Implementation of a Viscous Flow Solver on Unstructured Grids

NASA Astrophysics Data System (ADS)

Xu, Tianhao; Chen, Long

2016-06-01

Graphics processing units have gained popularities in scientific computing over past several years due to their outstanding parallel computing capability. Computational fluid dynamics applications involve large amounts of calculations, therefore a latest GPU card is preferable of which the peak computing performance and memory bandwidth are much better than a contemporary high-end CPU. We herein focus on the detailed implementation of our GPU targeting Reynolds-averaged Navier-Stokes equations solver based on finite-volume method. The solver employs a vertex-centered scheme on unstructured grids for the sake of being capable of handling complex topologies. Multiple optimizations are carried out to improve the memory accessing performance and kernel utilization. Both steady and unsteady flow simulation cases are carried out using explicit Runge-Kutta scheme. The solver with GPU acceleration in this paper is demonstrated to have competitive advantages over the CPU targeting one.
Fast Eigensolver for Computing 3D Earth's Normal Modes

NASA Astrophysics Data System (ADS)

Shi, J.; De Hoop, M. V.; Li, R.; Xi, Y.; Saad, Y.

2017-12-01

We present a novel parallel computational approach to compute Earth's normal modes. We discretize Earth via an unstructured tetrahedral mesh and apply the continuous Galerkin finite element method to the elasto-gravitational system. To resolve the eigenvalue pollution issue, following the analysis separating the seismic point spectrum, we utilize explicitly a representation of the displacement for describing the oscillations of the non-seismic modes in the fluid outer core. Effectively, we separate out the essential spectrum which is naturally related to the Brunt-Väisälä frequency. We introduce two Lanczos approaches with polynomial and rational filtering for solving this generalized eigenvalue problem in prescribed intervals. The polynomial filtering technique only accesses the matrix pair through matrix-vector products and is an ideal candidate for solving three-dimensional large-scale eigenvalue problems. The matrix-free scheme allows us to deal with fluid separation and self-gravitation in an efficient way, while the standard shift-and-invert method typically needs an explicit shifted matrix and its factorization. The rational filtering method converges much faster than the standard shift-and-invert procedure when computing all the eigenvalues inside an interval. Both two Lanczos approaches solve for the internal eigenvalues extremely accurately, comparing with the standard eigensolver. In our computational experiments, we compare our results with the radial earth model benchmark, and visualize the normal modes using vector plots to illustrate the properties of the displacements in different modes.
High-Order Methods for Incompressible Fluid Flow

NASA Astrophysics Data System (ADS)

Deville, M. O.; Fischer, P. F.; Mund, E. H.

2002-08-01

High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular sttention given to enforcement of imcompressibility. Advanced discretizations. implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.

The Research of the Parallel Computing Development from the Angle of Cloud Computing

NASA Astrophysics Data System (ADS)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

USGS Publications Warehouse

Laura, Jason R.; Rey, Sergio J.

2017-01-01

Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
Development of a Prototype Lattice Boltzmann Code for CFD of Fusion Systems.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pattison, Martin J; Premnath, Kannan N; Banerjee, Sanjoy

2007-02-26

Designs of proposed fusion reactors, such as the ITER project, typically involve the use of liquid metals as coolants in components such as heat exchangers, which are generally subjected to strong magnetic fields. These fields induce electric currents in the fluids, resulting in magnetohydrodynamic (MHD) forces which have important effects on the flow. The objective of this SBIR project was to develop computational techniques based on recently developed lattice Boltzmann techniques for the simulation of these MHD flows and implement them in a computational fluid dynamics (CFD) code for the study of fluid flow systems encountered in fusion engineering. Themore » code developed during this project, solves the lattice Boltzmann equation, which is a kinetic equation whose behaviour represents fluid motion. This is in contrast to most CFD codes which are based on finite difference/finite volume based solvers. The lattice Boltzmann method (LBM) is a relatively new approach which has a number of advantages compared with more conventional methods such as the SIMPLE or projection method algorithms that involve direct solution of the Navier-Stokes equations. These are that the LBM is very well suited to parallel processing, with almost linear scaling even for very large numbers of processors. Unlike other methods, the LBM does not require solution of a Poisson pressure equation leading to a relatively fast execution time. A particularly attractive property of the LBM is that it can handle flows in complex geometries very easily. It can use simple rectangular grids throughout the computational domain -- generation of a body-fitted grid is not required. A recent advance in the LBM is the introduction of the multiple relaxation time (MRT) model; the implementation of this model greatly enhanced the numerical stability when used in lieu of the single relaxation time model, with only a small increase in computer time. Parallel processing was implemented using MPI and demonstrated the ability of the LBM to scale almost linearly. The equation for magnetic induction was also solved using a lattice Boltzmann method. This approach has the advantage that it fits in well to the framework used for the hydrodynamic equations, but more importantly that it preserves the ability of the code to run efficiently on parallel architectures. Since the LBM is a relatively recent model, a number of new developments were needed to solve the magnetic induction equation for practical problems. Existing methods were only suitable for cases where the fluid viscosity and the magnetic resistivity are of the same order, and a preconditioning method was used to allow the simulation of liquid metals, where these properties differ by several orders of magnitude. An extension of this method to the hydrodynamic equations allowed faster convergence to steady state. A new method of imposing boundary conditions using an extrapolation technique was derived, enabling the magnetic field at a boundary to be specified. Also, a technique by which the grid can be stretched was formulated to resolve thin layers at high imposed magnetic fields, allowing flows with Hartmann numbers of several thousand to be quickly and efficiently simulated. In addition, a module has been developed to calculate the temperature field and heat transfer. This uses a total variation diminishing scheme to solve the equations and is again very amenable to parallelisation. Although, the module was developed with thermal modelling in mind, it can also be applied to passive scalar transport. The code is fully three dimensional and has been applied to a wide variety of cases, including both laminar and turbulent flows. Validations against a series of canonical problems involving both MHD effects and turbulence have clearly demonstrated the ability of the LBM to properly model these types of flow. As well as applications to fusion engineering, the resulting code is flexible enough to be applied to a wide range of other flows, in particular those requiring parallel computations with many processors. For example, at present it is being used for studies in aerodynamics and acoustics involving flows at high Reynolds numbers. It is anticipated that it will be used for multiphase flow applications in the near future.« less
Field gradients can control the alignment of nanorods.

PubMed

Ooi, Chinchun; Yellen, Benjamin B

2008-08-19

This work is motivated by the unexpected experimental observation that field gradients can control the alignment of nonmagnetic nanorods immersed inside magnetic fluids. In the presence of local field gradients, nanorods were observed to align perpendicular to the external field at low field strengths, but parallel to the external field at high field strengths. The switching behavior results from the competition between a preference to align with the external field (orientational potential energy) and preference to move into regions of minimum magnetic field (positional potential energy). A theoretical model is developed to explain this experimental behavior by investigating the statistics of nanorod alignment as a function of both the external uniform magnetic field strength and the local magnetic field variation above a periodic array of micromagnets. Computational phase diagrams are developed which indicate that the relative population of nanorods in parallel and perpendicular states can be adjusted through several control parameters. However, an energy barrier to rotation was discovered to influence the rate kinetics and restrict the utility of this assembly technique to nanorods which are slightly shorter than the micromagnet length. Experimental results concerning the orientation of nanorods inside magnetic fluid are also presented and shown to be in strong agreement with the theoretical work.
Impact of Cattaneo-Christov heat flux on electroosmotic transport of third-order fluids in a magnetic environment

NASA Astrophysics Data System (ADS)

Misra, J. C.; Mallick, B.; Sinha, A.; Roy Chowdhury, A.

2018-05-01

In the case of steady flow of a fluid under the combined influence of external electric and magnetic fields, the fluid moves forward by forming an axial momentum boundary layer. With this end in view a study has been performed here to investigate the problem of entropy generation during electroosmotically modulated flow of a third-order electrically conducting fluid flowing on a microchannel bounded by silicon-made parallel plates under the influence of a magnetic field, by paying due consideration to the steric effect. The associated mechanism of heat transfer has also been duly taken care of, by considering Cattaneo-Christov heat flux. A suitable finite difference scheme has been developed for the numerical procedure. A detailed study of the velocity and temperature distributions has been made by considering their variations with respect to different physical parameters involved in the problem. The results of numerical computation have been displayed graphically. The computational work has been carried out by considering blood as the working fluid, with the motivation of exploring some interesting phenomena in the context of hemodynamical flow in micro-vessels. Among other variables, parametric variations of the important physical variables, viz. i) skin friction and ii) Nusselt number have been investigated. The study confirms that the random motion of the fluid particles can be controlled by a suitable adjustment of the intensity of an externally applied magnetic field in the transverse direction. It is further revealed that the Nusselt number diminishes, as the Prandtl number gradually increases; however, a steady increase in the Nusselt number occurs with increase in thermal relaxation. Entropy generation is also found to be enhanced with increase in Joule heating. The results of the present study have also been validated in a proper manner.
Integrated multiplexed capillary electrophoresis system

DOEpatents

Yeung, Edward S.; Tan, Hongdong

2002-05-14

The present invention provides an integrated multiplexed capillary electrophoresis system for the analysis of sample analytes. The system integrates and automates multiple components, such as chromatographic columns and separation capillaries, and further provides a detector for the detection of analytes eluting from the separation capillaries. The system employs multiplexed freeze/thaw valves to manage fluid flow and sample movement. The system is computer controlled and is capable of processing samples through reaction, purification, denaturation, pre-concentration, injection, separation and detection in parallel fashion. Methods employing the system of the invention are also provided.
Algorithms for parallel flow solvers on message passing architectures

NASA Technical Reports Server (NTRS)

Vanderwijngaart, Rob F.

1995-01-01

The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.
Broadcasting collective operation contributions throughout a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
A Lattice-Boltzmann model to simulate diffractive nonlinear ultrasound beam propagation in a dissipative fluid medium

NASA Astrophysics Data System (ADS)

Abdi, Mohamad; Hajihasani, Mojtaba; Gharibzadeh, Shahriar; Tavakkoli, Jahan

2012-12-01

Ultrasound waves have been widely used in diagnostic and therapeutic medical applications. Accurate and effective simulation of ultrasound beam propagation and its interaction with tissue has been proved to be important. The nonlinear nature of the ultrasound beam propagation, especially in the therapeutic regime, plays an important role in the mechanisms of interaction with tissue. There are three main approaches in current computational fluid dynamics (CFD) methods to model and simulate nonlinear ultrasound beams: macroscopic, mesoscopic and microscopic approaches. In this work, a mesoscopic CFD method based on the Lattice-Boltzmann model (LBM) was investigated. In the developed method, the Boltzmann equation is evolved to simulate the flow of a Newtonian fluid with the collision model instead of solving the Navier-Stokes, continuity and state equations which are used in conventional CFD methods. The LBM has some prominent advantages over conventional CFD methods, including: (1) its parallel computational nature; (2) taking microscopic boundaries into account; and (3) capability of simulating in porous and inhomogeneous media. In our proposed method, the propagating medium is discretized with a square grid in 2 dimensions with 9 velocity vectors for each node. Using the developed model, the nonlinear distortion and shock front development of a finiteamplitude diffractive ultrasonic beam in a dissipative fluid medium was computed and validated against the published data. The results confirm that the LBM is an accurate and effective approach to model and simulate nonlinearity in finite-amplitude ultrasound beams with Mach numbers of up to 0.01 which, among others, falls within the range of therapeutic ultrasound regime such as high intensity focused ultrasound (HIFU) beams. A comparison between the HIFU nonlinear beam simulations using the proposed model and pseudospectral methods in a 2D geometry is presented.
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

NASA Astrophysics Data System (ADS)

Schwing, Alan Michael

For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations and refinement and coarsening of the grid does not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth- and sixth-order are explored. Provided the criteria for refinement is appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depend on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable comparisons across a range of regimes. Unsteady and steady applications are considered in both subsonic and supersonic flows. Inviscid and viscous simulations achieve similar results at a much reduced cost when employing dynamic mesh adaptation. Several techniques for guiding adaptation are compared. Detailed analysis of statistics from the instrumented solver enable understanding of the costs associated with adaptation. Adaptive mesh refinement shows promise for the test cases presented here. It can be considerably faster than using conventional grids and provides accurate results. The procedures for adapting the grid are light-weight enough to not require significant computational time and yield significant reductions in grid size.
Performance of parallel computation using CUDA for solving the one-dimensional elasticity equations

NASA Astrophysics Data System (ADS)

Darmawan, J. B. B.; Mungkasi, S.

2017-01-01

In this paper, we investigate the performance of parallel computation in solving the one-dimensional elasticity equations. Elasticity equations are usually implemented in engineering science. Solving these equations fast and efficiently is desired. Therefore, we propose the use of parallel computation. Our parallel computation uses CUDA of the NVIDIA. Our research results show that parallel computation using CUDA has a great advantage and is powerful when the computation is of large scale.
Hardware accelerator for molecular dynamics: MDGRAPE-2

NASA Astrophysics Data System (ADS)

Susukita, Ryutaro; Ebisuzaki, Toshikazu; Elmegreen, Bruce G.; Furusawa, Hideaki; Kato, Kenya; Kawai, Atsushi; Kobayashi, Yoshinao; Koishi, Takahiro; McNiven, Geoffrey D.; Narumi, Tetsu; Yasuoka, Kenji

2003-10-01

We developed MDGRAPE-2, a hardware accelerator that calculates forces at high speed in molecular dynamics (MD) simulations. MDGRAPE-2 is connected to a PC or a workstation as an extension board. The sustained performance of one MDGRAPE-2 board is 15 Gflops, roughly equivalent to the peak performance of the fastest supercomputer processing element. One board is able to calculate all forces between 10 000 particles in 0.28 s (i.e. 310000 time steps per day). If 16 boards are connected to one computer and operated in parallel, this calculation speed becomes ˜10 times faster. In addition to MD, MDGRAPE-2 can be applied to gravitational N-body simulations, the vortex method and smoothed particle hydrodynamics in computational fluid dynamics.
RIACS

NASA Technical Reports Server (NTRS)

Oliger, Joseph

1997-01-01

Topics considered include: high-performance computing; cognitive and perceptual prostheses (computational aids designed to leverage human abilities); autonomous systems. Also included: development of a 3D unstructured grid code based on a finite volume formulation and applied to the Navier-stokes equations; Cartesian grid methods for complex geometry; multigrid methods for solving elliptic problems on unstructured grids; algebraic non-overlapping domain decomposition methods for compressible fluid flow problems on unstructured meshes; numerical methods for the compressible navier-stokes equations with application to aerodynamic flows; research in aerodynamic shape optimization; S-HARP: a parallel dynamic spectral partitioner; numerical schemes for the Hamilton-Jacobi and level set equations on triangulated domains; application of high-order shock capturing schemes to direct simulation of turbulence; multicast technology; network testbeds; supercomputer consolidation project.
Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

NASA Astrophysics Data System (ADS)

Sandalski, Stou

Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.
DOE Office of Scientific and Technical Information (OSTI.GOV)

de Miguel, E.; Rull, L.F.; Gubbins, K.E.

Using molecular-dynamics computer simulation, we study the dynamical behavior of the isotropic and nematic phases of highly anisotropic molecular fluids. The interactions are modeled by means of the Gay-Berne potential with anisotropy parameters {kappa}=3 and {kappa}{prime}=5. The linear-velocity autocorrelation function shows no evidence of a negative region in the isotropic phase, even at the higher densities considered. The self-diffusion coefficient parallel to the molecular axis shows an anomalous increase with density as the system enters the nematic region. This enhancement in parallel diffusion is also observed in the isotropic side of the transition as a precursor effect. The molecular reorientationmore » is discussed in the light of different theoretical models. The Debye diffusion model appears to explain the reorientational mechanism in the nematic phase. None of the models gives a satisfactory account of the reorientation process in the isotropic phase.« less
OpenMP performance for benchmark 2D shallow water equations using LBM

NASA Astrophysics Data System (ADS)

Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry

2018-03-01

Shallow water equations or commonly referred as Saint-Venant equations are used to model fluid phenomena. These equations can be solved numerically using several methods, like Lattice Boltzmann method (LBM), SIMPLE-like Method, Finite Difference Method, Godunov-type Method, and Finite Volume Method. In this paper, the shallow water equation will be approximated using LBM or known as LABSWE and will be simulated in performance of parallel programming using OpenMP. To evaluate the performance between 2 and 4 threads parallel algorithm, ten various number of grids Lx and Ly are elaborated. The results show that using OpenMP platform, the computational time for solving LABSWE can be decreased. For instance using grid sizes 1000 × 500, the speedup of 2 and 4 threads is observed 93.54 s and 333.243 s respectively.
Computational sciences in the upstream oil and gas industry

PubMed Central

Halsey, Thomas C.

2016-01-01

The predominant technical challenge of the upstream oil and gas industry has always been the fundamental uncertainty of the subsurface from which it produces hydrocarbon fluids. The subsurface can be detected remotely by, for example, seismic waves, or it can be penetrated and studied in the extremely limited vicinity of wells. Inevitably, a great deal of uncertainty remains. Computational sciences have been a key avenue to reduce and manage this uncertainty. In this review, we discuss at a relatively non-technical level the current state of three applications of computational sciences in the industry. The first of these is seismic imaging, which is currently being revolutionized by the emergence of full wavefield inversion, enabled by algorithmic advances and petascale computing. The second is reservoir simulation, also being advanced through the use of modern highly parallel computing architectures. Finally, we comment on the role of data analytics in the upstream industry. This article is part of the themed issue ‘Energy and the subsurface’. PMID:27597785
Method and apparatus for probing relative volume fractions

DOEpatents

Jandrasits, Walter G.; Kikta, Thomas J.

1998-01-01

A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.
The surface diffusion coefficient for an arbitrarily curved fluid-fluid interface. (I). General expression

NASA Astrophysics Data System (ADS)

M. C. Sagis, Leonard

2001-03-01

In this paper, we develop a theory for the calculation of the surface diffusion coefficient for an arbitrarily curved fluid-fluid interface. The theory is valid for systems in hydrodynamic equilibrium, with zero mass-averaged velocities in the bulk and interfacial regions. We restrict our attention to systems with isotropic bulk phases, and an interfacial region that is isotropic in the plane parallel to the dividing surface. The dividing surface is assumed to be a simple interface, without memory effects or yield stresses. We derive an expression for the surface diffusion coefficient in terms of two parameters of the interfacial region: the coefficient for plane-parallel diffusion D (AB)aa(ξ) , and the driving force d(B)I||(ξ) . This driving force is the parallel component of the driving force for diffusion in the interfacial region. We derive an expression for this driving force using the entropy balance.

Method and apparatus for probing relative volume fractions

DOEpatents

Jandrasits, W.G.; Kikta, T.J.

1998-03-17

A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction. 9 figs.
Polymer scaling and dynamics in steady-state sedimentation at infinite Péclet number.

PubMed

Lehtola, V; Punkkinen, O; Ala-Nissila, T

2007-11-01

We consider the static and dynamical behavior of a flexible polymer chain under steady-state sedimentation using analytic arguments and computer simulations. The model system comprises a single coarse-grained polymer chain of N segments, which resides in a Newtonian fluid as described by the Navier-Stokes equations. The chain is driven into nonequilibrium steady state by gravity acting on each segment. The equations of motion for the segments and the Navier-Stokes equations are solved simultaneously using an immersed boundary method, where thermal fluctuations are neglected. To characterize the chain conformation, we consider its radius of gyration RG(N). We find that the presence of gravity explicitly breaks the spatial symmetry leading to anisotropic scaling of the components of RG with N along the direction of gravity RG, parallel and perpendicular to it RG, perpendicular, respectively. We numerically estimate the corresponding anisotropic scaling exponents nu parallel approximately 0.79 and nu perpendicular approximately 0.45, which differ significantly from the equilibrium scaling exponent nue=0.588 in three dimensions. This indicates that on the average, the chain becomes elongated along the sedimentation direction for large enough N. We present a generalization of the Flory scaling argument, which is in good agreement with the numerical results. It also reveals an explicit dependence of the scaling exponents on the Reynolds number. To study the dynamics of the chain, we compute its effective diffusion coefficient D(N), which does not contain Brownian motion. For the range of values of N used here, we find that both the parallel and perpendicular components of D increase with the chain length N, in contrast to the case of thermal diffusion in equilibrium. This is caused by the fluid-driven fluctuations in the internal configuration of the polymer that are magnified as polymer size becomes larger.
Increasing processor utilization during parallel computation rundown

NASA Technical Reports Server (NTRS)

Jones, W. H.

1986-01-01

Some parallel processing environments provide for asynchronous execution and completion of general purpose parallel computations from a single computational phase. When all the computations from such a phase are complete, a new parallel computational phase is begun. Depending upon the granularity of the parallel computations to be performed, there may be a shortage of available work as a particular computational phase draws to a close (computational rundown). This can result in the waste of computing resources and the delay of the overall problem. In many practical instances, strict sequential ordering of phases of parallel computation is not totally required. In such cases, the beginning of one phase can be correctly computed before the end of a previous phase is completed. This allows additional work to be generated somewhat earlier to keep computing resources busy during each computational rundown. The conditions under which this can occur are identified and the frequency of occurrence of such overlapping in an actual parallel Navier-Stokes code is reported. A language construct is suggested and possible control strategies for the management of such computational phase overlapping are discussed.
Broadcasting a message in a parallel computer

DOEpatents

Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
Force user's manual: A portable, parallel FORTRAN

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1990-01-01

The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.
The Pore-scale modeling of multiphase flows in reservoir rocks using the lattice Boltzmann method

NASA Astrophysics Data System (ADS)

Mu, Y.; Baldwin, C. H.; Toelke, J.; Grader, A.

2011-12-01

Digital rock physics (DRP) is a new technology to compute the physical and fluid flow properties of reservoir rocks. In this approach, pore scale images of the porous rock are obtained and processed to create highly accurate 3D digital rock sample, and then the rock properties are evaluated by advanced numerical methods at the pore scale. Ingrain's DRP technology is a breakthrough for oil and gas companies that need large volumes of accurate results faster than the current special core analysis (SCAL) laboratories can normally deliver. In this work, we compute the multiphase fluid flow properties of 3D digital rocks using D3Q19 immiscible LBM with two relaxation times (TRT). For efficient implementation on GPU, we improved and reformulated color-gradient model proposed by Gunstensen and Rothmann. Furthermore, we only use one-lattice with the sparse data structure: only allocate memory for pore nodes on GPU. We achieved more than 100 million fluid lattice updates per second (MFLUPS) for two-phase LBM on single Fermi-GPU and high parallel efficiency on Multi-GPUs. We present and discuss our simulation results of important two-phase fluid flow properties, such as capillary pressure and relative permeabilities. We also investigate the effects of resolution and wettability on multiphase flows. Comparison of direct measurement results with the LBM-based simulations shows practical ability of DRP to predict two-phase flow properties of reservoir rock.
Simulation of particle motion in a closed conduit validated against experimental data

NASA Astrophysics Data System (ADS)

Dolanský, Jindřich

2015-05-01

Motion of a number of spherical particles in a closed conduit is examined by means of both simulation and experiment. The bed of the conduit is covered by stationary spherical particles of the size of the moving particles. The flow is driven by experimentally measured velocity profiles which are inputs of the simulation. Altering input velocity profiles generates various trajectory patterns. The lattice Boltzmann method (LBM) based simulation is developed to study mutual interactions of the flow and the particles. The simulation enables to model both the particle motion and the fluid flow. The entropic LBM is employed to deal with the flow characterized by the high Reynolds number. The entropic modification of the LBM along with the enhanced refinement of the lattice grid yield an increase in demands on computational resources. Due to the inherently parallel nature of the LBM it can be handled by employing the Parallel Computing Toolbox (MATLAB) and other transformations enabling usage of the CUDA GPU computing technology. The trajectories of the particles determined within the LBM simulation are validated against data gained from the experiments. The compatibility of the simulation results with the outputs of experimental measurements is evaluated. The accuracy of the applied approach is assessed and stability and efficiency of the simulation is also considered.
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Advances in computational design and analysis of airbreathing propulsion systems

NASA Technical Reports Server (NTRS)

Klineberg, John M.

1989-01-01

The development of commercial and military aircraft depends, to a large extent, on engine manufacturers being able to achieve significant increases in propulsion capability through improved component aerodynamics, materials, and structures. The recent history of propulsion has been marked by efforts to develop computational techniques that can speed up the propulsion design process and produce superior designs. The availability of powerful supercomputers, such as the NASA Numerical Aerodynamic Simulator, and the potential for even higher performance offered by parallel computer architectures, have opened the door to the use of multi-dimensional simulations to study complex physical phenomena in propulsion systems that have previously defied analysis or experimental observation. An overview of several NASA Lewis research efforts is provided that are contributing toward the long-range goal of a numerical test-cell for the integrated, multidisciplinary design, analysis, and optimization of propulsion systems. Specific examples in Internal Computational Fluid Mechanics, Computational Structural Mechanics, Computational Materials Science, and High Performance Computing are cited and described in terms of current capabilities, technical challenges, and future research directions.
Parallel computing works

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Distributing an executable job load file to compute nodes in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
Laplace-domain waveform modeling and inversion for the 3D acoustic-elastic coupled media

NASA Astrophysics Data System (ADS)

Shin, Jungkyun; Shin, Changsoo; Calandra, Henri

2016-06-01

Laplace-domain waveform inversion reconstructs long-wavelength subsurface models by using the zero-frequency component of damped seismic signals. Despite the computational advantages of Laplace-domain waveform inversion over conventional frequency-domain waveform inversion, an acoustic assumption and an iterative matrix solver have been used to invert 3D marine datasets to mitigate the intensive computing cost. In this study, we develop a Laplace-domain waveform modeling and inversion algorithm for 3D acoustic-elastic coupled media by using a parallel sparse direct solver library (MUltifrontal Massively Parallel Solver, MUMPS). We precisely simulate a real marine environment by coupling the 3D acoustic and elastic wave equations with the proper boundary condition at the fluid-solid interface. In addition, we can extract the elastic properties of the Earth below the sea bottom from the recorded acoustic pressure datasets. As a matrix solver, the parallel sparse direct solver is used to factorize the non-symmetric impedance matrix in a distributed memory architecture and rapidly solve the wave field for a number of shots by using the lower and upper matrix factors. Using both synthetic datasets and real datasets obtained by a 3D wide azimuth survey, the long-wavelength component of the P-wave and S-wave velocity models is reconstructed and the proposed modeling and inversion algorithm are verified. A cluster of 80 CPU cores is used for this study.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
Comparisons of some large scientific computers

NASA Technical Reports Server (NTRS)

Credeur, K. R.

1981-01-01

In 1975, the National Aeronautics and Space Administration (NASA) began studies to assess the technical and economic feasibility of developing a computer having sustained computational speed of one billion floating point operations per second and a working memory of at least 240 million words. Such a powerful computer would allow computational aerodynamics to play a major role in aeronautical design and advanced fluid dynamics research. Based on favorable results from these studies, NASA proceeded with developmental plans. The computer was named the Numerical Aerodynamic Simulator (NAS). To help insure that the estimated cost, schedule, and technical scope were realistic, a brief study was made of past large scientific computers. Large discrepancies between inception and operation in scope, cost, or schedule were studied so that they could be minimized with NASA's proposed new compter. The main computers studied were the ILLIAC IV, STAR 100, Parallel Element Processor Ensemble (PEPE), and Shuttle Mission Simulator (SMS) computer. Comparison data on memory and speed were also obtained on the IBM 650, 704, 7090, 360-50, 360-67, 360-91, and 370-195; the CDC 6400, 6600, 7600, CYBER 203, and CYBER 205; CRAY 1; and the Advanced Scientific Computer (ASC). A few lessons learned conclude the report.
Methodes iteratives paralleles: Applications en neutronique et en mecanique des fluides

NASA Astrophysics Data System (ADS)

Qaddouri, Abdessamad

Dans cette these, le calcul parallele est applique successivement a la neutronique et a la mecanique des fluides. Dans chacune de ces deux applications, des methodes iteratives sont utilisees pour resoudre le systeme d'equations algebriques resultant de la discretisation des equations du probleme physique. Dans le probleme de neutronique, le calcul des matrices des probabilites de collision (PC) ainsi qu'un schema iteratif multigroupe utilisant une methode inverse de puissance sont parallelises. Dans le probleme de mecanique des fluides, un code d'elements finis utilisant un algorithme iteratif du type GMRES preconditionne est parallelise. Cette these est presentee sous forme de six articles suivis d'une conclusion. Les cinq premiers articles traitent des applications en neutronique, articles qui representent l'evolution de notre travail dans ce domaine. Cette evolution passe par un calcul parallele des matrices des PC et un algorithme multigroupe parallele teste sur un probleme unidimensionnel (article 1), puis par deux algorithmes paralleles l'un mutiregion l'autre multigroupe, testes sur des problemes bidimensionnels (articles 2--3). Ces deux premieres etapes sont suivies par l'application de deux techniques d'acceleration, le rebalancement neutronique et la minimisation du residu aux deux algorithmes paralleles (article 4). Finalement, on a mis en oeuvre l'algorithme multigroupe et le calcul parallele des matrices des PC sur un code de production DRAGON ou les tests sont plus realistes et peuvent etre tridimensionnels (article 5). Le sixieme article (article 6), consacre a l'application a la mecanique des fluides, traite la parallelisation d'un code d'elements finis FES ou le partitionneur de graphe METIS et la librairie PSPARSLIB sont utilises.
Design Aspects of the Rayleigh Convection Code

NASA Astrophysics Data System (ADS)

Featherstone, N. A.

2017-12-01

Understanding the long-term generation of planetary or stellar magnetic field requires complementary knowledge of the large-scale fluid dynamics pervading large fractions of the object's interior. Such large-scale motions are sensitive to the system's geometry which, in planets and stars, is spherical to a good approximation. As a result, computational models designed to study such systems often solve the MHD equations in spherical geometry, frequently employing a spectral approach involving spherical harmonics. We present computational and user-interface design aspects of one such modeling tool, the Rayleigh convection code, which is suitable for deployment on desktop and petascale-hpc architectures alike. In this poster, we will present an overview of this code's parallel design and its built-in diagnostics-output package. Rayleigh has been developed with NSF support through the Computational Infrastructure for Geodynamics and is expected to be released as open-source software in winter 2017/2018.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Slattery, Stuart R

ExaMPM is a mini-application for the Material Point Method (MPM) for studying the application of MPM to future exascale computing systems. MPM is a general method for computational mechanics and fluids and is used in a wide variety of science and engineering disciplines to study problems with large deformations, phase change, fracture, and other phenomena. ExaMPM provides a reference implementation of MPM as described in the 1994 work of Sulsky et.al. (Sulsky, Deborah, Zhen Chen, and Howard L. Schreyer. "A particle method for history-dependent materials." Computer methods in applied mechanics and engineering 118.1-2 (1994): 179-196.). The software can solve basicmore » MPM problems in solid mechanics using the original algorithm of Sulsky with explicit time integration, basic geometries, and free-slip and no-slip boundary conditions as described in the reference. ExaMPM is intended to be used as a starting point to design new parallel algorithms for the next generation of DOE supercomputers.« less
Computer hardware fault administration

DOEpatents

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-09-14

Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.
New Developments in Modeling MHD Systems on High Performance Computing Architectures

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.

2009-04-01

Modeling the wide range of time and length scales present even in fluid models of plasmas like MHD and X-MHD (Extended MHD including two fluid effects like Hall term, electron inertia, electron pressure gradient) is challenging even on state-of-the-art supercomputers. In the last years, HPC capacity has continued to grow exponentially, but at the expense of making the computer systems more and more difficult to program in order to get maximum performance. In this paper, we will present a new approach to managing the complexity caused by the need to write efficient codes: Separating the numerical description of the problem, in our case a discretized right hand side (r.h.s.), from the actual implementation of efficiently evaluating it. An automatic code generator is used to describe the r.h.s. in a quasi-symbolic form while leaving the translation into efficient and parallelized code to a computer program itself. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).
CFD modelling of abdominal aortic aneurysm on hemodynamic loads using a realistic geometry with CT.

PubMed

Soudah, Eduardo; Ng, E Y K; Loong, T H; Bordone, Maurizio; Pua, Uei; Narayanan, Sriram

2013-01-01

The objective of this study is to find a correlation between the abdominal aortic aneurysm (AAA) geometric parameters, wall stress shear (WSS), abdominal flow patterns, intraluminal thrombus (ILT), and AAA arterial wall rupture using computational fluid dynamics (CFD). Real AAA 3D models were created by three-dimensional (3D) reconstruction of in vivo acquired computed tomography (CT) images from 5 patients. Based on 3D AAA models, high quality volume meshes were created using an optimal tetrahedral aspect ratio for the whole domain. In order to quantify the WSS and the recirculation inside the AAA, a 3D CFD using finite elements analysis was used. The CFD computation was performed assuming that the arterial wall is rigid and the blood is considered a homogeneous Newtonian fluid with a density of 1050 kg/m(3) and a kinematic viscosity of 4 × 10(-3) Pa·s. Parallelization procedures were used in order to increase the performance of the CFD calculations. A relation between AAA geometric parameters (asymmetry index ( β ), saccular index ( γ ), deformation diameter ratio ( χ ), and tortuosity index ( ε )) and hemodynamic loads was observed, and it could be used as a potential predictor of AAA arterial wall rupture and potential ILT formation.

Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Data communications in a parallel active messaging interface ('PAMI') or a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance witht the instruction type, the transfer data from the origin endpoin to the target endpoint.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

NASA Technical Reports Server (NTRS)

Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

2004-01-01

High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mark Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of ow OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.
Resealable, optically accessible, PDMS-free fluidic platform for ex vivo interrogation of pancreatic islets.

PubMed

Lenguito, Giovanni; Chaimov, Deborah; Weitz, Jonathan R; Rodriguez-Diaz, Rayner; Rawal, Siddarth A K; Tamayo-Garcia, Alejandro; Caicedo, Alejandro; Stabler, Cherie L; Buchwald, Peter; Agarwal, Ashutosh

2017-02-28

We report the design and fabrication of a robust fluidic platform built out of inert plastic materials and micromachined features that promote optimized convective fluid transport. The platform is tested for perfusion interrogation of rodent and human pancreatic islets, dynamic secretion of hormones, concomitant live-cell imaging, and optogenetic stimulation of genetically engineered islets. A coupled quantitative fluid dynamics computational model of glucose stimulated insulin secretion and fluid dynamics was first utilized to design device geometries that are optimal for complete perfusion of three-dimensional islets, effective collection of secreted insulin, and minimization of system volumes and associated delays. Fluidic devices were then fabricated through rapid prototyping techniques, such as micromilling and laser engraving, as two interlocking parts from materials that are non-absorbent and inert. Finally, the assembly was tested for performance using both rodent and human islets with multiple assays conducted in parallel, such as dynamic perfusion, staining and optogenetics on standard microscopes, as well as for integration with commercial perfusion machines. The optimized design of convective fluid flows, use of bio-inert and non-absorbent materials, reversible assembly, manual access for loading and unloading of islets, and straightforward integration with commercial imaging and fluid handling systems proved to be critical for perfusion assay, and particularly suited for time-resolved optogenetics studies.
Thermal noise in confined fluids.

PubMed

Sanghi, T; Aluru, N R

2014-11-07

In this work, we discuss a combined memory function equation (MFE) and generalized Langevin equation (GLE) approach (referred to as MFE/GLE formulation) to characterize thermal noise in confined fluids. Our study reveals that for fluids confined inside nanoscale geometries, the correlation time and the time decay of the autocorrelation function of the thermal noise are not significantly different across the confinement. We show that it is the strong cross-correlation of the mean force with the molecular velocity that gives rise to the spatial anisotropy in the velocity-autocorrelation function of the confined fluids. Further, we use the MFE/GLE formulation to extract the thermal force a fluid molecule experiences in a MD simulation. Noise extraction from MD simulation suggests that the frequency distribution of the thermal force is non-Gaussian. Also, the frequency distribution of the thermal force near the confining surface is found to be different in the direction parallel and perpendicular to the confinement. We also use the formulation to compute the noise correlation time of water confined inside a (6,6) carbon-nanotube (CNT). It is observed that inside the (6,6) CNT, in which water arranges itself in a highly concerted single-file arrangement, the correlation time of thermal noise is about an order of magnitude higher than that of bulk water.
Thermal noise in confined fluids

NASA Astrophysics Data System (ADS)

Sanghi, T.; Aluru, N. R.

2014-11-01

In this work, we discuss a combined memory function equation (MFE) and generalized Langevin equation (GLE) approach (referred to as MFE/GLE formulation) to characterize thermal noise in confined fluids. Our study reveals that for fluids confined inside nanoscale geometries, the correlation time and the time decay of the autocorrelation function of the thermal noise are not significantly different across the confinement. We show that it is the strong cross-correlation of the mean force with the molecular velocity that gives rise to the spatial anisotropy in the velocity-autocorrelation function of the confined fluids. Further, we use the MFE/GLE formulation to extract the thermal force a fluid molecule experiences in a MD simulation. Noise extraction from MD simulation suggests that the frequency distribution of the thermal force is non-Gaussian. Also, the frequency distribution of the thermal force near the confining surface is found to be different in the direction parallel and perpendicular to the confinement. We also use the formulation to compute the noise correlation time of water confined inside a (6,6) carbon-nanotube (CNT). It is observed that inside the (6,6) CNT, in which water arranges itself in a highly concerted single-file arrangement, the correlation time of thermal noise is about an order of magnitude higher than that of bulk water.
Injector Design Tool Improvements: User's manual for FDNS V.4.5

NASA Technical Reports Server (NTRS)

Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen

1998-01-01

The major emphasis of the current effort is in the development and validation of an efficient parallel machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purpose. Test cases may include shear coaxial, swirl coaxial and impinging injection systems with combinations LOXIH2 or LOXISP-1 propellant injector elements used in rocket engine designs. As a final goal of this project, a well tested parallel CFD performance methodology together with a user's operation description in a final technical report will be reported at the end of the proposed research effort.
Turbopump Performance Improved by Evolutionary Algorithms

NASA Technical Reports Server (NTRS)

Oyama, Akira; Liou, Meng-Sing

2002-01-01

The development of design optimization technology for turbomachinery has been initiated using the multiobjective evolutionary algorithm under NASA's Intelligent Synthesis Environment and Revolutionary Aeropropulsion Concepts programs. As an alternative to the traditional gradient-based methods, evolutionary algorithms (EA's) are emergent design-optimization algorithms modeled after the mechanisms found in natural evolution. EA's search from multiple points, instead of moving from a single point. In addition, they require no derivatives or gradients of the objective function, leading to robustness and simplicity in coupling any evaluation codes. Parallel efficiency also becomes very high by using a simple master-slave concept for function evaluations, since such evaluations often consume the most CPU time, such as computational fluid dynamics. Application of EA's to multiobjective design problems is also straightforward because EA's maintain a population of design candidates in parallel. Because of these advantages, EA's are a unique and attractive approach to real-world design optimization problems.
Supersonic civil airplane study and design: Performance and sonic boom

NASA Technical Reports Server (NTRS)

Cheung, Samson

1995-01-01

Since aircraft configuration plays an important role in aerodynamic performance and sonic boom shape, the configuration of the next generation supersonic civil transport has to be tailored to meet high aerodynamic performance and low sonic boom requirements. Computational fluid dynamics (CFD) can be used to design airplanes to meet these dual objectives. The work and results in this report are used to support NASA's High Speed Research Program (HSRP). CFD tools and techniques have been developed for general usages of sonic boom propagation study and aerodynamic design. Parallel to the research effort on sonic boom extrapolation, CFD flow solvers have been coupled with a numeric optimization tool to form a design package for aircraft configuration. This CFD optimization package has been applied to configuration design on a low-boom concept and an oblique all-wing concept. A nonlinear unconstrained optimizer for Parallel Virtual Machine has been developed for aerodynamic design and study.
Impact of mineral precipitation on flow and mixing in porous media determined by microcomputed tomography and MRI

DOE PAGES

Bray, Joshua M.; Lauchnor, Ellen G.; Redden, George D.; ...

2016-12-21

Here, precipitation reactions in porous media influence transport properties of the environment and can control advective and dispersive transport. In subsurface environments, mixing of saline groundwater or injected solutions for remediation with fresh groundwater can induce supersaturation of constituents and drive precipitation reactions. Magnetic resonance imaging (MRI) and micro-computed tomography (µ-CT) were employed as complimentary techniques to evaluate advection, dispersion and formation of precipitate in a 3D porous media flow cell. Two parallel fluids were flowed concentrically through the porous media under two flow rate conditions with Na 2CO 3 and CaCl 2 in the inner and outer fluids, respectively.more » Upon mixing, calcium carbonate became supersaturated and formed a precipitate at the interface of the two fluids. Spatial maps of changing local velocity fields and dispersion in the flow cell were generated from MRI, while high resolution imaging of the precipitate formed in the porous media was achieved via µ-CT imaging. Formation of a precipitate layer minimized dispersive and advective transport between the two fluids and the shape of the precipitation was influenced by the flow rate condition.« less
Electroosmotic flow of biorheological micropolar fluids through microfluidic channels

NASA Astrophysics Data System (ADS)

Chaube, Mithilesh Kumar; Yadav, Ashu; Tripathi, Dharmendra; Bég, O. Anwar

2018-05-01

An analytical analysis is presented in this work to assess the influence of micropolar nature of fluids in fully developed flow induced by electrokinetically driven peristaltic pumping through a parallel plate microchannel. The walls of the channel are assumed as sinusoidal wavy to analyze the peristaltic flow nature. We consider that the wavelength of the wall motion is much larger as compared to the channel width to validate the lubrication theory. To simplify the Poisson Boltzmann equation, we also use the Debye-Hückel linearization. We consider governing equation for micropolar fluid in absence of body force and couple effects however external electric field is employed. The solutions for axial velocity, spin velocity, flow rate, pressure rise, and stream functions subjected to given physical boundary conditions are computed. The effects of pertinent parameters like Debye length and Helmholtz-Smoluchowski velocity which characterize the EDL phenomenon and external electric field, coupling number and micropolar parameter which characterize the micropolar fluid behavior, on peristaltic pumping are discussed through the illustrations. The results show that peristaltic pumping may alter by applying external electric fields. This model can be used to design and engineer the peristalsis-lab-on-chip and micro peristaltic syringe pumps for biomedical applications.
Impact of mineral precipitation on flow and mixing in porous media determined by microcomputed tomography and MRI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bray, Joshua M.; Lauchnor, Ellen G.; Redden, George D.

Here, precipitation reactions in porous media influence transport properties of the environment and can control advective and dispersive transport. In subsurface environments, mixing of saline groundwater or injected solutions for remediation with fresh groundwater can induce supersaturation of constituents and drive precipitation reactions. Magnetic resonance imaging (MRI) and micro-computed tomography (µ-CT) were employed as complimentary techniques to evaluate advection, dispersion and formation of precipitate in a 3D porous media flow cell. Two parallel fluids were flowed concentrically through the porous media under two flow rate conditions with Na 2CO 3 and CaCl 2 in the inner and outer fluids, respectively.more » Upon mixing, calcium carbonate became supersaturated and formed a precipitate at the interface of the two fluids. Spatial maps of changing local velocity fields and dispersion in the flow cell were generated from MRI, while high resolution imaging of the precipitate formed in the porous media was achieved via µ-CT imaging. Formation of a precipitate layer minimized dispersive and advective transport between the two fluids and the shape of the precipitation was influenced by the flow rate condition.« less
Method and apparatus for probing relative volume fractions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jandrasits, W.G.; Kikta, T.J.

1996-12-31

A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining there between a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirelymore » of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.« less
Research and educational initiatives at the Syracuse University Center for Hypersonics

NASA Technical Reports Server (NTRS)

Spina, E.; Lagraff, J.; Davidson, B.; Bogucz, E.; Dang, T.

1995-01-01

The Department of Mechanical, Aerospace, and Manufacturing Engineering and the Northeast Parallel Architectures Center of Syracuse University have been funded by NASA to establish a program to educate young engineers in the hypersonic disciplines. This goal is being achieved through a comprehensive five-year program that includes elements of undergraduate instruction, advanced graduate coursework, undergraduate research, and leading-edge hypersonics research. The research foci of the Syracuse Center for Hypersonics are three-fold; high-temperature composite materials, measurements in turbulent hypersonic flows, and the application of high-performance computing to hypersonic fluid dynamics.
Droplet flow along the wall of rectangular channel with gradient of wettability

NASA Astrophysics Data System (ADS)

Kupershtokh, A. L.

2018-03-01

The lattice Boltzmann equations (LBE) method (LBM) is applicable for simulating the multiphysics problems of fluid flows with free boundaries, taking into account the viscosity, surface tension, evaporation and wetting degree of a solid surface. Modeling of the nonstationary motion of a drop of liquid along a solid surface with a variable level of wettability is carried out. For the computer simulation of such a problem, the three-dimensional lattice Boltzmann equations method D3Q19 is used. The LBE method allows us to parallelize the calculations on multiprocessor graphics accelerators using the CUDA programming technology.
Optimum parallel step-sector bearing lubricated with an incompressible fluid

NASA Technical Reports Server (NTRS)

Hamrock, B. J.

1983-01-01

The dimensionless parameters normally associated with a step sector thrust bearing are the film thickness ratio, the dimensionless step location, the number of sectors, the radius ratio, and the angular extent of the lubrication feed groove. The optimum number of sectors and the parallel step configuration for a step sector thrust bearing while considering load capacity or stiffness and assuming an incompressible fluid are presented.
Implementation of parallel moment equations in NIMROD

NASA Astrophysics Data System (ADS)

Lee, Hankyu Q.; Held, Eric D.; Ji, Jeong-Young

2017-10-01

As collisionality is low (the Knudsen number is large) in many plasma applications, kinetic effects become important, particularly in parallel dynamics for magnetized plasmas. Fluid models can capture some kinetic effects when integral parallel closures are adopted. The adiabatic and linear approximations are used in solving general moment equations to obtain the integral closures. In this work, we present an effort to incorporate non-adiabatic (time-dependent) and nonlinear effects into parallel closures. Instead of analytically solving the approximate moment system, we implement exact parallel moment equations in the NIMROD fluid code. The moment code is expected to provide a natural convergence scheme by increasing the number of moments. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Development of Reduced-Order Models for Aeroelastic and Flutter Prediction Using the CFL3Dv6.0 Code

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Bartels, Robert E.

2002-01-01

A reduced-order model (ROM) is developed for aeroelastic analysis using the CFL3D version 6.0 computational fluid dynamics (CFD) code, recently developed at the NASA Langley Research Center. This latest version of the flow solver includes a deforming mesh capability, a modal structural definition for nonlinear aeroelastic analyses, and a parallelization capability that provides a significant increase in computational efficiency. Flutter results for the AGARD 445.6 Wing computed using CFL3D v6.0 are presented, including discussion of associated computational costs. Modal impulse responses of the unsteady aerodynamic system are then computed using the CFL3Dv6 code and transformed into state-space form. Important numerical issues associated with the computation of the impulse responses are presented. The unsteady aerodynamic state-space ROM is then combined with a state-space model of the structure to create an aeroelastic simulation using the MATLAB/SIMULINK environment. The MATLAB/SIMULINK ROM is used to rapidly compute aeroelastic transients including flutter. The ROM shows excellent agreement with the aeroelastic analyses computed using the CFL3Dv6.0 code directly.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
PEGASUS 5: An Automated Pre-Processor for Overset-Grid CFD

NASA Technical Reports Server (NTRS)

Suhs, Norman E.; Rogers, Stuart E.; Dietz, William E.; Kwak, Dochan (Technical Monitor)

2002-01-01

An all new, automated version of the PEGASUS software has been developed and tested. PEGASUS provides the hole-cutting and connectivity information between overlapping grids, and is used as the final part of the grid generation process for overset-grid computational fluid dynamics approaches. The new PEGASUS code (Version 5) has many new features: automated hole cutting; a projection scheme for fixing gaps in overset surfaces; more efficient interpolation search methods using an alternating digital tree; hole-size optimization based on adding additional layers of fringe points; and an automatic restart capability. The new code has also been parallelized using the Message Passing Interface standard. The parallelization performance provides efficient speed-up of the execution time by an order of magnitude, and up to a factor of 30 for very large problems. The results of three example cases are presented: a three-element high-lift airfoil, a generic business jet configuration, and a complete Boeing 777-200 aircraft in a high-lift landing configuration. Comparisons of the computed flow fields for the airfoil and 777 test cases between the old and new versions of the PEGASUS codes show excellent agreement with each other and with experimental results.
Internal Flow Thermal/Fluid Modeling of STS-107 Port Wing in Support of the Columbia Accident Investigation Board

NASA Technical Reports Server (NTRS)

Sharp, John R.; Kittredge, Ken; Schunk, Richard G.

2003-01-01

As part of the aero-thermodynamics team supporting the Columbia Accident Investigation Board (CAB), the Marshall Space Flight Center was asked to perform engineering analyses of internal flows in the port wing. The aero-thermodynamics team was split into internal flow and external flow teams with the support being divided between shorter timeframe engineering methods and more complex computational fluid dynamics. In order to gain a rough order of magnitude type of knowledge of the internal flow in the port wing for various breach locations and sizes (as theorized by the CAB to have caused the Columbia re-entry failure), a bulk venting model was required to input boundary flow rates and pressures to the computational fluid dynamics (CFD) analyses. This paper summarizes the modeling that was done by MSFC in Thermal Desktop. A venting model of the entire Orbiter was constructed in FloCAD based on Rockwell International s flight substantiation analyses and the STS-107 reentry trajectory. Chemical equilibrium air thermodynamic properties were generated for SINDA/FLUINT s fluid property routines from a code provided by Langley Research Center. In parallel, a simplified thermal mathematical model of the port wing, including the Thermal Protection System (TPS), was based on more detailed Shuttle re-entry modeling previously done by the Dryden Flight Research Center. Once the venting model was coupled with the thermal model of the wing structure with chemical equilibrium air properties, various breach scenarios were assessed in support of the aero-thermodynamics team. The construction of the coupled model and results are presented herein.
Computer simulation of liquid-vapor coexistence of confined quantum fluids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Trejos, Víctor M.; Gil-Villegas, Alejandro, E-mail: gil@fisica.ugto.mx; Martinez, Alejandro

2013-11-14

The liquid-vapor coexistence (LV) of bulk and confined quantum fluids has been studied by Monte Carlo computer simulation for particles interacting via a semiclassical effective pair potential V{sub eff}(r) = V{sub LJ} + V{sub Q}, where V{sub LJ} is the Lennard-Jones 12-6 potential (LJ) and V{sub Q} is the first-order Wigner-Kirkwood (WK-1) quantum potential, that depends on β = 1/kT and de Boer's quantumness parameter Λ=h/σ√(mε), where k and h are the Boltzmann's and Planck's constants, respectively, m is the particle's mass, T is the temperature of the system, and σ and ε are the LJ potential parameters. The non-conformalmore » properties of the system of particles interacting via the effective pair potential V{sub eff}(r) are due to Λ, since the LV phase diagram is modified by varying Λ. We found that the WK-1 system gives an accurate description of the LV coexistence for bulk phases of several quantum fluids, obtained by the Gibbs Ensemble Monte Carlo method (GEMC). Confinement effects were introduced using the Canonical Ensemble (NVT) to simulate quantum fluids contained within parallel hard walls separated by a distance L{sub p}, within the range 2σ ⩽ L{sub p} ⩽ 6σ. The critical temperature of the system is reduced by decreasing L{sub p} and increasing Λ, and the liquid-vapor transition is not longer observed for L{sub p}/σ < 2, in contrast to what has been observed for the classical system.« less
Parallel Computing Using Web Servers and "Servlets".

ERIC Educational Resources Information Center

Lo, Alfred; Bloor, Chris; Choi, Y. K.

2000-01-01

Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method

NASA Astrophysics Data System (ADS)

Bi, Xunqiang

1997-12-01

In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Computer aided design of extrusion forming tools for complex geometry profiles

NASA Astrophysics Data System (ADS)

Goncalves, Nelson Daniel Ferreira

In the profile extrusion, the experience of the die designer is crucial for obtaining good results. In industry, it is quite usual the need of several experimental trials for a specific extrusion die before a balanced flow distribution is obtained. This experimental based trial-and-error procedure is time and money consuming, but, it works, and most of the profile extrusion companies rely on such method. However, the competition is forcing the industry to look for more effective procedures and the design of profile extrusion dies is not an exception. For this purpose, computer aided design seems to be a good route. Nowadays, the available computational rheology numerical codes allow the simulation of complex fluid flows. This permits the die designer to evaluate and to optimize the flow channel, without the need to have a physical die and to perform real extrusion trials. In this work, a finite volume based numerical code was developed, for the simulation of non-Newtonian (inelastic) fluid and non-isothermal flows using unstructured meshes. The developed code is able to model the forming and cooling stages of profile extrusion, and can be used to aid the design of forming tools used in the production of complex profiles. For the code verification three benchmark problems were tested: flow between parallel plates, flow around a cylinder, and the lid driven cavity flow. The code was employed to design two extrusion dies to produce complex cross section profiles: a medical catheter die and a wood plastic composite profile for decking applications. The last was experimentally validated. Simple extrusion dies used to produced L and T shaped profiles were studied in detail, allowing a better understanding of the effect of the main geometry parameters on the flow distribution. To model the cooling stage a new implicit formulation was devised, which allowed the achievement of better convergence rates and thus the reduction of the computation times. Having in mind the solution of large dimension problems, the code was parallelized using graphics processing units (GPUs). Speedups of ten times could be obtained, drastically decreasing the time required to obtain results.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

PubMed

Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

2018-04-20

A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

NASA Astrophysics Data System (ADS)

Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

2016-10-01

A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved 2x speedup on GT9800 and 18x on Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that the new development of GPU architecture benefits the fluid dynamics computing significantly.
Design Considerations of a Virtual Laboratory for Advanced X-ray Sources

NASA Astrophysics Data System (ADS)

Luginsland, J. W.; Frese, M. H.; Frese, S. D.; Watrous, J. J.; Heileman, G. L.

2004-11-01

The field of scientific computation has greatly advanced in the last few years, resulting in the ability to perform complex computer simulations that can predict the performance of real-world experiments in a number of fields of study. Among the forces driving this new computational capability is the advent of parallel algorithms, allowing calculations in three-dimensional space with realistic time scales. Electromagnetic radiation sources driven by high-voltage, high-current electron beams offer an area to further push the state-of-the-art in high fidelity, first-principles simulation tools. The physics of these x-ray sources combine kinetic plasma physics (electron beams) with dense fluid-like plasma physics (anode plasmas) and x-ray generation (bremsstrahlung). There are a number of mature techniques and software packages for dealing with the individual aspects of these sources, such as Particle-In-Cell (PIC), Magneto-Hydrodynamics (MHD), and radiation transport codes. The current effort is focused on developing an object-oriented software environment using the Rational© Unified Process and the Unified Modeling Language (UML) to provide a framework where multiple 3D parallel physics packages, such as a PIC code (ICEPIC), a MHD code (MACH), and a x-ray transport code (ITS) can co-exist in a system-of-systems approach to modeling advanced x-ray sources. Initial software design and assessments of the various physics algorithms' fidelity will be presented.
New technologies for advanced three-dimensional optimum shape design in aeronautics

NASA Astrophysics Data System (ADS)

Dervieux, Alain; Lanteri, Stéphane; Malé, Jean-Michel; Marco, Nathalie; Rostaing-Schmidt, Nicole; Stoufflet, Bruno

1999-05-01

The analysis of complex flows around realistic aircraft geometries is becoming more and more predictive. In order to obtain this result, the complexity of flow analysis codes has been constantly increasing, involving more refined fluid models and sophisticated numerical methods. These codes can only run on top computers, exhausting their memory and CPU capabilities. It is, therefore, difficult to introduce best analysis codes in a shape optimization loop: most previous works in the optimum shape design field used only simplified analysis codes. Moreover, as the most popular optimization methods are the gradient-based ones, the more complex the flow solver, the more difficult it is to compute the sensitivity code. However, emerging technologies are contributing to make such an ambitious project, of including a state-of-the-art flow analysis code into an optimisation loop, feasible. Among those technologies, there are three important issues that this paper wishes to address: shape parametrization, automated differentiation and parallel computing. Shape parametrization allows faster optimization by reducing the number of design variable; in this work, it relies on a hierarchical multilevel approach. The sensitivity code can be obtained using automated differentiation. The automated approach is based on software manipulation tools, which allow the differentiation to be quick and the resulting differentiated code to be rather fast and reliable. In addition, the parallel algorithms implemented in this work allow the resulting optimization software to run on increasingly larger geometries. Copyright
Behaviour at high pressure of Rb7NaGa8Si12O40·3H2O (a zeolite with EDI topology): a combined experimental-computational study

NASA Astrophysics Data System (ADS)

Gatta, G. D.; Tabacchi, G.; Fois, E.; Lee, Y.

2016-03-01

The high-pressure behaviour and the P-induced structural evolution of a synthetic zeolite Rb7NaGa8Si12O40·3H2O (with edingtonite-type structure) were investigated both by in situ synchrotron powder diffraction (with a diamond anvil cell and the methanol:ethanol:water = 16:3:1 mixture as pressure-transmitting fluid) up to 3.27 GPa and by ab initio first-principles computational modelling. No evidence of phase transition or penetration of P-fluid molecules was observed within the P-range investigated. The isothermal equation of state was determined; V 0 and K T0 refined with a second-order Birch-Murnaghan equation of state are V 0 = 1311.3(2) Å3 and K T0 = 29.8(7) GPa. The main deformation mechanism (at the atomic scale) in response to the applied pressure is represented by the cooperative rotation of the secondary building units (SBU) about their chain axis (i.e. [001]). The direct consequence of SBU anti-rotation on the zeolitic channels parallel to [001] is the increase in pore ellipticity with pressure, in response to the extension of the major axis and to the contraction of the minor axis of the elliptical channel parallel to [001]. The effect of the applied pressure on the bonding configuration of the extra-framework content is only secondary. A comparison between the P-induced main deformation mechanisms observed in Rb7NaGa8Si12O40·3H2O and those previously found in natural fibrous zeolites is made.
Fluid driven fracture mechanics in highly anisotropic shale: a laboratory study with application to hydraulic fracturing

NASA Astrophysics Data System (ADS)

Gehne, Stephan; Benson, Philip; Koor, Nick; Enfield, Mark

2017-04-01

The finding of considerable volumes of hydrocarbon resources within tight sedimentary rock formations in the UK led to focused attention on the fundamental fracture properties of low permeability rock types and hydraulic fracturing. Despite much research in these fields, there remains a scarcity of available experimental data concerning the fracture mechanics of fluid driven fracturing and the fracture properties of anisotropic, low permeability rock types. In this study, hydraulic fracturing is simulated in a controlled laboratory environment to track fracture nucleation (location) and propagation (velocity) in space and time and assess how environmental factors and rock properties influence the fracture process and the developing fracture network. Here we report data on employing fluid overpressure to generate a permeable network of micro tensile fractures in a highly anisotropic shale ( 50% P-wave velocity anisotropy). Experiments are carried out in a triaxial deformation apparatus using cylindrical samples. The bedding planes are orientated either parallel or normal to the major principal stress direction (σ1). A newly developed technique, using a steel guide arrangement to direct pressurised fluid into a sealed section of an axially drilled conduit, allows the pore fluid to contact the rock directly and to initiate tensile fractures from the pre-defined zone inside the sample. Acoustic Emission location is used to record and map the nucleation and development of the micro-fracture network. Indirect tensile strength measurements at atmospheric pressure show a high tensile strength anisotropy ( 60%) of the shale. Depending on the relative bedding orientation within the stress field, we find that fluid induced fractures in the sample propagate in two of the three principal fracture orientations: Divider and Short-Transverse. The fracture progresses parallel to the bedding plane (Short-Transverse orientation) if the bedding plane is aligned (parallel) with the direction of σ1. Conversely, the crack plane develops perpendicular to the bedding plane, if the bedding plane is orientated normal to σ1. Fracture initiation pressures are higher in the Divider orientation ( 24MPa) than in the Short-Transverse orientation ( 14MPa) showing a tensile strength anisotropy ( 42%) comparable to ambient tensile strength results. We then use X-Ray Computed Tomography (CT) 3D-images to evaluate the evolved fracture network in terms of fracture pattern, aperture and post-test water permeability. For both fracture orientations, very fine, axial fractures evolve over the entire length of the sample. For the fracturing in the Divider orientation, it has been observed, that in some cases, secondary fractures are branching of the main fracture. Test data from fluid driven fracturing experiments suggest that fracture pattern, fracture propagation trajectories and fracturing fluid pressure (initiation and propagation pressure) are predominantly controlled by the interaction between the anisotropic mechanical properties of the shale and the anisotropic stress environment. The orientation of inherent rock anisotropy relative to the principal stress directions seems to be the main control on fracture orientation and required fracturing pressure.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Ductile creep and compaction: A mechanism for transiently increasing fluid pressure in mostly sealed fault zones

USGS Publications Warehouse

Sleep, Norman H.; Blanpied, M.L.

1994-01-01

A simple cyclic process is proposed to explain why major strike-slip fault zones, including the San Andreas, are weak. Field and laboratory studies suggest that the fluid within fault zones is often mostly sealed from that in the surrounding country rock. Ductile creep driven by the difference between fluid pressure and lithostatic pressure within a fault zone leads to compaction that increases fluid pressure. The increased fluid pressure allows frictional failure in earthquakes at shear tractions far below those required when fluid pressure is hydrostatic. The frictional slip associated with earthquakes creates porosity in the fault zone. The cycle adjusts so that no net porosity is created (if the fault zone remains constant width). The fluid pressure within the fault zone reaches long-term dynamic equilibrium with the (hydrostatic) pressure in the country rock. One-dimensional models of this process lead to repeatable and predictable earthquake cycles. However, even modest complexity, such as two parallel fault splays with different pressure histories, will lead to complicated earthquake cycles. Two-dimensional calculations allowed computation of stress and fluid pressure as a function of depth but had complicated behavior with the unacceptable feature that numerical nodes failed one at a time rather than in large earthquakes. A possible way to remove this unphysical feature from the models would be to include a failure law in which the coefficient of friction increases at first with frictional slip, stabilizing the fault, and then decreases with further slip, destabilizing it. ?? 1994 Birkha??user Verlag.

A hybrid numerical fluid dynamics code for resistive magnetohydrodynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, Jeffrey

2006-04-01

Spasmos is a computational fluid dynamics code that uses two numerical methods to solve the equations of resistive magnetohydrodynamic (MHD) flows in compressible, inviscid, conducting media[1]. The code is implemented as a set of libraries for the Python programming language[2]. It represents conducting and non-conducting gases and materials with uncomplicated (analytic) equations of state. It supports calculations in 1D, 2D, and 3D geometry, though only the 1D configuation has received significant testing to date. Because it uses the Python interpreter as a front end, users can easily write test programs to model systems with a variety of different numerical andmore » physical parameters. Currently, the code includes 1D test programs for hydrodynamics (linear acoustic waves, the Sod weak shock[3], the Noh strong shock[4], the Sedov explosion[5], magnetic diffusion (decay of a magnetic pulse[6], a driven oscillatory "wine-cellar" problem[7], magnetic equilibrium), and magnetohydrodynamics (an advected magnetic pulse[8], linear MHD waves, a magnetized shock tube[9]). Spasmos current runs only in a serial configuration. In the future, it will use MPI for parallel computation.« less
Helicopter Blade-Vortex Interaction Noise with Comparisons to CFD Calculations

NASA Technical Reports Server (NTRS)

McCluer, Megan S.

1996-01-01

A comparison of experimental acoustics data and computational predictions was performed for a helicopter rotor blade interacting with a parallel vortex. The experiment was designed to examine the aerodynamics and acoustics of parallel Blade-Vortex Interaction (BVI) and was performed in the Ames Research Center (ARC) 80- by 120-Foot Subsonic Wind Tunnel. An independently generated vortex interacted with a small-scale, nonlifting helicopter rotor at the 180 deg azimuth angle to create the interaction in a controlled environment. Computational Fluid Dynamics (CFD) was used to calculate near-field pressure time histories. The CFD code, called Transonic Unsteady Rotor Navier-Stokes (TURNS), was used to make comparisons with the acoustic pressure measurement at two microphone locations and several test conditions. The test conditions examined included hover tip Mach numbers of 0.6 and 0.7, advance ratio of 0.2, positive and negative vortex rotation, and the vortex passing above and below the rotor blade by 0.25 rotor chords. The results show that the CFD qualitatively predicts the acoustic characteristics very well, but quantitatively overpredicts the peak-to-peak sound pressure level by 15 percent in most cases. There also exists a discrepancy in the phasing (about 4 deg) of the BVI event in some cases. Additional calculations were performed to examine the effects of vortex strength, thickness, time accuracy, and directionality. This study validates the TURNS code for prediction of near-field acoustic pressures of controlled parallel BVI.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
System and method for continuous solids slurry depressurization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leininger, Thomas Frederick; Steele, Raymond Douglas; Yen, Hsien-Chin William

A continuous slag processing system includes a rotating parallel disc pump, coupled to a motor and a brake. The rotating parallel disc pump includes opposing discs coupled to a shaft, an outlet configured to continuously receive a fluid at a first pressure, and an inlet configured to continuously discharge the fluid at a second pressure less than the first pressure. The rotating parallel disc pump is configurable in a reverse-acting pump mode and a letdown turbine mode. The motor is configured to drive the opposing discs about the shaft and against a flow of the fluid to control a differencemore » between the first pressure and the second pressure in the reverse-acting pump mode. The brake is configured to resist rotation of the opposing discs about the shaft to control the difference between the first pressure and the second pressure in the letdown turbine mode.« less
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.
Center for Extended Magnetohydrodynamic Modeling Cooperative Agreement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carl R. Sovinec

The Center for Extended Magnetohydrodynamic Modeling (CEMM) is developing computer simulation models for predicting the behavior of magnetically confined plasmas. Over the first phase of support from the Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC) initiative, the focus has been on macroscopic dynamics that alter the confinement properties of magnetic field configurations. The ultimate objective is to provide computational capabilities to predict plasma behavior—not unlike computational weather prediction—to optimize performance and to increase the reliability of magnetic confinement for fusion energy. Numerical modeling aids theoretical research by solving complicated mathematical models of plasma behavior including strong nonlinear effectsmore » and the influences of geometrical shaping of actual experiments. The numerical modeling itself remains an area of active research, due to challenges associated with simulating multiple temporal and spatial scales. The research summarized in this report spans computational and physical topics associated with state of the art simulation of magnetized plasmas. The tasks performed for this grant are categorized according to whether they are primarily computational, algorithmic, or application-oriented in nature. All involve the development and use of the Non-Ideal Magnetohydrodynamics with Rotation, Open Discussion (NIMROD) code, which is described at http://nimrodteam.org. With respect to computation, we have tested and refined methods for solving the large algebraic systems of equations that result from our numerical approximations of the physical model. Collaboration with the Terascale Optimal PDE Solvers (TOPS) SciDAC center led us to the SuperLU_DIST software library [http://crd.lbl.gov/~xiaoye/SuperLU/] for solving large sparse matrices using direct methods on parallel computers. Switching to this solver library boosted NIMROD’s performance by a factor of five in typical large nonlinear simulations, which has been publicized as a success story of SciDAC-fostered collaboration. Furthermore, the SuperLU software does not assume any mathematical symmetry, and its generality provides an important capability for extending the physical model beyond magnetohydrodynamics (MHD). With respect to algorithmic and model development, our most significant accomplishment is the development of a new method for solving plasma models that treat electrons as an independent plasma component. These ‘two-fluid’ models encompass MHD and add temporal and spatial scales that are beyond the response of the ion species. Implementation and testing of a previously published algorithm did not prove successful for NIMROD, and the new algorithm has since been devised, analyzed, and implemented. Two-fluid modeling, an important objective of the original NIMROD project, is now routine in 2D applications. Algorithmic components for 3D modeling are in place and tested; though, further computational work is still needed for efficiency. Other algorithmic work extends the ion-fluid stress tensor to include models for parallel and gyroviscous stresses. In addition, our hot-particle simulation capability received important refinements that permitted completion of a benchmark with the M3D code. A highlight of our applications work is the edge-localized mode (ELM) modeling, which was part of the first-ever computational Performance Target for the DOE Office of Fusion Energy Science, see http://www.science.doe.gov/ofes/performancetargets.shtml. Our efforts allowed MHD simulations to progress late into the nonlinear stage, where energy is conducted to the wall location. They also produced a two-fluid ELM simulation starting from experimental information and demonstrating critical drift effects that are characteristic of two-fluid physics. Another important application is the internal kink mode in a tokamak. Here, the primary purpose of the study has been to benchmark the two main code development lines of CEMM, NIMROD and M3D, on a relevant nonlinear problem. Results from the two codes show repeating nonlinear relaxation events driven by the kink mode over quantitatively comparable timescales. The work has launched a more comprehensive nonlinear benchmarking exercise, where realistic transport effects have an important role.« less
Pececillo

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carlson, Neil; Jibben, Zechariah; Brady, Peter

2017-06-28

Pececillo is a proxy-app for the open source Truchas metal processing code (LA-CC-15-097). It implements many of the physics models used in Truchas: free-surface, incompressible Navier-Stokes fluid dynamics (e.g., water waves); heat transport, material phase change, view factor thermal radiation; species advection-diffusion; quasi-static, elastic/plastic solid mechanics with contact; electomagnetics (Maxwell's equations). The models are simplified versions that retain the fundamental computational complexity of the Truchas models while omitting many non-essential features and modeling capabilities. The purpose is to expose Truchas algorithms in a greatly simplified context where computer science problems related to parallel performance on advanced architectures can be moremore » easily investigated. While Pececillo is capable of performing simulations representative of typical Truchas metal casting, welding, and additive manufacturing simulations, it lacks many of the modeling capabilites needed for real applications.« less
Numerical Simulations of the Boundary Layer Transition Flight Experiment

NASA Technical Reports Server (NTRS)

Tang, Chun Y.; Trumble, Kerry A.; Campbell, Charles H.; Lessard, Victor R.; Wood, William A.

2010-01-01

Computational Fluid Dynamics (CFD) simulations were used to study the possible effects that the Boundary Layer Transition (BLT) Flight Experiments may have on the heating environment of the Space Shuttle during its entry to Earth. To investigate this issue, hypersonic calculations using the Data-Parallel Line Relaxation (DPLR) and Langley Aerothermodynamic Upwind Relaxation (LAURA) CFD codes were computed for a 0.75 tall protuberance at flight conditions of Mach 15 and 18. These initial results showed high surface heating on the BLT trip and the areas surrounding the protuberance. Since the predicted peak heating rates would exceed the thermal limits of the materials selected to construct the BLT trip, many changes to the geometry were attempted in order to reduce the surface heat flux. The following paper describes the various geometry revisions and the resulting heating environments predicted by the CFD codes.
Near-Body Grid Adaption for Overset Grids

NASA Technical Reports Server (NTRS)

Buning, Pieter G.; Pulliam, Thomas H.

2016-01-01

A solution adaption capability for curvilinear near-body grids has been implemented in the OVERFLOW overset grid computational fluid dynamics code. The approach follows closely that used for the Cartesian off-body grids, but inserts refined grids in the computational space of original near-body grids. Refined curvilinear grids are generated using parametric cubic interpolation, with one-sided biasing based on curvature and stretching ratio of the original grid. Sensor functions, grid marking, and solution interpolation tasks are implemented in the same fashion as for off-body grids. A goal-oriented procedure, based on largest error first, is included for controlling growth rate and maximum size of the adapted grid system. The adaption process is almost entirely parallelized using MPI, resulting in a capability suitable for viscous, moving body simulations. Two- and three-dimensional examples are presented.
Multiscale Modeling: A Review

NASA Astrophysics Data System (ADS)

Horstemeyer, M. F.

This review of multiscale modeling covers a brief history of various multiscale methodologies related to solid materials and the associated experimental influences, the various influence of multiscale modeling on different disciplines, and some examples of multiscale modeling in the design of structural components. Although computational multiscale modeling methodologies have been developed in the late twentieth century, the fundamental notions of multiscale modeling have been around since da Vinci studied different sizes of ropes. The recent rapid growth in multiscale modeling is the result of the confluence of parallel computing power, experimental capabilities to characterize structure-property relations down to the atomic level, and theories that admit multiple length scales. The ubiquitous research that focus on multiscale modeling has broached different disciplines (solid mechanics, fluid mechanics, materials science, physics, mathematics, biological, and chemistry), different regions of the world (most continents), and different length scales (from atoms to autos).
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets: and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

PubMed

Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

2012-09-25

Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

PubMed Central

2012-01-01

Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
Parallel approach in RDF query processing

NASA Astrophysics Data System (ADS)

Vajgl, Marek; Parenica, Jan

2017-07-01

Parallel approach is nowadays a very cheap solution to increase computational power due to possibility of usage of multithreaded computational units. This hardware became typical part of nowadays personal computers or notebooks and is widely spread. This contribution deals with experiments how evaluation of computational complex algorithm of the inference over RDF data can be parallelized over graphical cards to decrease computational time.
Parametric electroconvection in a weakly conducting fluid in a horizontal parallel-plate capacitor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kartavykh, N. N.; Smorodin, B. L., E-mail: bsmorodin@yandex.ru; Il’in, V. A.

2015-07-15

We study the flows of a nonuniformly heated weakly conducting fluid in an ac electric field of a horizontal parallel-plate capacitor. Analysis is carried out for fluids in which the charge formation is governed by electroconductive mechanism associated with the temperature dependence of the electrical conductivity of the medium. Periodic and chaotic regimes of fluid flow are investigated in the limiting case of instantaneous charge relaxation and for a finite relaxation time. Bifurcation diagrams and electroconvective regimes charts are constructed. The regions where fluid oscillations synchronize with the frequency of the external field are determined. Hysteretic transitions between electroconvection regimesmore » are studied. The scenarios of transition to chaotic oscillations are analyzed. Depending on the natural frequency of electroconvective system and the external field frequency, the transition from periodic to chaotic oscillations can occur via quasiperiodicity, a subharmonic cascade, or intermittence.« less
Research in parallel computing

NASA Technical Reports Server (NTRS)

Ortega, James M.; Henderson, Charles

1994-01-01

This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Ultrasonic fluid flow measurement method and apparatus

DOEpatents

Kronberg, J.W.

1993-10-12

An apparatus for measuring the flow of a fluid in a pipe using ultrasonic waves. The apparatus comprises an ultrasonic generator, a lens for focusing the sound energy produced by the generator, and means for directing the focused energy into the side of the pipe through an opening and in a direction close to parallel to the long axis of the pipe. A cone carries the sound energy to the lens from the generator. Depending on the choice of materials, there may be a quarter-wave, acoustic impedance matching section between the generator and the cone to reduce the reflections of energy at the cone boundary. The lens material has an acoustic impedance similar to that of the cone material but a different sonic velocity so that the lens can converge the sound waves in the fluid. A transition section between the lens and the fluid helps to couple the energy to the fluid and assures it is directed as close to parallel to the fluid flow direction as possible. 3 figures.
Ultrasonic fluid flow measurement method and apparatus

DOEpatents

Kronberg, James W.

1993-01-01

An apparatus for measuring the flow of a fluid in a pipe using ultrasonic waves. The apparatus comprises an ultrasonic generator, a lens for focusing the sound energy produced by the generator, and means for directing the focused energy into the side of the pipe through an opening and in a direction close to parallel to the long axis of the pipe. A cone carries the sound energy to the lens from the generator. Depending on the choice of materials, there may be a quarter-wave, acoustic impedance matching section between the generator and the cone to reduce the reflections of energy at the cone boundary. The lens material has an acoustic impedance similar to that of the cone material but a different sonic velocity so that the lens can converge the sound waves in the fluid. A transition section between the lens and the fluid helps to couple the energy to the fluid and assures it is directed as close to parallel to the fluid flow direction as possible.

Parallel computations and control of adaptive structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

1991-01-01

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.
Design of a massively parallel computer using bit serial processing elements

NASA Technical Reports Server (NTRS)

Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

1995-01-01

A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
A Correlation-Based Transition Model using Local Variables. Part 2; Test Cases and Industrial Applications

NASA Technical Reports Server (NTRS)

Langtry, R. B.; Menter, F. R.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

2006-01-01

A new correlation-based transition model has been developed, which is built strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) methods using unstructured grids and massive parallel execution. The model is based on two transport equations, one for the intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models), but form a framework for the implementation of correlation-based models into general-purpose CFD methods.
A Correlation-Based Transition Model using Local Variables. Part 1; Model Formation

NASA Technical Reports Server (NTRS)

Menter, F. R.; Langtry, R. B.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

2006-01-01

A new correlation-based transition model has been developed, which is based strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) approaches, such as unstructured grids and massive parallel execution. The model is based on two transport equations, one for intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models) but from a framework for the implementation of correlation-based models into general-purpose CFD methods.
The System of Simulation and Multi-objective Optimization for the Roller Kiln

NASA Astrophysics Data System (ADS)

Huang, He; Chen, Xishen; Li, Wugang; Li, Zhuoqiu

It is somewhat a difficult researching problem, to get the building parameters of the ceramic roller kiln simulation model. A system integrated of evolutionary algorithms (PSO, DE and DEPSO) and computational fluid dynamics (CFD), is proposed to solve the problem. And the temperature field uniformity and the environment disruption are studied in this paper. With the help of the efficient parallel calculation, the ceramic roller kiln temperature field uniformity and the NOx emissions field have been researched in the system at the same time. A multi-objective optimization example of the industrial roller kiln proves that the system is of excellent parameter exploration capability.
Laser Powered Launch Vehicle Performance Analyses

NASA Technical Reports Server (NTRS)

Chen, Yen-Sen; Liu, Jiwen; Wang, Ten-See (Technical Monitor)

2001-01-01

The purpose of this study is to establish the technical ground for modeling the physics of laser powered pulse detonation phenomenon. Laser powered propulsion systems involve complex fluid dynamics, thermodynamics and radiative transfer processes. Successful predictions of the performance of laser powered launch vehicle concepts depend on the sophisticate models that reflects the underlying flow physics including the laser ray tracing the focusing, inverse Bremsstrahlung (IB) effects, finite-rate air chemistry, thermal non-equilibrium, plasma radiation and detonation wave propagation, etc. The proposed work will extend the base-line numerical model to an efficient design analysis tool. The proposed model is suitable for 3-D analysis using parallel computing methods.
Survey of the status of finite element methods for partial differential equations

NASA Technical Reports Server (NTRS)

Temam, Roger

1986-01-01

The finite element methods (FEM) have proved to be a powerful technique for the solution of boundary value problems associated with partial differential equations of either elliptic, parabolic, or hyperbolic type. They also have a good potential for utilization on parallel computers particularly in relation to the concept of domain decomposition. This report is intended as an introduction to the FEM for the nonspecialist. It contains a survey which is totally nonexhaustive, and it also contains as an illustration, a report on some new results concerning two specific applications, namely a free boundary fluid-structure interaction problem and the Euler equations for inviscid flows.
Shock waves generated by sudden expansions of a water jet

NASA Astrophysics Data System (ADS)

Salinas-Vázquez, M.; Echeverría, C.; Porta, D.; Stern, C. E.; Ascanio, G.; Vicente, W.; Aguayo, J. P.

2018-07-01

Direct shadowgraph with parallel light combined with high-speed recording has been used to analyze the water jet of a cutting machine. The use of image processing allowed observing sudden expansions in the jet diameter as well as estimating the jet velocity by means of the Mach angle, obtaining velocities of about 500 m s^{-1}. The technique used here revealed the development of hydrodynamic instabilities in the jet. Additionally, this is the first reporting of the onset of shock waves generated by small fluctuations of a continuous flow of water at high velocity surrounded by air, a result confirmed by a transient computational fluid dynamics simulation.
plasmaFoam: An OpenFOAM framework for computational plasma physics and chemistry

NASA Astrophysics Data System (ADS)

Venkattraman, Ayyaswamy; Verma, Abhishek Kumar

2016-09-01

As emphasized in the 2012 Roadmap for low temperature plasmas (LTP), scientific computing has emerged as an essential tool for the investigation and prediction of the fundamental physical and chemical processes associated with these systems. While several in-house and commercial codes exist, with each having its own advantages and disadvantages, a common framework that can be developed by researchers from all over the world will likely accelerate the impact of computational studies on advances in low-temperature plasma physics and chemistry. In this regard, we present a finite volume computational toolbox to perform high-fidelity simulations of LTP systems. This framework, primarily based on the OpenFOAM solver suite, allows us to enhance our understanding of multiscale plasma phenomenon by performing massively parallel, three-dimensional simulations on unstructured meshes using well-established high performance computing tools that are widely used in the computational fluid dynamics community. In this talk, we will present preliminary results obtained using the OpenFOAM-based solver suite with benchmark three-dimensional simulations of microplasma devices including both dielectric and plasma regions. We will also discuss the future outlook for the solver suite.
Performance Evaluation of Parallel Branch and Bound Search with the Intel iPSC (Intel Personal SuperComputer) Hypercube Computer.

DTIC Science & Technology

1986-12-01

17 III. Analysis of Parallel Design ................................................ 18 Parallel Abstract Data ...Types ........................................... 18 Abstract Data Type .................................................. 19 Parallel ADT...22 Data -Structure Design ........................................... 23 Object-Oriented Design
Unsteady stokes flow of dusty fluid between two parallel plates through porous medium in the presence of magnetic field

NASA Astrophysics Data System (ADS)

Sasikala, R.; Govindarajan, A.; Gayathri, R.

2018-04-01

This paper focus on the result of dust particle between two parallel plates through porous medium in the presence of magnetic field with constant suction in the upper plate and constant injection in the lower plate. The partial differential equations governing the flow are solved by similarity transformation. The velocity of the fluid and the dust particle decreases when there is an increase in the Hartmann number.
Plastohydrodynamic drawing and coating of stainless steel wire using a tapered bore die of no metal to metal contact

NASA Astrophysics Data System (ADS)

Hasan, S.; Basmage, O.; Stokes, J. T.; Hashmi, M. S. J.

2018-05-01

A review of wire coating studies using plasto-hydrodynamic pressure shows that most of the works were carried out by conducting experiments simultaneously with simulation analysis based upon Bernoulli's principle and Euler and Navier-Stokes (N-S) equations. These characteristics relate to the domain of Computational Fluid Dynamics (CFD) which is an interdisciplinary topic (Fluid Mechanics, Numerical Analysis of Fluid flow and Computer Science). This research investigates two aspects: (i) simulation work and (ii) experimentation. A mathematical model was developed to investigate the flow pattern of the molten polymer and pressure distribution within the wire-drawing dies, assessment of polymer coating thickness on the coated wires and speed of coating on the wires at the outlet of the drawing dies, without deploying any pressurizing pump. In addition to a physical model which was developed within ANSYS™ environment through the simulation design of ANSYS™ Workbench. The design was customized to simulate the process of wire-coating on the fine stainless-steel wires using drawing dies having different bore geometries such as: stepped parallel bore, tapered bore and combined parallel and tapered bore. The convergence of the designed CFD model and numerical and physical solution parameters for simulation were dynamically monitored for the viscous flow of the polypropylene (PP) polymer. Simulation results were validated against experimental results and used to predict the ideal bore shape to produce a thin coating on stainless wires with different diameter. Simulation studies confirmed that a specific speed should be attained by the stainless-steel wires while passing through the drawing dies. It has been observed that all the speed values within specific speed range did not produce a coating thickness having the desired coating characteristic features. Therefore, some optimization of the experimental set up through design of experiments (Stat-Ease) was applied to validate the results. Further rapid solidification of the viscous coating on the wires was targeted so that the coated wires do not stick to the winding spool after the coating process.
Advances and trends in structures and dynamics; Proceedings of the Symposium, Washington, DC, October 22-25, 1984

NASA Technical Reports Server (NTRS)

Noor, A. K. (Editor); Hayduk, R. J. (Editor)

1985-01-01

Among the topics discussed are developments in structural engineering hardware and software, computation for fracture mechanics, trends in numerical analysis and parallel algorithms, mechanics of materials, advances in finite element methods, composite materials and structures, determinations of random motion and dynamic response, optimization theory, automotive tire modeling methods and contact problems, the damping and control of aircraft structures, and advanced structural applications. Specific topics covered include structural design expert systems, the evaluation of finite element system architectures, systolic arrays for finite element analyses, nonlinear finite element computations, hierarchical boundary elements, adaptive substructuring techniques in elastoplastic finite element analyses, automatic tracking of crack propagation, a theory of rate-dependent plasticity, the torsional stability of nonlinear eccentric structures, a computation method for fluid-structure interaction, the seismic analysis of three-dimensional soil-structure interaction, a stress analysis for a composite sandwich panel, toughness criterion identification for unidirectional composite laminates, the modeling of submerged cable dynamics, and damping synthesis for flexible spacecraft structures.
Dynamics of face and annular seals with two-phase flow

NASA Technical Reports Server (NTRS)

Hughes, William F.; Basu, Prithwish; Beatty, Paul A.; Beeler, Richard M.; Lau, Stephen

1988-01-01

A detailed study was made of face and annular seals under conditions where boiling, i.e., phase change of the leaking fluid, occurs within the seal. Many seals operate in this mode because of flashing due to pressure drop and/or heat input from frictional heating. Some of the distinctive behavior characteristics of two phase seals are discussed, particularly their axial stability. The main conclusions are that seals with two phase flow may be unstable if improperly balanced. Detailed theoretical analyses of low (laminar) and high (turbulent) leakage seals are presented along with computer codes, parametric studies, and in particular a simplified PC based code that allows for rapid performance prediction: calculations of stiffness coefficients, temperature and pressure distributions, and leakage rates for parallel and coned face seals. A simplified combined computer code for the performance prediction over the laminar and turbulent ranges of a two phase flow is described and documented. The analyses, results, and computer codes are summarized.
GPU computing with Kaczmarz’s and other iterative algorithms for linear systems

PubMed Central

Elble, Joseph M.; Sahinidis, Nikolaos V.; Vouzis, Panagiotis

2009-01-01

The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz’s, Cimmino’s, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are preformed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method. PMID:20526446
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

NASA Technical Reports Server (NTRS)

Hsia, T. C.; Lu, G. Z.; Han, W. H.

1987-01-01

In advanced robot control problems, on-line computation of inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be inplemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
On the Lagrangian description of unsteady boundary-layer separation. I - General theory

NASA Technical Reports Server (NTRS)

Van Dommelen, Leon L.; Cowley, Stephen J.

1990-01-01

Although unsteady, high-Reynolds number, laminar boundary layers have conventionally been studied in terms of Eulerian coordinates, a Lagrangian approach may have significant analytical and computational advantages. In Lagrangian coordinates the classical boundary layer equations decouple into a momentum equation for the motion parallel to the boundary, and a hyperbolic continuity equation (essentially a conserved Jacobian) for the motion normal to the boundary. The momentum equations, plus the energy equation if the flow is compressible, can be solved independently of the continuity equation. Unsteady separation occurs when the continuity equation becomes singular as a result of touching characteristics, the condition for which can be expressed in terms of the solution of the momentum equations. The solutions to the momentum and energy equations remain regular. Asymptotic structures for a number of unsteady 3-D separating flows follow and depend on the symmetry properties of the flow. In the absence of any symmetry, the singularity structure just prior to separation is found to be quasi 2-D with a displacement thickness in the form of a crescent shaped ridge. Physically the singularities can be understood in terms of the behavior of a fluid element inside the boundary layer which contracts in a direction parallel to the boundary and expands normal to it, thus forcing the fluid above it to be ejected from the boundary layer.
On the Lagrangian description of unsteady boundary layer separation. Part 1: General theory

NASA Technical Reports Server (NTRS)

Vandommelen, Leon L.; Cowley, Stephen J.

1989-01-01

Although unsteady, high-Reynolds number, laminar boundary layers have conventionally been studied in terms of Eulerian coordinates, a Lagrangian approach may have significant analytical and computational advantages. In Lagrangian coordinates the classical boundary layer equations decouple into a momentum equation for the motion parallel to the boundary, and a hyperbolic continuity equation (essentially a conserved Jacobian) for the motion normal to the boundary. The momentum equations, plus the energy equation if the flow is compressible, can be solved independently of the continuity equation. Unsteady separation occurs when the continuity equation becomes singular as a result of touching characteristics, the condition for which can be expressed in terms of the solution of the momentum equations. The solutions to the momentum and energy equations remain regular. Asymptotic structures for a number of unsteady 3-D separating flows follow and depend on the symmetry properties of the flow. In the absence of any symmetry, the singularity structure just prior to separation is found to be quasi 2-D with a displacement thickness in the form of a crescent shaped ridge. Physically the singularities can be understood in terms of the behavior of a fluid element inside the boundary layer which contracts in a direction parallel to the boundary and expands normal to it, thus forcing the fluid above it to be ejected from the boundary layer.
A scalable parallel black oil simulator on distributed memory parallel computers

NASA Astrophysics Data System (ADS)

Wang, Kun; Liu, Hui; Chen, Zhangxin

2015-11-01

This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

PubMed

Furuta, Takuya; Sato, Tatsuhiko

2015-01-01

Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

NASA Astrophysics Data System (ADS)

Georgiev, K.; Zlatev, Z.

2010-11-01

The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.
Application of Local Discretization Methods in the NASA Finite-Volume General Circulation Model

NASA Technical Reports Server (NTRS)

Yeh, Kao-San; Lin, Shian-Jiann; Rood, Richard B.

2002-01-01

We present the basic ideas of the dynamics system of the finite-volume General Circulation Model developed at NASA Goddard Space Flight Center for climate simulations and other applications in meteorology. The dynamics of this model is designed with emphases on conservative and monotonic transport, where the property of Lagrangian conservation is used to maintain the physical consistency of the computational fluid for long-term simulations. As the model benefits from the noise-free solutions of monotonic finite-volume transport schemes, the property of Lagrangian conservation also partly compensates the accuracy of transport for the diffusion effects due to the treatment of monotonicity. By faithfully maintaining the fundamental laws of physics during the computation, this model is able to achieve sufficient accuracy for the global consistency of climate processes. Because the computing algorithms are based on local memory, this model has the advantage of efficiency in parallel computation with distributed memory. Further research is yet desirable to reduce the diffusion effects of monotonic transport for better accuracy, and to mitigate the limitation due to fast-moving gravity waves for better efficiency.
Randomized Dynamic Mode Decomposition

NASA Astrophysics Data System (ADS)

Erichson, N. Benjamin; Brunton, Steven L.; Kutz, J. Nathan

2017-11-01

The dynamic mode decomposition (DMD) is an equation-free, data-driven matrix decomposition that is capable of providing accurate reconstructions of spatio-temporal coherent structures arising in dynamical systems. We present randomized algorithms to compute the near-optimal low-rank dynamic mode decomposition for massive datasets. Randomized algorithms are simple, accurate and able to ease the computational challenges arising with `big data'. Moreover, randomized algorithms are amenable to modern parallel and distributed computing. The idea is to derive a smaller matrix from the high-dimensional input data matrix using randomness as a computational strategy. Then, the dynamic modes and eigenvalues are accurately learned from this smaller representation of the data, whereby the approximation quality can be controlled via oversampling and power iterations. Here, we present randomized DMD algorithms that are categorized by how many passes the algorithm takes through the data. Specifically, the single-pass randomized DMD does not require data to be stored for subsequent passes. Thus, it is possible to approximately decompose massive fluid flows (stored out of core memory, or not stored at all) using single-pass algorithms, which is infeasible with traditional DMD algorithms.
A Debugger for Computational Grid Applications

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation gives an overview of a debugger for computational grid applications. Details are given on NAS parallel tools groups (including parallelization support tools, evaluation of various parallelization strategies, and distributed and aggregated computing), debugger dependencies, scalability, initial implementation, the process grid, and information on Globus.
Application of a Scalable, Parallel, Unstructured-Grid-Based Navier-Stokes Solver

NASA Technical Reports Server (NTRS)

Parikh, Paresh

2001-01-01

A parallel version of an unstructured-grid based Navier-Stokes solver, USM3Dns, previously developed for efficient operation on a variety of parallel computers, has been enhanced to incorporate upgrades made to the serial version. The resultant parallel code has been extensively tested on a variety of problems of aerospace interest and on two sets of parallel computers to understand and document its characteristics. An innovative grid renumbering construct and use of non-blocking communication are shown to produce superlinear computing performance. Preliminary results from parallelization of a recently introduced "porous surface" boundary condition are also presented.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

NASA Astrophysics Data System (ADS)

Decyk, V. K.; Dauger, D. E.

We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.
Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth

NASA Astrophysics Data System (ADS)

Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.

2014-03-01

An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in 50% reduction of memory and computational time requirements. The presented solidification model shows a very good scalability up to centimeter size domains, including more than ten million of dendrites. Catalogue identifier: AEQZ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, UK Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 29,767 No. of bytes in distributed program, including test data, etc.: 3131,367 Distribution format: tar.gz Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. Number of processors used: 1-50,000 RAM: Memory requirements depend on the grid size Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/) Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant. Unusual features: Novel technique, utilizing periodic duplication of a pre-grown “incubation” domain, is applied for the scaleup test. Running time: Running time varies from minutes to days depending on the domain size and number of computational cores.
Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
Facilitating higher-fidelity simulations of axial compressor instability and other turbomachinery flow conditions

NASA Astrophysics Data System (ADS)

Herrick, Gregory Paul

The quest to accurately capture flow phenomena with length-scales both short and long and to accurately represent complex flow phenomena within disparately sized geometry inspires a need for an efficient, high-fidelity, multi-block structured computational fluid dynamics (CFD) parallel computational scheme. This research presents and demonstrates a more efficient computational method by which to perform multi-block structured CFD parallel computational simulations, thus facilitating higher-fidelity solutions of complicated geometries (due to the inclusion of grids for "small'' flow areas which are often merely modeled) and their associated flows. This computational framework offers greater flexibility and user-control in allocating the resource balance between process count and wall-clock computation time. The principal modifications implemented in this revision consist of a "multiple grid block per processing core'' software infrastructure and an analytic computation of viscous flux Jacobians. The development of this scheme is largely motivated by the desire to simulate axial compressor stall inception with more complete gridding of the flow passages (including rotor tip clearance regions) than has been previously done while maintaining high computational efficiency (i.e., minimal consumption of computational resources), and thus this paradigm shall be demonstrated with an examination of instability in a transonic axial compressor. However, the paradigm presented herein facilitates CFD simulation of myriad previously impractical geometries and flows and is not limited to detailed analyses of axial compressor flows. While the simulations presented herein were technically possible under the previous structure of the subject software, they were much less computationally efficient and thus not pragmatically feasible; the previous research using this software to perform three-dimensional, full-annulus, time-accurate, unsteady, full-stage (with sliding-interface) simulations of rotating stall inception in axial compressors utilized tip clearance periodic models, while the scheme here is demonstrated by a simulation of axial compressor stall inception utilizing gridded rotor tip clearance regions. As will be discussed, much previous research---experimental, theoretical, and computational---has suggested that understanding clearance flow behavior is critical to understanding stall inception, and previous computational research efforts which have used tip clearance models have begged the question, "What about the clearance flows?''. This research begins to address that question.
Parallel Adaptive High-Order CFD Simulations Characterizing Cavity Acoustics for the Complete SOFIA Aircraft

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2014-01-01

This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A temporally fourth-order Runge-Kutta, and spatially fifth-order WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.
The Microgravity Science Glovebox

NASA Technical Reports Server (NTRS)

Baugher, Charles R.; Primm, Lowell (Technical Monitor)

2001-01-01

The Microgravity Science Glovebox (MSG) provides scientific investigators the opportunity to implement interactive experiments on the International Space Station. The facility has been designed around the concept of an enclosed scientific workbench that allows the crew to assemble and operate an experimental apparatus with participation from ground-based scientists through real-time data and video links. Workbench utilities provided to operate the experiments include power, data acquisition, computer communications, vacuum, nitrogen. and specialized tools. Because the facility work area is enclosed and held at a negative pressure with respect to the crew living area, the requirements on the experiments for containment of small parts, particulates, fluids, and gasses are substantially reduced. This environment allows experiments to be constructed in close parallel with bench type investigations performed in groundbased laboratories. Such an approach enables experimental scientists to develop hardware that more closely parallel their traditional laboratory experience and transfer these experiments into meaningful space-based research. When delivered to the ISS the MSG will represent a significant scientific capability that will be continuously available for a decade of evolutionary research.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Parallel CE/SE Computations via Domain Decomposition

NASA Technical Reports Server (NTRS)

Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

2000-01-01

This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.
The energy density distribution of an ideal gas and Bernoulli’s equations

NASA Astrophysics Data System (ADS)

Santos, Leonardo S. F.

2018-05-01

This work discusses the energy density distribution in an ideal gas and the consequences of Bernoulli’s equation and the corresponding relation for compressible fluids. The aim of this work is to study how Bernoulli’s equation determines the energy flow in a fluid, although Bernoulli’s equation does not describe the energy density itself. The model from molecular dynamic considerations that describes an ideal gas at rest with uniform density is modified to explore the gas in motion with non-uniform density and gravitational effects. The difference between the component of the speed of a particle that is parallel to the gas speed and the gas speed itself is called ‘parallel random speed’. The pressure from the ‘parallel random speed’ is denominated as parallel pressure. The modified model predicts that the energy density is the sum of kinetic and potential gravitational energy densities plus two terms with static and parallel pressures. The application of Bernoulli’s equation and the corresponding relation for compressible fluids in the energy density expression has resulted in two new formulations. For incompressible and compressible gas, the energy density expressions are written as a function of stagnation, static and parallel pressures, without any dependence on kinetic or gravitational potential energy densities. These expressions of the energy density are the main contributions of this work. When the parallel pressure was uniform, the energy density distribution for incompressible approximation and compressible gas did not converge to zero for the limit of null static pressure. This result is rather unusual because the temperature tends to zero for null pressure. When the gas was considered incompressible and the parallel pressure was equal to static pressure, the energy density maintained this unusual behaviour with small pressures. If the parallel pressure was equal to static pressure, the energy density converged to zero for the limit of the null pressure only if the gas was compressible. Only the last situation describes an intuitive behaviour for an ideal gas.
Parallel Algorithms for Least Squares and Related Computations.

DTIC Science & Technology

1991-03-22

for dense computations in linear algebra . The work has recently been published in a general reference book on parallel algorithms by SIAM. AFO SR...written his Ph.D. dissertation with the principal investigator. (See publication 6.) • Parallel Algorithms for Dense Linear Algebra Computations. Our...and describe and to put into perspective a selection of the more important parallel algorithms for numerical linear algebra . We give a major new
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.

Parallel computing method for simulating hydrological processesof large rivers under climate change

NASA Astrophysics Data System (ADS)

Wang, H.; Chen, Y.

2016-12-01

Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Improving Fidelity of Launch Vehicle Liftoff Acoustic Simulations

NASA Technical Reports Server (NTRS)

Liever, Peter; West, Jeff

2016-01-01

Launch vehicles experience high acoustic loads during ignition and liftoff affected by the interaction of rocket plume generated acoustic waves with launch pad structures. Application of highly parallelized Computational Fluid Dynamics (CFD) analysis tools optimized for application on the NAS computer systems such as the Loci/CHEM program now enable simulation of time-accurate, turbulent, multi-species plume formation and interaction with launch pad geometry and capture the generation of acoustic noise at the source regions in the plume shear layers and impingement regions. These CFD solvers are robust in capturing the acoustic fluctuations, but they are too dissipative to accurately resolve the propagation of the acoustic waves throughout the launch environment domain along the vehicle. A hybrid Computational Fluid Dynamics and Computational Aero-Acoustics (CFD/CAA) modeling framework has been developed to improve such liftoff acoustic environment predictions. The framework combines the existing highly-scalable NASA production CFD code, Loci/CHEM, with a high-order accurate discontinuous Galerkin (DG) solver, Loci/THRUST, developed in the same computational framework. Loci/THRUST employs a low dissipation, high-order, unstructured DG method to accurately propagate acoustic waves away from the source regions across large distances. The DG solver is currently capable of solving up to 4th order solutions for non-linear, conservative acoustic field propagation. Higher order boundary conditions are implemented to accurately model the reflection and refraction of acoustic waves on launch pad components. The DG solver accepts generalized unstructured meshes, enabling efficient application of common mesh generation tools for CHEM and THRUST simulations. The DG solution is coupled with the CFD solution at interface boundaries placed near the CFD acoustic source regions. Both simulations are executed simultaneously with coordinated boundary condition data exchange.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Computational modeling of fully-ionized, magnetized plasmas using the fluid approximation

NASA Astrophysics Data System (ADS)

Schnack, Dalton

2005-10-01

Strongly magnetized plasmas are rich in spatial and temporal scales, making a computational approach useful for studying these systems. The most accurate model of a magnetized plasma is based on a kinetic equation that describes the evolution of the distribution function for each species in six-dimensional phase space. However, the high dimensionality renders this approach impractical for computations for long time scales in relevant geometry. Fluid models, derived by taking velocity moments of the kinetic equation [1] and truncating (closing) the hierarchy at some level, are an approximation to the kinetic model. The reduced dimensionality allows a wider range of spatial and/or temporal scales to be explored. Several approximations have been used [2-5]. Successful computational modeling requires understanding the ordering and closure approximations, the fundamental waves supported by the equations, and the numerical properties of the discretization scheme. We review and discuss several ordering schemes, their normal modes, and several algorithms that can be applied to obtain a numerical solution. The implementation of kinetic parallel closures is also discussed [6].[1] S. Chapman and T.G. Cowling, ``The Mathematical Theory of Non-Uniform Gases'', Cambridge University Press, Cambridge, UK (1939).[2] R.D. Hazeltine and J.D. Meiss, ``Plasma Confinement'', Addison-Wesley Publishing Company, Redwood City, CA (1992).[3] L.E. Sugiyama and W. Park, Physics of Plasmas 7, 4644 (2000).[4] J.J. Ramos, Physics of Plasmas, 10, 3601 (2003).[5] P.J. Catto and A.N. Simakov, Physics of Plasmas, 11, 90 (2004).[6] E.D. Held et al., Phys. Plasmas 11, 2419 (2004)
Linear induction pump

DOEpatents

Meisner, John W.; Moore, Robert M.; Bienvenue, Louis L.

1985-03-19

Electromagnetic linear induction pump for liquid metal which includes a unitary pump duct. The duct comprises two substantially flat parallel spaced-apart wall members, one being located above the other and two parallel opposing side members interconnecting the wall members. Located within the duct are a plurality of web members interconnecting the wall members and extending parallel to the side members whereby the wall members, side members and web members define a plurality of fluid passageways, each of the fluid passageways having substantially the same cross-sectional flow area. Attached to an outer surface of each side member is an electrically conductive end bar for the passage of an induced current therethrough. A multi-phase, electrical stator is located adjacent each of the wall members. The duct, stators, and end bars are enclosed in a housing which is provided with an inlet and outlet in fluid communication with opposite ends of the fluid passageways in the pump duct. In accordance with a preferred embodiment, the inlet and outlet includes a transition means which provides for a transition from a round cross-sectional flow path to a substantially rectangular cross-sectional flow path defined by the pump duct.
Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

NASA Astrophysics Data System (ADS)

Decyk, Viktor K.; Dauger, Dean E.

We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Multi-Fluid Moment Simulations of Ganymede using the Next-Generation OpenGGCM

NASA Astrophysics Data System (ADS)

Wang, L.; Germaschewski, K.; Hakim, A.; Bhattacharjee, A.; Raeder, J.

2015-12-01

We coupled the multi-fluid moment code Gkeyll[1,2] to the next-generation OpenGGCM[3], and studied the reconnection dynamics at the Ganymede. This work is part of our effort to tackle the grand challenge of integrating kinetic effects into global fluid models. The multi-fluid moment model integrates kinetic effects in that it can capture crucial kinetic physics like pressure tensor effects by evolving moments of the Vlasov equations for each species. This approach has advantages over previous models: desired kinetic effects, together with other important effects like the Hall effect, are self-consistently embedded in the moment equations, and can be efficiently implemented, while not suffering from severe time-step restriction due to plasma oscillation nor artificial whistler modes. This model also handles multiple ion species naturally, which opens up opportunties in investigating the role of oxygen in magnetospheric reconnection and improved coupling to ionosphere models. In this work, the multi-fluid moment solver in Gkeyll was wrapped as a time-stepping module for the high performance, highly flexible next-generation OpenGGCM. Gkeyll is only used to provide the local plasma solver, while computational aspects like parallelization and boundary conditions are handled entirely by OpenGGCM, including interfacing to other models like ionospheric boundary conditions provided by coupling with CTIM [3]. The coupled code is used to study the dynamics near Ganymede, and the results are compared with MHD and Hall MHD results by Dorelli et al. [4]. Hakim, A. (2008). Journal of Fusion Energy, 27, 36-43. Hakim, A., Loverich, J., & Shumlak, U. (2006). Journal of Computational Physics, 219, 418-442. Raeder, J., Larson, D., Li, W., Kepko, E. L., & Fuller-Rowell, T. (2008). Space Science Reviews, 141(1-4), 535-555. Dorelli, J. C., Glocer, A., Collinson, G., & Tóth, G. (2015). Journal of Geophysical Research: Space Physics, 120.
Overview of the NASA Glenn Flux Reconstruction Based High-Order Unstructured Grid Code

NASA Technical Reports Server (NTRS)

Spiegel, Seth C.; DeBonis, James R.; Huynh, H. T.

2016-01-01

A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large- eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h- refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree- of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

NASA Astrophysics Data System (ADS)

Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

2015-07-01

The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Fluid mechanics based classification of the respiratory efficiency of several nasal cavities.

PubMed

Lintermann, Andreas; Meinke, Matthias; Schröder, Wolfgang

2013-11-01

The flow in the human nasal cavity is of great importance to understand rhinologic pathologies like impaired respiration or heating capabilities, a diminished sense of taste and smell, and the presence of dry mucous membranes. To numerically analyze this flow problem a highly efficient and scalable Thermal Lattice-BGK (TLBGK) solver is used, which is very well suited for flows in intricate geometries. The generation of the computational mesh is completely automatic and highly parallelized such that it can be executed efficiently on High Performance Computers (HPCs). An evaluation of the functionality of nasal cavities is based on an analysis of pressure drop, secondary flow structures, wall-shear stress distributions, and temperature variations from the nostrils to the pharynx. The results of the flow fields of three completely different nasal cavities allow their classification into ability groups and support the a priori decision process on surgical interventions. © 2013 Elsevier Ltd. All rights reserved.
Task Assignment Heuristics for Parallel and Distributed CFD Applications

NASA Technical Reports Server (NTRS)

Lopez-Benitez, Noe; Djomehri, M. Jahed; Biswas, Rupak

2003-01-01

This paper proposes a task graph (TG) model to represent a single discrete step of multi-block overset grid computational fluid dynamics (CFD) applications. The TG model is then used to not only balance the computational workload across the overset grids but also to reduce inter-grid communication costs. We have developed a set of task assignment heuristics based on the constraints inherent in this class of CFD problems. Two basic assignments, the smallest task first (STF) and the largest task first (LTF), are first presented. They are then systematically costs. To predict the performance of the proposed task assignment heuristics, extensive performance evaluations are conducted on a synthetic TG with tasks defined in terms of the number of grid points in predetermined overlapping grids. A TG derived from a realistic problem with eight million grid points is also used as a test case.
Thermal Ablation Modeling for Silicate Materials

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq

2016-01-01

A thermal ablation model for silicates is proposed. The model includes the mass losses through the balance between evaporation and condensation, and through the moving molten layer driven by surface shear force and pressure gradient. This model can be applied in ablation simulations of the meteoroid or glassy Thermal Protection Systems for spacecraft. Time-dependent axi-symmetric computations are performed by coupling the fluid dynamics code, Data-Parallel Line Relaxation program, with the material response code, Two-dimensional Implicit Thermal Ablation simulation program, to predict the mass lost rates and shape change. For model validation, the surface recession of fused amorphous quartz rod is computed, and the recession predictions reasonably agree with available data. The present parametric studies for two groups of meteoroid earth entry conditions indicate that the mass loss through moving molten layer is negligibly small for heat-flux conditions at around 1 MW/cm(exp. 2).
User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao

2013-12-01

TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporatedmore » into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.« less
Methods of parallel computation applied on granular simulations

NASA Astrophysics Data System (ADS)

Martins, Gustavo H. B.; Atman, Allbens P. F.

2017-06-01

Every year, parallel computing has becoming cheaper and more accessible. As consequence, applications were spreading over all research areas. Granular materials is a promising area for parallel computing. To prove this statement we study the impact of parallel computing in simulations of the BNE (Brazil Nut Effect). This property is due the remarkable arising of an intruder confined to a granular media when vertically shaken against gravity. By means of DEM (Discrete Element Methods) simulations, we study the code performance testing different methods to improve clock time. A comparison between serial and parallel algorithms, using OpenMP® is also shown. The best improvement was obtained by optimizing the function that find contacts using Verlet's cells.
Parallel computation using boundary elements in solid mechanics

NASA Technical Reports Server (NTRS)

Chien, L. S.; Sun, C. T.

1990-01-01

The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. The linear speedups and high efficiencies are shown for solving a demonstrated problem on Sequent Symmetry S81 parallel computing system.
Parallel Algorithms for the Exascale Era

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robey, Robert W.

New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this workmore » has been done by undergraduates and published in leading scientific journals.« less
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

NASA Astrophysics Data System (ADS)

Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

2017-06-01

Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
Numerical tool development of fluid-structure interactions for investigation of obstructive sleep apnea

NASA Astrophysics Data System (ADS)

Huang, Chien-Jung; White, Susan; Huang, Shao-Ching; Mallya, Sanjay; Eldredge, Jeff

2016-11-01

Obstructive sleep apnea (OSA) is a medical condition characterized by repetitive partial or complete occlusion of the airway during sleep. The soft tissues in the upper airway of OSA patients are prone to collapse under the low pressure loads incurred during breathing. The ultimate goal of this research is the development of a versatile numerical tool for simulation of air-tissue interactions in the patient specific upper airway geometry. This tool is expected to capture several phenomena, including flow-induced vibration (snoring) and large deformations during airway collapse of the complex airway geometry in respiratory flow conditions. Here, we present our ongoing progress toward this goal. To avoid mesh regeneration, for flow model, a sharp-interface embedded boundary method is used on Cartesian grids for resolving the fluid-structure interface, while for the structural model, a cut-cell finite element method is used. Also, to properly resolve large displacements, non-linear elasticity model is used. The fluid and structure solvers are connected with the strongly coupled iterative algorithm. The parallel computation is achieved with the numerical library PETSc. Some two- and three- dimensional preliminary results are shown to demonstrate the ability of this tool.
RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

NASA Astrophysics Data System (ADS)

Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

2017-03-01

Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Symplectic molecular dynamics simulations on specially designed parallel computers.

PubMed

Borstnik, Urban; Janezic, Dusanka

2005-01-01

We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
Line-by-line spectroscopic simulations on graphics processing units

NASA Astrophysics Data System (ADS)

Collange, Sylvain; Daumas, Marc; Defour, David

2008-01-01

We report here on software that performs line-by-line spectroscopic simulations on gases. Elaborate models (such as narrow band and correlated-K) are accurate and efficient for bands where various components are not simultaneously and significantly active. Line-by-line is probably the most accurate model in the infrared for blends of gases that contain high proportions of H 2O and CO 2 as this was the case for our prototype simulation. Our implementation on graphics processing units sustains a speedup close to 330 on computation-intensive tasks and 12 on memory intensive tasks compared to implementations on one core of high-end processors. This speedup is due to data parallelism, efficient memory access for specific patterns and some dedicated hardware operators only available in graphics processing units. It is obtained leaving most of processor resources available and it would scale linearly with the number of graphics processing units in parallel machines. Line-by-line simulation coupled with simulation of fluid dynamics was long believed to be economically intractable but our work shows that it could be done with some affordable additional resources compared to what is necessary to perform simulations on fluid dynamics alone. Program summaryProgram title: GPU4RE Catalogue identifier: ADZY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADZY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 62 776 No. of bytes in distributed program, including test data, etc.: 1 513 247 Distribution format: tar.gz Programming language: C++ Computer: x86 PC Operating system: Linux, Microsoft Windows. Compilation requires either gcc/g++ under Linux or Visual C++ 2003/2005 and Cygwin under Windows. It has been tested using gcc 4.1.2 under Ubuntu Linux 7.04 and using Visual C++ 2005 with Cygwin 1.5.24 under Windows XP. RAM: 1 gigabyte Classification: 21.2 External routines: OpenGL ( http://www.opengl.org) Nature of problem: Simulating radiative transfer on high-temperature high-pressure gases. Solution method: Line-by-line Monte-Carlo ray-tracing. Unusual features: Parallel computations are moved to the GPU. Additional comments: nVidia GeForce 7000 or ATI Radeon X1000 series graphics processing unit is required. Running time: A few minutes.
Beyond the single-file fluid limit using transfer matrix method: Exact results for confined parallel hard squares

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gurin, Péter; Varga, Szabolcs

2015-06-14

We extend the transfer matrix method of one-dimensional hard core fluids placed between confining walls for that case where the particles can pass each other and at most two layers can form. We derive an eigenvalue equation for a quasi-one-dimensional system of hard squares confined between two parallel walls, where the pore width is between σ and 3σ (σ is the side length of the square). The exact equation of state and the nearest neighbor distribution functions show three different structures: a fluid phase with one layer, a fluid phase with two layers, and a solid-like structure where the fluidmore » layers are strongly correlated. The structural transition between differently ordered fluids develops continuously with increasing density, i.e., no thermodynamic phase transition occurs. The high density structure of the system consists of clusters with two layers which are broken with particles staying in the middle of the pore.« less
Heat transfer at microscopic level in a MHD fractional inertial flow confined between non-isothermal boundaries

NASA Astrophysics Data System (ADS)

Shoaib Anwar, Muhammad; Rasheed, Amer

2017-07-01

Heat transfer through a Forchheimer medium in an unsteady magnetohydrodynamic (MHD) developed differential-type fluid flow is analyzed numerically in this study. The boundary layer flow is modeled with the help of the fractional calculus approach. The fluid is confined between infinite parallel plates and flows by motion of the plates in their own plane. Both the plates have variable surface temperature. Governing partial differential equations with appropriate initial and boundary conditions are solved by employing a finite-difference scheme to discretize the fractional time derivative and finite-element discretization for spatial variables. Coefficients of skin friction and local Nusselt numbers are computed for the fractional model. The flow behavior is presented for various values of the involved parameters. The influence of different dimensionless numbers on skin friction and Nusselt number is discussed by tabular results. Forchheimer medium flows that involve catalytic converters and gas turbines can be modeled in a similar manner.
NIMROD modeling of poloidal flow damping in tokamaks using kinetic closures

NASA Astrophysics Data System (ADS)

Jepson, J. R.; Hegna, C. C.; Held, E. D.

2017-10-01

Calculations of poloidal flow damping in a tokamak are undertaken using two different implementations of the ion drift kinetic equation (DKE) in the extended MHD code NIMROD. The first approach is hybrid fluid/kinetic and uses a Chapman Enskog-like (CEL) Ansatz. Closure of the evolving lower-order fluid moment equations for n, V , and T is provided by solutions to the ion CEL-DKE written in the macroscopic flow reference frame. The second implementation solves the DKE using a delta-f approach. Here, the delta-f distribution describes all of the information beyond a static, lowest-order Maxwellian. We compare the efficiency and accuracy of these two approaches for a simple initial value problem that monitors the relaxation of the poloidal flow profile in high- and low-aspect-ratio tokamak geometry. The computation results are compared against analytic predictions of time dependent closures for the parallel viscous force. Supported by DoE Grants DE-FG02-86ER53218 and DE-FG02-04ER54746.
Combined Numerical/Analytical Perturbation Solutions of the Navier-Stokes Equations for Aerodynamic Ejector/Mixer Nozzle Flows

NASA Technical Reports Server (NTRS)

DeChant, Lawrence Justin

1998-01-01

In spite of rapid advances in both scalar and parallel computational tools, the large number of variables involved in both design and inverse problems make the use of sophisticated fluid flow models impractical, With this restriction, it is concluded that an important family of methods for mathematical/computational development are reduced or approximate fluid flow models. In this study a combined perturbation/numerical modeling methodology is developed which provides a rigorously derived family of solutions. The mathematical model is computationally more efficient than classical boundary layer but provides important two-dimensional information not available using quasi-1-d approaches. An additional strength of the current methodology is its ability to locally predict static pressure fields in a manner analogous to more sophisticated parabolized Navier Stokes (PNS) formulations. To resolve singular behavior, the model utilizes classical analytical solution techniques. Hence, analytical methods have been combined with efficient numerical methods to yield an efficient hybrid fluid flow model. In particular, the main objective of this research has been to develop a system of analytical and numerical ejector/mixer nozzle models, which require minimal empirical input. A computer code, DREA Differential Reduced Ejector/mixer Analysis has been developed with the ability to run sufficiently fast so that it may be used either as a subroutine or called by an design optimization routine. Models are of direct use to the High Speed Civil Transport Program (a joint government/industry project seeking to develop an economically.viable U.S. commercial supersonic transport vehicle) and are currently being adopted by both NASA and industry. Experimental validation of these models is provided by comparison to results obtained from open literature and Limited Exclusive Right Distribution (LERD) sources, as well as dedicated experiments performed at Texas A&M. These experiments have been performed using a hydraulic/gas flow analog. Results of comparisons of DREA computations with experimental data, which include entrainment, thrust, and local profile information, are overall good. Computational time studies indicate that DREA provides considerably more information at a lower computational cost than contemporary ejector nozzle design models. Finally. physical limitations of the method, deviations from experimental data, potential improvements and alternative formulations are described. This report represents closure to the NASA Graduate Researchers Program. Versions of the DREA code and a user's guide may be obtained from the NASA Lewis Research Center.
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

NASA Astrophysics Data System (ADS)

Macioł, Piotr; Michalik, Kazimierz

2016-10-01

Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle against its wide application is high computational demands. Among others, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelization of multiscale models based on Agile Multiscale Methodology framework is discussed. A sequential, FEM based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues, being investigated in this work are: (i) the speed-up of multiscale models with special focus on fine-scale computations and (ii) on decreasing the quality of computations enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of `delay error', rising from the parallel execution of fine scale sub-models, controlled by the sequential macroscopic sub-model is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and a MPI library are also discussed.
Parallel algorithms for mapping pipelined and parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Drag reduction in turbulent channel laden with finite-size oblate spheroids

NASA Astrophysics Data System (ADS)

Niazi Ardekani, Mehdi; Pedro Costa Collaboration; Wim-Paul Breugem Collaboration; Francesco Picano Collaboration; Luca Brandt Collaboration

2016-11-01

Suspensions of oblate rigid particles in a turbulent plane channel flow are investigated for different values of the particle volume fraction. We perform direct numerical simulations (DNS), using a direct-forcing immersed boundary method to account for the particle-fluid interactions, combined with a soft-sphere collision model and lubrication corrections for short-range particle-particle and particle-wall interactions. We show a clear drag reduction and turbulence attenuation in flows laden with oblate spheroids, both with respect to the single phase turbulent flow and to suspensions of rigid spheres. We explain the drag reduction by the lack of the particle layer at the wall, observed before for spherical particles. In addition, the special shape of the oblate particles creates a tendency to stay parallel to the wall in its vicinity, forming a shield of particles that prevents strong fluctuations in the outer layer to reach the wall and vice versa. Detailed statistics of the fluid and particle phase will be presented at the conference to explain the observed drag reduction. Supported by the European Research Council Grant No. ERC-2013-CoG-616186, TRITOS. The authors acknowledge computer time provided by SNIC (Swedish National Infrastructure for Computing) and the support from the COST Action MP1305: Flowing matter.
Development and Characterization of a Parallelizable Perfusion Bioreactor for 3D Cell Culture.

PubMed

Egger, Dominik; Fischer, Monica; Clementi, Andreas; Ribitsch, Volker; Hansmann, Jan; Kasper, Cornelia

2017-05-25

The three dimensional (3D) cultivation of stem cells in dynamic bioreactor systems is essential in the context of regenerative medicine. Still, there is a lack of bioreactor systems that allow the cultivation of multiple independent samples under different conditions while ensuring comprehensive control over the mechanical environment. Therefore, we developed a miniaturized, parallelizable perfusion bioreactor system with two different bioreactor chambers. Pressure sensors were also implemented to determine the permeability of biomaterials which allows us to approximate the shear stress conditions. To characterize the flow velocity and shear stress profile of a porous scaffold in both bioreactor chambers, a computational fluid dynamics analysis was performed. Furthermore, the mixing behavior was characterized by acquisition of the residence time distributions. Finally, the effects of the different flow and shear stress profiles of the bioreactor chambers on osteogenic differentiation of human mesenchymal stem cells were evaluated in a proof of concept study. In conclusion, the data from computational fluid dynamics and shear stress calculations were found to be predictable for relative comparison of the bioreactor geometries, but not for final determination of the optimal flow rate. However, we suggest that the system is beneficial for parallel dynamic cultivation of multiple samples for 3D cell culture processes.
Development and Characterization of a Parallelizable Perfusion Bioreactor for 3D Cell Culture

PubMed Central

Egger, Dominik; Fischer, Monica; Clementi, Andreas; Ribitsch, Volker; Hansmann, Jan; Kasper, Cornelia

2017-01-01

The three dimensional (3D) cultivation of stem cells in dynamic bioreactor systems is essential in the context of regenerative medicine. Still, there is a lack of bioreactor systems that allow the cultivation of multiple independent samples under different conditions while ensuring comprehensive control over the mechanical environment. Therefore, we developed a miniaturized, parallelizable perfusion bioreactor system with two different bioreactor chambers. Pressure sensors were also implemented to determine the permeability of biomaterials which allows us to approximate the shear stress conditions. To characterize the flow velocity and shear stress profile of a porous scaffold in both bioreactor chambers, a computational fluid dynamics analysis was performed. Furthermore, the mixing behavior was characterized by acquisition of the residence time distributions. Finally, the effects of the different flow and shear stress profiles of the bioreactor chambers on osteogenic differentiation of human mesenchymal stem cells were evaluated in a proof of concept study. In conclusion, the data from computational fluid dynamics and shear stress calculations were found to be predictable for relative comparison of the bioreactor geometries, but not for final determination of the optimal flow rate. However, we suggest that the system is beneficial for parallel dynamic cultivation of multiple samples for 3D cell culture processes. PMID:28952530
Computational Fluid Dynamics (CFD) Simulation of Hypersonic Turbine-Based Combined-Cycle (TBCC) Inlet Mode Transition

NASA Technical Reports Server (NTRS)

Slater, John W.; Saunders, John D.

2010-01-01

Methods of computational fluid dynamics were applied to simulate the aerodynamics within the turbine flowpath of a turbine-based combined-cycle propulsion system during inlet mode transition at Mach 4. Inlet mode transition involved the rotation of a splitter cowl to close the turbine flowpath to allow the full operation of a parallel dual-mode ramjet/scramjet flowpath. Steady-state simulations were performed at splitter cowl positions of 0deg, -2deg, -4deg, and -5.7deg, at which the turbine flowpath was closed half way. The simulations satisfied one objective of providing a greater understanding of the flow during inlet mode transition. Comparisons of the simulation results with wind-tunnel test data addressed another objective of assessing the applicability of the simulation methods for simulating inlet mode transition. The simulations showed that inlet mode transition could occur in a stable manner and that accurate modeling of the interactions among the shock waves, boundary layers, and porous bleed regions was critical for evaluating the inlet static and total pressures, bleed flow rates, and bleed plenum pressures. The simulations compared well with some of the wind-tunnel data, but uncertainties in both the windtunnel data and simulations prevented a formal evaluation of the accuracy of the simulation methods.
Nonlinear three-dimensional verification of the SPECYL and PIXIE3D magnetohydrodynamics codes for fusion plasmas

NASA Astrophysics Data System (ADS)

Bonfiglio, D.; Chacón, L.; Cappello, S.

2010-08-01

With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacón, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code in cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.
Nonlinear three-dimensional verification of the SPECYL and PIXIE3D magnetohydrodynamics codes for fusion plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bonfiglio, Daniele; Chacon, Luis; Cappello, Susanna

2010-01-01

With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacon, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code inmore » cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.« less
A discontinuous Galerkin conservative level set scheme for interface capturing in multiphase flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owkes, Mark, E-mail: mfc86@cornell.edu; Desjardins, Olivier

2013-09-15

The accurate conservative level set (ACLS) method of Desjardins et al. [O. Desjardins, V. Moureau, H. Pitsch, An accurate conservative level set/ghost fluid method for simulating turbulent atomization, J. Comput. Phys. 227 (18) (2008) 8395–8416] is extended by using a discontinuous Galerkin (DG) discretization. DG allows for the scheme to have an arbitrarily high order of accuracy with the smallest possible computational stencil resulting in an accurate method with good parallel scaling. This work includes a DG implementation of the level set transport equation, which moves the level set with the flow field velocity, and a DG implementation of themore » reinitialization equation, which is used to maintain the shape of the level set profile to promote good mass conservation. A near second order converging interface curvature is obtained by following a height function methodology (common amongst volume of fluid schemes) in the context of the conservative level set. Various numerical experiments are conducted to test the properties of the method and show excellent results, even on coarse meshes. The tests include Zalesak’s disk, two-dimensional deformation of a circle, time evolution of a standing wave, and a study of the Kelvin–Helmholtz instability. Finally, this novel methodology is employed to simulate the break-up of a turbulent liquid jet.« less
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

1997-12-01

Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
CSM parallel structural methods research

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1989-01-01

Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
The science of computing - Parallel computation

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
Using Digital Computer Field Mapping of Outcrops to Examine the Preservation of High-P Rocks During Pervasive, Retrograde Greenschist Fluid Infiltration, Tinos, Cyclades Archipelago, Greece

NASA Astrophysics Data System (ADS)

Breeding, C. M.; Ague, J. J.; Broecker, M.

2001-12-01

Digital field mapping of outcrops on the island of Tinos, Greece, was undertaken to investigate the nature of retrograde fluid infiltration during exhumation of high-P metamorphic rocks of the Attic-Cycladic blueschist belt. High-resolution digital photographs of outcrops were taken and loaded into graphics editing software on a portable, belt-mounted computer in the field. Geologic features from outcrops were drawn and labeled on the digital images using the software in real-time. The ability to simultaneously identify geologic features in outcrops and digitize those features onto digital photographs in the field allows the creation of detailed, field-verified, outcrop-scale maps that aid in geologic interpretation. During Cretaceous-Eocene subduction in the Cyclades, downgoing crustal material was metamorphosed to eclogite and blueschist facies. Subsequent Oligocene-Miocene exhumation of the high-P rocks was accompanied by pervasive, retrograde fluid infiltration resulting in nearly complete greenschist facies overprinting. On Tinos, most high-P rocks have undergone intense retrogression; however, adjacent to thick marble horizons with completely retrograded contact zones, small (sub km-scale) enclaves of high-P rocks (blueschist and minor eclogite facies) were preserved. Field observations suggest that the remnant high-P zones consist mostly of massive metabasic rocks and minor adjacent metasediments. Within the enclaves, detailed digital outcrop maps reveal that greenschist retrogression increases in intensity outward from the center, implying interaction with a fluid flowing along enclave perimeters. Permeability contrasts could not have been solely responsible for preservation of the high-P rocks, as similar rock suites distal to marble contacts were completely overprinted. We conclude that the retrograded contacts of the marble units served as high-permeability conduits for regional retrograde fluid flow. Pervasive, layer-parallel flow through metasediments would have been drawn into these more permeable flow channels. Deflections in fluid flow paths toward the high flux contacts likely caused retrograde fluids to flow around the enclaves, preserving the zones of "dry," unretrograded high-P rocks near marble horizons. Digital mapping of outcrops is a unique method for direct examination of the relationships between geologic structure, lithology, and mineral assemblage variation in the field. Outcrop mapping in the Attic-Cycladic blueschist belt has revealed that regional fluid flow along contacts can have important implications for the large-scale distribution of mineral assemblages in metamorphic terranes.
System-wide power management control via clock distribution network

DOEpatents

Coteus, Paul W.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Reed, Don D.

2015-05-19

An apparatus, method and computer program product for automatically controlling power dissipation of a parallel computing system that includes a plurality of processors. A computing device issues a command to the parallel computing system. A clock pulse-width modulator encodes the command in a system clock signal to be distributed to the plurality of processors. The plurality of processors in the parallel computing system receive the system clock signal including the encoded command, and adjusts power dissipation according to the encoded command.
Parallel Computing:. Some Activities in High Energy Physics

NASA Astrophysics Data System (ADS)

Willers, Ian

This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.
A parallel interaction potential approach coupled with the immersed boundary method for fully resolved simulations of deformable interfaces and membranes

NASA Astrophysics Data System (ADS)

Spandan, Vamsi; Meschini, Valentina; Ostilla-Mónico, Rodolfo; Lohse, Detlef; Querzoli, Giorgio; de Tullio, Marco D.; Verzicco, Roberto

2017-11-01

In this paper we show and discuss how the deformation dynamics of closed liquid-liquid interfaces (for example drops and bubbles) can be replicated with use of a phenomenological interaction potential model. This new approach to simulate liquid-liquid interfaces is based on the fundamental principle of minimum potential energy where the total potential energy depends on the extent of deformation of a spring network distributed on the surface of the immersed drop or bubble. Simulating liquid-liquid interfaces using this model require computing ad-hoc elastic constants which is done through a reverse-engineered approach. The results from our simulations agree very well with previous studies on the deformation of drops in standard flow configurations such as a deforming drop in a shear flow or cross flow. The interaction potential model is highly versatile, computationally efficient and can be easily incorporated into generic single phase fluid solvers to also simulate complex fluid-structure interaction problems. This is shown by simulating flow in the left ventricle of the heart with mechanical and natural mitral valves where the imposed flow, motion of ventricle and valves dynamically govern the behaviour of each other. Results from these simulations are compared with ad-hoc in-house experimental measurements. Finally, we present a simple and easy to implement parallelisation scheme, as high performance computing is unavoidable when studying large scale problems involving several thousands of simultaneously deforming bodies in highly turbulent flows.
GPU Multi-Scale Particle Tracking and Multi-Fluid Simulations of the Radiation Belts

NASA Astrophysics Data System (ADS)

Ziemba, T.; Carscadden, J.; O'Donnell, D.; Winglee, R.; Harnett, E.; Cash, M.

2007-12-01

The properties of the radiation belts can vary dramatically under the influence of magnetic storms and storm-time substorms. The task of understanding and predicting radiation belt properties is made difficult because their properties determined by global processes as well as small-scale wave-particle interactions. A full solution to the problem will require major innovations in technique and computer hardware. The proposed work will demonstrates liked particle tracking codes with new multi-scale/multi-fluid global simulations that provide the first means to include small-scale processes within the global magnetospheric context. A large hurdle to the problem is having sufficient computer hardware that is able to handle the dissipate temporal and spatial scale sizes. A major innovation of the work is that the codes are designed to run of graphics processing units (GPUs). GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude computing speed over a CPU based systems, for little more cost than a high end-workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU and new software architectures have recently become available to ease the transition from graphics based to scientific applications. This allows for a cheap alternative to standard supercomputing methods and should increase the time to discovery. A demonstration of the code pushing more than 500,000 particles faster than real time is presented, and used to provide new insight into radiation belt dynamics.
Implementation of DFT application on ternary optical computer

NASA Astrophysics Data System (ADS)

Junjie, Peng; Youyi, Fu; Xiaofeng, Zhang; Shuai, Kong; Xinyu, Wei

2018-03-01

As its characteristics of huge number of data bits and low energy consumption, optical computing may be used in the applications such as DFT etc. which needs a lot of computation and can be implemented in parallel. According to this, DFT implementation methods in full parallel as well as in partial parallel are presented. Based on resources ternary optical computer (TOC), extensive experiments were carried out. Experimental results show that the proposed schemes are correct and feasible. They provide a foundation for further exploration of the applications on TOC that needs a large amount calculation and can be processed in parallel.
CFD code evaluation for internal flow modeling

NASA Technical Reports Server (NTRS)

Chung, T. J.

1990-01-01

Research on the computational fluid dynamics (CFD) code evaluation with emphasis on supercomputing in reacting flows is discussed. Advantages of unstructured grids, multigrids, adaptive methods, improved flow solvers, vector processing, parallel processing, and reduction of memory requirements are discussed. As examples, researchers include applications of supercomputing to reacting flow Navier-Stokes equations including shock waves and turbulence and combustion instability problems associated with solid and liquid propellants. Evaluation of codes developed by other organizations are not included. Instead, the basic criteria for accuracy and efficiency have been established, and some applications on rocket combustion have been made. Research toward an ultimate goal, the most accurate and efficient CFD code, is in progress and will continue for years to come.
Modeling and Visualizing Flow of Chemical Agents Across Complex Terrain

NASA Technical Reports Server (NTRS)

Kao, David; Kramer, Marc; Chaderjian, Neal

2005-01-01

Release of chemical agents across complex terrain presents a real threat to homeland security. Modeling and visualization tools are being developed that capture flow fluid terrain interaction as well as point dispersal downstream flow paths. These analytic tools when coupled with UAV atmospheric observations provide predictive capabilities to allow for rapid emergency response as well as developing a comprehensive preemptive counter-threat evacuation plan. The visualization tools involve high-end computing and massive parallel processing combined with texture mapping. We demonstrate our approach across a mountainous portion of North California under two contrasting meteorological conditions. Animations depicting flow over this geographical location provide immediate assistance in decision support and crisis management.
Thermal instability in post-flare plasmas

NASA Technical Reports Server (NTRS)

Antiochos, S. K.

1976-01-01

The cooling of post-flare plasmas is discussed and the formation of loop prominences is explained as due to a thermal instability. A one-dimensional model was developed for active loop prominences. Only the motion and heat fluxes parallel to the existing magnetic fields are considered. The relevant size scales and time scales are such that single-fluid MHD equations are valid. The effects of gravity, the geometry of the field and conduction losses to the chromosphere are included. A computer code was constructed to solve the model equations. Basically, the system is treated as an initial value problem (with certain boundary conditions at the chromosphere-corona transition region), and a two-step time differencing scheme is used.
Automatic Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)

2000-01-01

We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods which user can employ to improve data traffic in his code. We report on implementation of a tool which detects the code fragments causing data congestions and advises user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application ARC3D.
Thermal Ablation Modeling for Silicate Materials

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq

2016-01-01

A general thermal ablation model for silicates is proposed. The model includes the mass losses through the balance between evaporation and condensation, and through the moving molten layer driven by surface shear force and pressure gradient. This model can be applied in the ablation simulation of the meteoroid and the glassy ablator for spacecraft Thermal Protection Systems. Time-dependent axisymmetric computations are performed by coupling the fluid dynamics code, Data-Parallel Line Relaxation program, with the material response code, Two-dimensional Implicit Thermal Ablation simulation program, to predict the mass lost rates and shape change. The predicted mass loss rates will be compared with available data for model validation, and parametric studies will also be performed for meteoroid earth entry conditions.
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Exact parallel algorithms for some members of the traveling salesman problem family

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pekny, J.F.

1989-01-01

The traveling salesman problem and its many generalizations comprise one of the best known combinatorial optimization problem families. Most members of the family are NP-complete problems so that exact algorithms require an unpredictable and sometimes large computational effort. Parallel computers offer hope for providing the power required to meet these demands. A major barrier to applying parallel computers is the lack of parallel algorithms. The contributions presented in this thesis center around new exact parallel algorithms for the asymmetric traveling salesman problem (ATSP), prize collecting traveling salesman problem (PCTSP), and resource constrained traveling salesman problem (RCTSP). The RCTSP is amore » particularly difficult member of the family since finding a feasible solution is an NP-complete problem. An exact sequential algorithm is also presented for the directed hamiltonian cycle problem (DHCP). The DHCP algorithm is superior to current heuristic approaches and represents the first exact method applicable to large graphs. Computational results presented for each of the algorithms demonstrates the effectiveness of combining efficient algorithms with parallel computing methods. Performance statistics are reported for randomly generated ATSPs with 7,500 cities, PCTSPs with 200 cities, RCTSPs with 200 cities, DHCPs with 3,500 vertices, and assignment problems of size 10,000. Sequential results were collected on a Sun 4/260 engineering workstation, while parallel results were collected using a 14 and 100 processor BBN Butterfly Plus computer. The computational results represent the largest instances ever solved to optimality on any type of computer.« less
Use of parallel computing in mass processing of laser data

NASA Astrophysics Data System (ADS)

Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

2015-12-01

The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
Hydrocarbon fluid, ejector refrigeration system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kowalski, G.J.; Foster, A.R.

1993-08-31

A refrigeration system is described comprising: a vapor ejector cycle including a working fluid having a property such that entropy of the working fluid when in a saturated vapor state decreases as pressure decreases, the vapor ejector cycle comprising: a condenser located on a common fluid flow path; a diverter located downstream from the condenser for diverting the working fluid into a primary fluid flow path and a secondary fluid flow path parallel to the primary fluid flow path; an evaporator located on the secondary fluid flow path; an expansion device located on the secondary fluid flow path upstream ofmore » the evaporator; a boiler located on the primary fluid flow path parallel to the evaporator for boiling the working fluid, the boiler comprising an axially extending core region having a substantially constant cross sectional area and a porous capillary region surrounding the core region, the core region extending a length sufficient to produce a near sonic velocity saturated vapor; and an ejector having an outlet in fluid communication with the inlet of the condenser and an inlet in fluid communication with the outlet of the evaporator and the outlet of the boiler and in which the flows of the working fluid from the evaporator and the boiler are mixed and the pressure of the working fluid is increased to at least the pressure of the condenser, the ejector inlet, located downstream from the axially extending core region, including a primary nozzle located sufficiently close to the outlet of the boiler to minimize a pressure drop between the boiler and the primary nozzle, the primary nozzle of the ejector including a converging section having an included angle and length preselected to receive the working fluid from the boiler as a near sonic velocity saturated vapor.« less
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

PubMed

Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

2010-10-01

Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Creating a Parallel Version of VisIt for Microsoft Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whitlock, B J; Biagas, K S; Rawson, P L

2011-12-07

VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing powermore » is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPU's has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.