NASA Technical Reports Server (NTRS)
Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry
1998-01-01
Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand written NAB Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools: an interactive computer aided parallelization too] that generates message passing code, 2) the Portland Group's HPF compiler and 3) using compiler directives with the native FORTAN77 compiler on the SGI Origin2000.
2017-04-13
modelling code, a parallel benchmark , and a communication avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were...movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark , and a
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ye; Ma, Xiaosong; Liu, Qing Gary
2015-01-01
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time-and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters tomore » create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.« less
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++
NASA Technical Reports Server (NTRS)
Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.
1996-01-01
This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
NASA Astrophysics Data System (ADS)
Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.
2016-10-01
An analytical benchmark and a simple consistent Mathematica program are proposed for graphene and carbon nanotubes, that may serve to test any molecular dynamics code implemented with REBO potentials. By exploiting the benchmark, we checked results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) when adopting the second generation Brenner potential, we made evident that this code in its current implementation produces results which are offset from those of the benchmark by a significant amount, and provide evidence of the reason.
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results.« less
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which effect the performance of multi-level parallel applications.
Data Race Benchmark Collection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua
2017-03-21
This project is a benchmark suite of Open-MP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.
Development and Application of a Parallel LCAO Cluster Method
NASA Astrophysics Data System (ADS)
Patton, David C.
1997-08-01
CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations is analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
A comparison of five benchmarks
NASA Technical Reports Server (NTRS)
Huss, Janice E.; Pennline, James A.
1987-01-01
Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the programs codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.
Spherical harmonic results for the 3D Kobayashi Benchmark suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, P N; Chang, B; Hanebutte, U R
1999-03-02
Spherical harmonic solutions are presented for the Kobayashi benchmark suite. The results were obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study,potentials of applying some of the techniques to realistic aerospace applications will be presented
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism
Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
libvdwxc: a library for exchange-correlation functionals in the vdW-DF family
NASA Astrophysics Data System (ADS)
Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per
2017-09-01
We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C and provides an efficient implementation of the vdW-DF method and can be interfaced with various general-purpose DFT codes. Currently, the Gpaw and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set whereas parallel performance is benchmarked on ligand-protected gold nanoparticles ({{Au}}144{({{SC}}11{{NH}}25)}60) up to 9696 atoms.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Understanding the Cray X1 System
NASA Technical Reports Server (NTRS)
Cheung, Samson
2004-01-01
This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.
Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal
2015-06-01
Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
Interfacing Computer Aided Parallelization and Performance Analysis
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)
2003-01-01
When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Nonlinear viscoplasticity in ASPECT: benchmarking and applications to subduction
NASA Astrophysics Data System (ADS)
Glerum, Anne; Thieulot, Cedric; Fraters, Menno; Blom, Constantijn; Spakman, Wim
2018-03-01
ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is a massively parallel finite element code originally designed for modeling thermal convection in the mantle with a Newtonian rheology. The code is characterized by modern numerical methods, high-performance parallelism and extensibility. This last characteristic is illustrated in this work: we have extended the use of ASPECT from global thermal convection modeling to upper-mantle-scale applications of subduction. Subduction modeling generally requires the tracking of multiple materials with different properties and with nonlinear viscous and viscoplastic rheologies. To this end, we implemented a frictional plasticity criterion that is combined with a viscous diffusion and dislocation creep rheology. Because ASPECT uses compositional fields to represent different materials, all material parameters are made dependent on a user-specified number of fields. The goal of this paper is primarily to describe and verify our implementations of complex, multi-material rheology by reproducing the results of four well-known two-dimensional benchmarks: the indentor benchmark, the brick experiment, the sandbox experiment and the slab detachment benchmark. Furthermore, we aim to provide hands-on examples for prospective users by demonstrating the use of multi-material viscoplasticity with three-dimensional, thermomechanical models of oceanic subduction, putting ASPECT on the map as a community code for high-resolution, nonlinear rheology subduction modeling.
Parallelization of PANDA discrete ordinates code using spatial decomposition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humbert, P.
2006-07-01
We present the parallel method, based on spatial domain decomposition, implemented in the 2D and 3D versions of the discrete Ordinates code PANDA. The spatial mesh is orthogonal and the spatial domain decomposition is Cartesian. For 3D problems a 3D Cartesian domain topology is created and the parallel method is based on a domain diagonal plane ordered sweep algorithm. The parallel efficiency of the method is improved by directions and octants pipelining. The implementation of the algorithm is straightforward using MPI blocking point to point communications. The efficiency of the method is illustrated by an application to the 3D-Ext C5G7more » benchmark of the OECD/NEA. (authors)« less
Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of Java features make it an attractive but a debatable choice for High Performance Computing. We have implemented benchmarks working on single structured grid BT,SP,LU and FT in Java. The performance and scalability of the Java code shows that a significant improvement in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.
INL Results for Phases I and III of the OECD/NEA MHTGR-350 Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard Strydom; Javier Ortensi; Sonat Sen
2013-09-01
The Idaho National Laboratory (INL) Very High Temperature Reactor (VHTR) Technology Development Office (TDO) Methods Core Simulation group led the construction of the Organization for Economic Cooperation and Development (OECD) Modular High Temperature Reactor (MHTGR) 350 MW benchmark for comparing and evaluating prismatic VHTR analysis codes. The benchmark is sponsored by the OECD's Nuclear Energy Agency (NEA), and the project will yield a set of reference steady-state, transient, and lattice depletion problems that can be used by the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and vendors to assess their code suits. The Methods group is responsible formore » defining the benchmark specifications, leading the data collection and comparison activities, and chairing the annual technical workshops. This report summarizes the latest INL results for Phase I (steady state) and Phase III (lattice depletion) of the benchmark. The INSTANT, Pronghorn and RattleSnake codes were used for the standalone core neutronics modeling of Exercise 1, and the results obtained from these codes are compared in Section 4. Exercise 2 of Phase I requires the standalone steady-state thermal fluids modeling of the MHTGR-350 design, and the results for the systems code RELAP5-3D are discussed in Section 5. The coupled neutronics and thermal fluids steady-state solution for Exercise 3 are reported in Section 6, utilizing the newly developed Parallel and Highly Innovative Simulation for INL Code System (PHISICS)/RELAP5-3D code suit. Finally, the lattice depletion models and results obtained for Phase III are compared in Section 7. The MHTGR-350 benchmark proved to be a challenging simulation set of problems to model accurately, and even with the simplifications introduced in the benchmark specification this activity is an important step in the code-to-code verification of modern prismatic VHTR codes. A final OECD/NEA comparison report will compare the Phase I and III results of all other international participants in 2014, while the remaining Phase II transient case results will be reported in 2015.« less
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
NAS Grid Benchmarks: A Tool for Grid Space Exploration
NASA Technical Reports Server (NTRS)
Frumkin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)
2001-01-01
We present an approach for benchmarking services provided by computational Grids. It is based on the NAS Parallel Benchmarks (NPB) and is called NAS Grid Benchmark (NGB) in this paper. We present NGB as a data flow graph encapsulating an instance of an NPB code in each graph node, which communicates with other nodes by sending/receiving initialization data. These nodes may be mapped to the same or different Grid machines. Like NPB, NGB will specify several different classes (problem sizes). NGB also specifies the generic Grid services sufficient for running the bench-mark. The implementor has the freedom to choose any specific Grid environment. However, we describe a reference implementation in Java, and present some scenarios for using NGB.
1996-01-01
Real-Time 19 5 Conclusion 23 List of References 25 ii LIST OF FIGURES FIGURE PAGE 3-1 Test Bench Pseudo Code 7 3-2 Fast Convolution...3-1 shows pseudo - code for a test bench with two application nodes. The outer test bench wrapper consists of three functions: pipeline_init, pipeline...exit_func); Figure 3-1. Test Bench Pseudo Code The application wrapper is contained in the pipeline routine and similarly consists of an
Support of Multidimensional Parallelism in the OpenMP Programming Model
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele
2003-01-01
OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often effect the scalability of applications. Examples for these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Hybrid parallel code acceleration methods in full-core reactor physics calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Courau, T.; Plagne, L.; Ponicot, A.
2012-07-01
When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses a 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadraturemore » required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k{sub eff}, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)« less
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results however, with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.
PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Frederick, J. M.
2016-12-01
In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification ensures whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems which cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively-parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), which describes four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information. Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
High-performance computational fluid dynamics: a custom-code approach
NASA Astrophysics Data System (ADS)
Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.
2016-07-01
We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.
Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Parisi, G.; Parisi, L.
2011-06-01
We present a set of possible implementations for Graphics Processing Units (GPU) of the Over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/s of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. Such results are compared with those obtained by means of a highly-tuned vector-parallel code on latest generation multi-core CPUs.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program
NASA Astrophysics Data System (ADS)
Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.
2018-02-01
We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
Synergia: an accelerator modeling tool with 3-D space charge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amundson, James F.; Spentzouris, P.; /Fermilab
2004-07-01
High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing to semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab boostermore » accelerator.« less
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Spherical Harmonic Solutions to the 3D Kobayashi Benchmark Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, P.N.; Chang, B.; Hanebutte, U.R.
1999-12-29
Spherical harmonic solutions of order 5, 9 and 21 on spatial grids containing up to 3.3 million cells are presented for the Kobayashi benchmark suite. This suite of three problems with simple geometry of pure absorber with large void region was proposed by Professor Kobayashi at an OECD/NEA meeting in 1996. Each of the three problems contains a source, a void and a shield region. Problem 1 can best be described as a box in a box problem, where a source region is surrounded by a square void region which itself is embedded in a square shield region. Problems 2more » and 3 represent a shield with a void duct. Problem 2 having a straight and problem 3 a dog leg shaped duct. A pure absorber and a 50% scattering case are considered for each of the three problems. The solutions have been obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The Ardra code takes advantage of a two-level parallelization strategy, which combines message passing between processing nodes and thread based parallelism amongst processors on each node. All calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.« less
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Nonlinear 3D visco-resistive MHD modeling of fusion plasmas: a comparison between numerical codes
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Chacon, L.; Cappello, S.
2008-11-01
Fluid plasma models (and, in particular, the MHD model) are extensively used in the theoretical description of laboratory and astrophysical plasmas. We present here a successful benchmark between two nonlinear, three-dimensional, compressible visco-resistive MHD codes. One is the fully implicit, finite volume code PIXIE3D [1,2], which is characterized by many attractive features, notably the generalized curvilinear formulation (which makes the code applicable to different geometries) and the possibility to include in the computation the energy transport equation and the extended MHD version of Ohm's law. In addition, the parallel version of the code features excellent scalability properties. Results from this code, obtained in cylindrical geometry, are compared with those produced by the semi-implicit cylindrical code SpeCyl, which uses finite differences radially, and spectral formulation in the other coordinates [3]. Both single and multi-mode simulations are benchmarked, regarding both reversed field pinch (RFP) and ohmic tokamak magnetic configurations. [1] L. Chacon, Computer Physics Communications 163, 143 (2004). [2] L. Chacon, Phys. Plasmas 15, 056103 (2008). [3] S. Cappello, Plasma Phys. Control. Fusion 46, B313 (2004) & references therein.
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
A New Code SORD for Simulation of Polarized Light Scattering in the Earth Atmosphere
NASA Technical Reports Server (NTRS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent
2016-01-01
We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in plane-parallel atmosphere of the Earth. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering). We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork (AERONET) inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/.
Automatic Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)
2000-01-01
We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods which user can employ to improve data traffic in his code. We report on implementation of a tool which detects the code fragments causing data congestions and advises user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application ARC3D.
GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.
E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N
2018-03-01
GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based, simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice node of ultralarge systems (∼5 billion atoms) and diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.
Vector radiative transfer code SORD: Performance analysis and quick start guide
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Alexander; Holben, Brent; Kokhanovsky, Alexander
2017-10-01
We present a new open source polarized radiative transfer code SORD written in Fortran 90/95. SORD numerically simulates propagation of monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface using the method of successive orders of scattering (hence the name). Thermal emission is ignored. We did not improve the method in any way, but report the accuracy and runtime in 52 benchmark scenarios. This paper also serves as a quick start user's guide for the code available from ftp://maiac.gsfc.nasa.gov/pub/skorkin, from the JQSRT website, or from the corresponding (first) author.
Heterogeneous Distributed Computing for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy S.
1998-01-01
The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to and investigate issues in, and develop solutions to, efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devising novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
MoMaS reactive transport benchmark using PFLOTRAN
NASA Astrophysics Data System (ADS)
Park, H.
2017-12-01
MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September of 2009; it is not taken from a real chemical system, but realistic and numerically challenging tests. PFLOTRAN is a state-of-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories including Waste Isolation Pilot Plant and Used Fuel Disposition. MoMaS benchmark has three independent tests with easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results of the easy benchmark test case which includes mixing of aqueous components and surface complexation. Surface complexations consist of monodentate and bidentate reactions which introduces difficulty in defining selectivity coefficient if the reaction applies to a bulk reference volume. The selectivity coefficient becomes porosity dependent for bidentate reaction in heterogeneous porous media. The benchmark is solved by PFLOTRAN with minimal modification to address the issue and unit conversions were made properly to suit PFLOTRAN.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischer, G.A.
2011-07-01
Document available in abstract form only, full text of document follows: The dosimetry from the H. B. Robinson Unit 2 Pressure Vessel Benchmark is analyzed with a suite of Westinghouse-developed codes and data libraries. The radiation transport from the reactor core to the surveillance capsule and ex-vessel locations is performed by RAPTOR-M3G, a parallel deterministic radiation transport code that calculates high-resolution neutron flux information in three dimensions. The cross-section library used in this analysis is the ALPAN library, an Evaluated Nuclear Data File (ENDF)/B-VII.0-based library designed for reactor dosimetry and fluence analysis applications. Dosimetry is evaluated with the industry-standard SNLRMLmore » reactor dosimetry cross-section data library. (authors)« less
Two-fluid dusty shocks: simple benchmarking problems and applications to protoplanetary discs
NASA Astrophysics Data System (ADS)
Lehmann, Andrew; Wardle, Mark
2018-05-01
The key role that dust plays in the interstellar medium has motivated the development of numerical codes designed to study the coupled evolution of dust and gas in systems such as turbulent molecular clouds and protoplanetary discs. Drift between dust and gas has proven to be important as well as numerically challenging. We provide simple benchmarking problems for dusty gas codes by numerically solving the two-fluid dust-gas equations for steady, plane-parallel shock waves. The two distinct shock solutions to these equations allow a numerical code to test different forms of drag between the two fluids, the strength of that drag and the dust to gas ratio. We also provide an astrophysical application of J-type dust-gas shocks to studying the structure of accretion shocks on to protoplanetary discs. We find that two-fluid effects are most important for grains larger than 1 μm, and that the peak dust temperature within an accretion shock provides a signature of the dust-to-gas ratio of the infalling material.
NASA Astrophysics Data System (ADS)
Barker, H. W.; Stephens, G. L.; Partain, P. T.; Bergman, J. W.; Bonnel, B.; Campana, K.; Clothiaux, E. E.; Clough, S.; Cusack, S.; Delamere, J.; Edwards, J.; Evans, K. F.; Fouquart, Y.; Freidenreich, S.; Galin, V.; Hou, Y.; Kato, S.; Li, J.; Mlawer, E.; Morcrette, J.-J.; O'Hirok, W.; Räisänen, P.; Ramaswamy, V.; Ritter, B.; Rozanov, E.; Schlesinger, M.; Shibata, K.; Sporyshev, P.; Sun, Z.; Wendisch, M.; Wood, N.; Yang, F.
2003-08-01
The primary purpose of this study is to assess the performance of 1D solar radiative transfer codes that are used currently both for research and in weather and climate models. Emphasis is on interpretation and handling of unresolved clouds. Answers are sought to the following questions: (i) How well do 1D solar codes interpret and handle columns of information pertaining to partly cloudy atmospheres? (ii) Regardless of the adequacy of their assumptions about unresolved clouds, do 1D solar codes perform as intended?One clear-sky and two plane-parallel, homogeneous (PPH) overcast cloud cases serve to elucidate 1D model differences due to varying treatments of gaseous transmittances, cloud optical properties, and basic radiative transfer. The remaining four cases involve 3D distributions of cloud water and water vapor as simulated by cloud-resolving models. Results for 25 1D codes, which included two line-by-line (LBL) models (clear and overcast only) and four 3D Monte Carlo (MC) photon transport algorithms, were submitted by 22 groups. Benchmark, domain-averaged irradiance profiles were computed by the MC codes. For the clear and overcast cases, all MC estimates of top-of-atmosphere albedo, atmospheric absorptance, and surface absorptance agree with one of the LBL codes to within ±2%. Most 1D codes underestimate atmospheric absorptance by typically 15-25 W m-2 at overhead sun for the standard tropical atmosphere regardless of clouds.Depending on assumptions about unresolved clouds, the 1D codes were partitioned into four genres: (i) horizontal variability, (ii) exact overlap of PPH clouds, (iii) maximum/random overlap of PPH clouds, and (iv) random overlap of PPH clouds. A single MC code was used to establish conditional benchmarks applicable to each genre, and all MC codes were used to establish the full 3D benchmarks. There is a tendency for 1D codes to cluster near their respective conditional benchmarks, though intragenre variances typically exceed those for the clear and overcast cases. The majority of 1D codes fall into the extreme category of maximum/random overlap of PPH clouds and thus generally disagree with full 3D benchmark values. Given the fairly limited scope of these tests and the inability of any one code to perform extremely well for all cases begs the question that a paradigm shift is due for modeling 1D solar fluxes for cloudy atmospheres.
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
A new code SORD for simulation of polarized light scattering in the Earth atmosphere
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent
2016-05-01
We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in plane-parallel Earth atmosphere. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering1, 2). We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork3 (AERONET) inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/ or ftp://maiac.gsfc.nasa.gov/pub/SORD.zip
Status of BOUT fluid turbulence code: improvements and verification
NASA Astrophysics Data System (ADS)
Umansky, M. V.; Lodestro, L. L.; Xu, X. Q.
2006-10-01
BOUT is an electromagnetic fluid turbulence code for tokamak edge plasma [1]. BOUT performs time integration of reduced Braginskii plasma fluid equations, using spatial discretization in realistic geometry and employing a standard ODE integration package PVODE. BOUT has been applied to several tokamak experiments and in some cases calculated spectra of turbulent fluctuations compared favorably to experimental data. On the other hand, the desire to understand better the code results and to gain more confidence in it motivated investing effort in rigorous verification of BOUT. Parallel to the testing the code underwent substantial modification, mainly to improve its readability and tractability of physical terms, with some algorithmic improvements as well. In the verification process, a series of linear and nonlinear test problems was applied to BOUT, targeting different subgroups of physical terms. The tests include reproducing basic electrostatic and electromagnetic plasma modes in simplified geometry, axisymmetric benchmarks against the 2D edge code UEDGE in real divertor geometry, and neutral fluid benchmarks against the hydrodynamic code LCPFCT. After completion of the testing, the new version of the code is being applied to actual tokamak edge turbulence problems, and the results will be presented. [1] X. Q. Xu et al., Contr. Plas. Phys., 36,158 (1998). *Work performed for USDOE by Univ. Calif. LLNL under contract W-7405-ENG-48.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
MHD code using multi graphical processing units: SMAUG+
NASA Astrophysics Data System (ADS)
Gyenge, N.; Griffiths, M. K.; Erdélyi, R.
2018-01-01
This paper introduces the Sheffield Magnetohydrodynamics Algorithm Using GPUs (SMAUG+), an advanced numerical code for solving magnetohydrodynamic (MHD) problems, using multi-GPU systems. Multi-GPU systems facilitate the development of accelerated codes and enable us to investigate larger model sizes and/or more detailed computational domain resolutions. This is a significant advancement over the parent single-GPU MHD code, SMAUG (Griffiths et al., 2015). Here, we demonstrate the validity of the SMAUG + code, describe the parallelisation techniques and investigate performance benchmarks. The initial configuration of the Orszag-Tang vortex simulations are distributed among 4, 16, 64 and 100 GPUs. Furthermore, different simulation box resolutions are applied: 1000 × 1000, 2044 × 2044, 4000 × 4000 and 8000 × 8000 . We also tested the code with the Brio-Wu shock tube simulations with model size of 800 employing up to 10 GPUs. Based on the test results, we observed speed ups and slow downs, depending on the granularity and the communication overhead of certain parallel tasks. The main aim of the code development is to provide massively parallel code without the memory limitation of a single GPU. By using our code, the applied model size could be significantly increased. We demonstrate that we are able to successfully compute numerically valid and large 2D MHD problems.
GRADSPMHD: A parallel MHD code based on the SPH formalism
NASA Astrophysics Data System (ADS)
Vanaverbeke, S.; Keppens, R.; Poedts, S.
2014-03-01
We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇ṡB→ constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
Time-Dependent Simulations of Turbopump Flows
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan; Chan, William; Williams, Robert
2002-01-01
Unsteady flow simulations for RLV (Reusable Launch Vehicles) 2nd Generation baseline turbopump for one and half impeller rotations have been completed by using a 34.3 Million grid points model. MLP (Multi-Level Parallelism) shared memory parallelism has been implemented in INS3D, and benchmarked. Code optimization for cash based platforms will be completed by the end of September 2001. Moving boundary capability is obtained by using DCF module. Scripting capability from CAD (computer aided design) geometry to solution has been developed. Data compression is applied to reduce data size in post processing. Fluid/Structure coupling has been initiated.
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
Present Status and Extensions of the Monte Carlo Performance Benchmark
NASA Astrophysics Data System (ADS)
Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.
2014-06-01
The NEA Monte Carlo Performance benchmark started in 2011 aiming to monitor over the years the abilities to perform a full-size Monte Carlo reactor core calculation with a detailed power production for each fuel pin with axial distribution. This paper gives an overview of the contributed results thus far. It shows that reaching a statistical accuracy of 1 % for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common type computer nodes. However, using true supercomputers the speedup of parallel calculations is increasing up to large numbers of processor cores. More experience is needed from calculations on true supercomputers using large numbers of processors in order to predict if the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark test are well suited for further investigations of full-core Monte Carlo calculations and a need is felt for testing other issues than its computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems for evaluating fission source convergence for a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities and to study the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less
Python Radiative Transfer Emission code (PyRaTE): non-LTE spectral lines simulations
NASA Astrophysics Data System (ADS)
Tritsis, A.; Yorke, H.; Tassis, K.
2018-05-01
We describe PyRaTE, a new, non-local thermodynamic equilibrium (non-LTE) line radiative transfer code developed specifically for post-processing astrochemical simulations. Population densities are estimated using the escape probability method. When computing the escape probability, the optical depth is calculated towards all directions with density, molecular abundance, temperature and velocity variations all taken into account. A very easy-to-use interface, capable of importing data from simulations outputs performed with all major astrophysical codes, is also developed. The code is written in PYTHON using an "embarrassingly parallel" strategy and can handle all geometries and projection angles. We benchmark the code by comparing our results with those from RADEX (van der Tak et al. 2007) and against analytical solutions and present case studies using hydrochemical simulations. The code will be released for public use.
Parallelization of sequential Gaussian, indicator and direct simulation algorithms
NASA Astrophysics Data System (ADS)
Nunes, Ruben; Almeida, José A.
2010-08-01
Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amount of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao
2014-02-01
We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm,more » which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. -- Highlights: •We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU—GPU duet. •The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU—GPU implementation. •Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. •The testbed involves a polymeric system of oligopyrroles in the condensed phase. •The CPU—GPU parallelization includes dipole—dipole and Mie—Jones classic potentials.« less
Performance analysis of parallel gravitational N-body codes on large GPU clusters
NASA Astrophysics Data System (ADS)
Huang, Si-Yi; Spurzem, Rainer; Berczik, Peter
2016-01-01
We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large Graphics Processing Unit (GPU) clusters, both of which are pioneers in their own fields as well as on certain mutual scales - NBODY6++ and Bonsai. We carry out benchmarks of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200 - 300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as larger relative energy errors than NBODY6++, typically ranging from 10 - 50 times larger, depending on the chosen parameters of the codes. Although the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one by fine-tuning parameters accordingly.
Performance effects of irregular communications patterns on massively parallel multiprocessors
NASA Technical Reports Server (NTRS)
Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam
1991-01-01
A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns on non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.
Benchmarking and performance analysis of the CM-2. [SIMD computer
NASA Technical Reports Server (NTRS)
Myers, David W.; Adams, George B., II
1988-01-01
A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as gain insight into what performance criteria are needed when evaluating parallel processing machines.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
NASA Astrophysics Data System (ADS)
Stupakov, Gennady; Zhou, Demin
2016-04-01
We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. All our formulas are benchmarked against numerical simulations with the CSRZ computer code.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J.C.H.; Lung, H.; Katsumata, Y.
1995-12-01
In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu`s vector parallel computer, VPP500. This computer has Fujitsu`s well-known vector processor as the PE each rated at 1.6 C FLOPS. The high speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that the physical domain decomposition is the best way to solve MG problems on VPP500.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stupakov, Gennady; Zhou, Demin
2016-04-21
We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. All our formulas are benchmarked against numerical simulations with the CSRZ computer code.
NASA Astrophysics Data System (ADS)
Hadade, Ioan; di Mare, Luca
2016-08-01
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets: and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)
1997-01-01
Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Verification of ARES transport code system with TAKEDA benchmarks
NASA Astrophysics Data System (ADS)
Zhang, Liang; Zhang, Bin; Zhang, Penghe; Chen, Mengteng; Zhao, Jingchang; Zhang, Shun; Chen, Yixue
2015-10-01
Neutron transport modeling and simulation are central to many areas of nuclear technology, including reactor core analysis, radiation shielding and radiation detection. In this paper the series of TAKEDA benchmarks are modeled to verify the critical calculation capability of ARES, a discrete ordinates neutral particle transport code system. SALOME platform is coupled with ARES to provide geometry modeling and mesh generation function. The Koch-Baker-Alcouffe parallel sweep algorithm is applied to accelerate the traditional transport calculation process. The results show that the eigenvalues calculated by ARES are in excellent agreement with the reference values presented in NEACRP-L-330, with a difference less than 30 pcm except for the first case of model 3. Additionally, ARES provides accurate fluxes distribution compared to reference values, with a deviation less than 2% for region-averaged fluxes in all cases. All of these confirms the feasibility of ARES-SALOME coupling and demonstrate that ARES has a good performance in critical calculation.
NASA Astrophysics Data System (ADS)
Helm, Anton; Vieira, Jorge; Silva, Luis; Fonseca, Ricardo
2016-10-01
Laser-driven accelerators gained an increased attention over the past decades. Typical modeling techniques for laser wakefield acceleration (LWFA) are based on particle-in-cell (PIC) simulations. PIC simulations, however, are very computationally expensive due to the disparity of the relevant scales ranging from the laser wavelength, in the micrometer range, to the acceleration length, currently beyond the ten centimeter range. To minimize the gap between these despair scales the ponderomotive guiding center (PGC) algorithm is a promising approach. By describing the evolution of the laser pulse envelope separately, only the scales larger than the plasma wavelength are required to be resolved in the PGC algorithm, leading to speedups in several orders of magnitude. Previous work was limited to two dimensions. Here we present the implementation of the 3D version of a PGC solver into the massively parallel, fully relativistic PIC code OSIRIS. We extended the solver to include periodic boundary conditions and parallelization in all spatial dimensions. We present benchmarks for distributed and shared memory parallelization. We also discuss the stability of the PGC solver.
GPU-accelerated simulations of isolated black holes
NASA Astrophysics Data System (ADS)
Lewis, Adam G. M.; Pfeiffer, Harald P.
2018-05-01
We present a port of the numerical relativity code SpEC which is capable of running on NVIDIA GPUs. Since this code must be maintained in parallel with SpEC itself, a primary design consideration is to perform as few explicit code changes as possible. We therefore rely on a hierarchy of automated porting strategies. At the highest level we use TLoops, a C++ library of our design, to automatically emit CUDA code equivalent to tensorial expressions written into C++ source using a syntax similar to analytic calculation. Next, we trace out and cache explicit matrix representations of the numerous linear transformations in the SpEC code, which allows these to be performed on the GPU using pre-existing matrix-multiplication libraries. We port the few remaining important modules by hand. In this paper we detail the specifics of our port, and present benchmarks of it simulating isolated black hole spacetimes on several generations of NVIDIA GPU.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0
NASA Technical Reports Server (NTRS)
Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine
2004-01-01
We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
Multidimensional Multiphysics Simulation of TRISO Particle Fuel
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. D. Hales; R. L. Williamson; S. R. Novascone
2013-11-01
Multidimensional multiphysics analysis of TRISO-coated particle fuel using the BISON finite-element based nuclear fuels code is described. The governing equations and material models applicable to particle fuel and implemented in BISON are outlined. Code verification based on a recent IAEA benchmarking exercise is described, and excellant comparisons are reported. Multiple TRISO-coated particles of increasing geometric complexity are considered. It is shown that the code's ability to perform large-scale parallel computations permits application to complex 3D phenomena while very efficient solutions for either 1D spherically symmetric or 2D axisymmetric geometries are straightforward. Additionally, the flexibility to easily include new physical andmore » material models and uncomplicated ability to couple to lower length scale simulations makes BISON a powerful tool for simulation of coated-particle fuel. Future code development activities and potential applications are identified.« less
Time-Dependent Simulations of Incompressible Flow in a Turbopump Using Overset Grid Approach
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2001-01-01
This viewgraph presentation provides information on mathematical modelling of the SSME (space shuttle main engine). The unsteady SSME-rig1 start-up procedure from the pump at rest has been initiated by using 34.3 million grid points. The computational model for the SSME-rig1 has been completed. Moving boundary capability is obtained by using DCF module in OVERFLOW-D. MPI (Message Passing Interface)/OpenMP hybrid parallel code has been benchmarked.
PHISICS/RELAP5-3D RESULTS FOR EXERCISES II-1 AND II-2 OF THE OECD/NEA MHTGR-350 BENCHMARK
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strydom, Gerhard
2016-03-01
The Idaho National Laboratory (INL) Advanced Reactor Technologies (ART) High-Temperature Gas-Cooled Reactor (HTGR) Methods group currently leads the Modular High-Temperature Gas-Cooled Reactor (MHTGR) 350 benchmark. The benchmark consists of a set of lattice-depletion, steady-state, and transient problems that can be used by HTGR simulation groups to assess the performance of their code suites. The paper summarizes the results obtained for the first two transient exercises defined for Phase II of the benchmark. The Parallel and Highly Innovative Simulation for INL Code System (PHISICS), coupled with the INL system code RELAP5-3D, was used to generate the results for the Depressurized Conductionmore » Cooldown (DCC) (exercise II-1a) and Pressurized Conduction Cooldown (PCC) (exercise II-2) transients. These exercises require the time-dependent simulation of coupled neutronics and thermal-hydraulics phenomena, and utilize the steady-state solution previously obtained for exercise I-3 of Phase I. This paper also includes a comparison of the benchmark results obtained with a traditional system code “ring” model against a more detailed “block” model that include kinetics feedback on an individual block level and thermal feedbacks on a triangular sub-mesh. The higher spatial fidelity that can be obtained by the block model is illustrated with comparisons of the maximum fuel temperatures, especially in the case of natural convection conditions that dominate the DCC and PCC events. Differences up to 125 K (or 10%) were observed between the ring and block model predictions of the DCC transient, mostly due to the block model’s capability of tracking individual block decay powers and more detailed helium flow distributions. In general, the block model only required DCC and PCC calculation times twice as long as the ring models, and it therefore seems that the additional development and calculation time required for the block model could be worth the gain that can be obtained in the spatial resolution« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stupakov, Gennady; Zhou, Demin
2016-04-21
We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. Furthermore, all our formulas are benchmarked against numerical simulations with the CSRZ computermore » code.« less
Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results
NASA Technical Reports Server (NTRS)
Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)
1994-01-01
In the last three years extensive performance data have been reported for parallel machines both based on the NAS Parallel Benchmarks, and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS parallel benchmarks, we have also included peak performance of the machine, and the LINPACK n and n(sub 1/2) values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP have each a unique signature. 3) The remaining NPB can grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Constructing Neuronal Network Models in Massively Parallel Environments
Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808
NASA Astrophysics Data System (ADS)
Tinti, S.; Tonini, R.
2013-07-01
Nowadays numerical models are a powerful tool in tsunami research since they can be used (i) to reconstruct modern and historical events, (ii) to cast new light on tsunami sources by inverting tsunami data and observations, (iii) to build scenarios in the frame of tsunami mitigation plans, and (iv) to produce forecasts of tsunami impact and inundation in systems of early warning. In parallel with the general recognition of the importance of numerical tsunami simulations, the demand has grown for reliable tsunami codes, validated through tests agreed upon by the tsunami community. This paper presents the tsunami code UBO-TSUFD that has been developed at the University of Bologna, Italy, and that solves the non-linear shallow water (NSW) equations in a Cartesian frame, with inclusion of bottom friction and exclusion of the Coriolis force, by means of a leapfrog (LF) finite-difference scheme on a staggered grid and that accounts for moving boundaries to compute sea inundation and withdrawal at the coast. Results of UBO-TSUFD applied to four classical benchmark problems are shown: two benchmarks are based on analytical solutions, one on a plane wave propagating on a flat channel with a constant slope beach; and one on a laboratory experiment. The code is proven to perform very satisfactorily since it reproduces quite well the benchmark theoretical and experimental data. Further, the code is applied to a realistic tsunami case: a scenario of a tsunami threatening the coasts of eastern Sicily, Italy, is defined and discussed based on the historical tsunami of 11 January 1693, i.e. one of the most severe events in the Italian history.
NASA Astrophysics Data System (ADS)
Papior, Nick; Lorente, Nicolás; Frederiksen, Thomas; García, Alberto; Brandbyge, Mads
2017-03-01
We present novel methods implemented within the non-equilibrium Green function code (NEGF) TRANSIESTA based on density functional theory (DFT). Our flexible, next-generation DFT-NEGF code handles devices with one or multiple electrodes (Ne ≥ 1) with individual chemical potentials and electronic temperatures. We describe its novel methods for electrostatic gating, contour optimizations, and assertion of charge conservation, as well as the newly implemented algorithms for optimized and scalable matrix inversion, performance-critical pivoting, and hybrid parallelization. Additionally, a generic NEGF "post-processing" code (TBTRANS/PHTRANS) for electron and phonon transport is presented with several novelties such as Hamiltonian interpolations, Ne ≥ 1 electrode capability, bond-currents, generalized interface for user-defined tight-binding transport, transmission projection using eigenstates of a projected Hamiltonian, and fast inversion algorithms for large-scale simulations easily exceeding 106 atoms on workstation computers. The new features of both codes are demonstrated and bench-marked for relevant test systems.
Proceeding On : Parallelisation Of Critical Code Passages In PHOENIX/3D
NASA Astrophysics Data System (ADS)
Arkenberg, Mario; Wichert, Viktoria; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach here, by introducing especially adapted, parallel numerical methods and correspondingly parallelising time critical code passages. In the following, we present our work on PHOENIX/3D.While parallelisation is generally worthwhile, it requires revision of time-consuming subroutines with respect to separability of localised data and variables in order to determine the optimal approach. Of course, the same applies to the code structure. The importance of this ongoing work can be showcased by recently derived benchmark results, which were generated utilis- ing MPI and OpenMP. Furthermore, the need for a careful and thorough choice of an adequate, machine dependent setup is discussed.
The MCNP6 Analytic Criticality Benchmark Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Forrest B.
2016-06-16
Analytical benchmarks provide an invaluable tool for verifying computer codes used to simulate neutron transport. Several collections of analytical benchmark problems [1-4] are used routinely in the verification of production Monte Carlo codes such as MCNP® [5,6]. Verification of a computer code is a necessary prerequisite to the more complex validation process. The verification process confirms that a code performs its intended functions correctly. The validation process involves determining the absolute accuracy of code results vs. nature. In typical validations, results are computed for a set of benchmark experiments using a particular methodology (code, cross-section data with uncertainties, and modeling)more » and compared to the measured results from the set of benchmark experiments. The validation process determines bias, bias uncertainty, and possibly additional margins. Verification is generally performed by the code developers, while validation is generally performed by code users for a particular application space. The VERIFICATION_KEFF suite of criticality problems [1,2] was originally a set of 75 criticality problems found in the literature for which exact analytical solutions are available. Even though the spatial and energy detail is necessarily limited in analytical benchmarks, typically to a few regions or energy groups, the exact solutions obtained can be used to verify that the basic algorithms, mathematics, and methods used in complex production codes perform correctly. The present work has focused on revisiting this benchmark suite. A thorough review of the problems resulted in discarding some of them as not suitable for MCNP benchmarking. For the remaining problems, many of them were reformulated to permit execution in either multigroup mode or in the normal continuous-energy mode for MCNP. Execution of the benchmarks in continuous-energy mode provides a significant advance to MCNP verification methods.« less
Modification and benchmarking of MCNP for low-energy tungsten spectra.
Mercier, J R; Kopp, D T; McDavid, W D; Dove, S B; Lancaster, J L; Tucker, D M
2000-12-01
The MCNP Monte Carlo radiation transport code was modified for diagnostic medical physics applications. In particular, the modified code was thoroughly benchmarked for the production of polychromatic tungsten x-ray spectra in the 30-150 kV range. Validating the modified code for coupled electron-photon transport with benchmark spectra was supplemented with independent electron-only and photon-only transport benchmarks. Major revisions to the code included the proper treatment of characteristic K x-ray production and scoring, new impact ionization cross sections, and new bremsstrahlung cross sections. Minor revisions included updated photon cross sections, electron-electron bremsstrahlung production, and K x-ray yield. The modified MCNP code is benchmarked to electron backscatter factors, x-ray spectra production, and primary and scatter photon transport.
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
NASA Astrophysics Data System (ADS)
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
Performance Improvements of the CYCOFOS Flow Model
NASA Astrophysics Data System (ADS)
Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.
2013-04-01
The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasting data for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational application in marine safety, particularly concerning oil spills and floating objects predictions. CYCOFOS flow model, similar to most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for Global Ocean Observing System is based on the POM-Princeton Ocean Model. CYCOFOS is nested with the MyOcean Mediterranean regional forecasting data and with SKIRON and ECMWF for surface forcing. The increasing demand for higher and higher resolution data to meet coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out in the frame of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. The parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to IPcycofos project, there are been several attempts to parallelise the POM, as for example the MP-POM. The existing parallel code models rely on the use of specific outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that can replicate the results of the serial version of the POM code used in CYCOFOS. The parallelization of the CYCOFOS POM model use Message Passing Interface-MPI, implemented on commodity computing clusters running open source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code constructed in a modular fashion, allowing a fast re-locatable downscaled implementation. The MPI takes advantage of the Cartesian nature of the POM mesh, and use the built-in functionality of MPI routines to split the mesh, using a weighting scheme, along longitude and latitude among the processors. Each server processor work on the model based on domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution and the results are more than satisfactory. With a higher resolution CYCOFOS Levantine model domain the forecasts need much less time than the serial CYCOFOS POM coarser version, both with identical accuracy.
High performance Python for direct numerical simulations of turbulent flows
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Langtangen, Hans Petter
2016-06-01
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython, that is found to match the speed of C++. The solvers are written from scratch in Python, both the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition 7 lines of code is required to execute a transform.
Genetically improved BarraCUDA.
Langdon, W B; Lam, Brian Yee Hong
2017-01-01
BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Multiprocessor smalltalk: Implementation, performance, and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pallas, J.I.
1990-01-01
Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.
2000-01-01
Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
Parallelization Issues and Particle-In Codes.
NASA Astrophysics Data System (ADS)
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to worite our and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes.more » It has been benchmarked as high as 720GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster thanthe parallel file system.« less
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator
NASA Astrophysics Data System (ADS)
Chitarin, G.; Agostinetti, P.; Gallo, A.; Marconato, N.; Nakano, H.; Serianni, G.; Takeiri, Y.; Tsumori, K.
2011-09-01
For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of the BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.
Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chitarin, G.; University of Padova, Dept. of Management and Engineering, strad. S. Nicola, 36100 Vicenza; Agostinetti, P.
2011-09-26
For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of themore » BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.« less
NASA Astrophysics Data System (ADS)
Rivier, Leonard Gilles
Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code called BOB is done so that it runs effectively on an inexpensive cluster of workstations. A one dimensional decomposition and transposition method insuring load balancing among processes is used. The Legendre transform is cache-blocked. A "compute on the fly" of the Legendre polynomials used in the spectral method produces a lower memory footprint and enables high resolution runs on relatively small memory machines. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB performances are compared to the parallel benchmark code PSTSWM and the dynamical core of NCAR's CCM3.6.6. In both cases, the comparison favors BOB. In the second part of this thesis, the primitive equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere where the parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical and only large scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions, zonally elongated structures form and persist resembling the jet structure observed near the cloud level top (1 bar) on Jupiter. We also study the effect of an equatorial heat source, meant to be a crude representation of the effect of the deep convective planetary interior onto the outer atmospheric layer. We show that such heat forcing is able to produce strong equatorial superrotating winds, one of the most striking feature of the Jovian circulation.
Benchmarking NNWSI flow and transport codes: COVE 1 results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hayden, N.K.
1985-06-01
The code verification (COVE) activity of the Nevada Nuclear Waste Storage Investigations (NNWSI) Project is the first step in certification of flow and transport codes used for NNWSI performance assessments of a geologic repository for disposing of high-level radioactive wastes. The goals of the COVE activity are (1) to demonstrate and compare the numerical accuracy and sensitivity of certain codes, (2) to identify and resolve problems in running typical NNWSI performance assessment calculations, and (3) to evaluate computer requirements for running the codes. This report describes the work done for COVE 1, the first step in benchmarking some of themore » codes. Isothermal calculations for the COVE 1 benchmarking have been completed using the hydrologic flow codes SAGUARO, TRUST, and GWVIP; the radionuclide transport codes FEMTRAN and TRUMP; and the coupled flow and transport code TRACR3D. This report presents the results of three cases of the benchmarking problem solved for COVE 1, a comparison of the results, questions raised regarding sensitivities to modeling techniques, and conclusions drawn regarding the status and numerical sensitivities of the codes. 30 refs.« less
Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors
NASA Astrophysics Data System (ADS)
Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.
2016-05-01
This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization leading to an overall 4.2 × speedup on CPU and 7.5 × on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6 ×.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software
Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan
2015-01-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
Implementation of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of features make Java an attractive but a debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to the Computational Fluid Dynamics (CFD) we have implemented NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Tucker, Deanne (Technical Monitor)
1994-01-01
We present views and analysis of the execution of several PVM codes for Computational Fluid Dynamics on a network of Sparcstations, including (a) NAS Parallel benchmarks CG and MG (White, Alund and Sunderam 1993); (b) a multi-partitioning algorithm for NAS Parallel Benchmark SP (Wijngaart 1993); and (c) an overset grid flowsolver (Smith 1993). These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains (a) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (b) Monitor, a library of run-time trace-collection routines; (c) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (d) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (a) the impact of long message latencies; (b) the impact of multiprogramming overheads and associated load imbalance; (c) cache and virtual-memory effects; and (4significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (a) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (b) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Revisiting Yasinsky and Henry`s benchmark using modern nodal codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feltus, M.A.; Becker, M.W.
1995-12-31
The numerical experiments analyzed by Yasinsky and Henry are quite trivial by comparison with today`s standards because they used the finite difference code WIGLE for their benchmark. Also, this problem is a simple slab (one-dimensional) case with no feedback mechanisms. This research attempts to obtain STAR (Ref. 2) and NEM (Ref. 3) code results in order to produce a more modern kinetics benchmark with results comparable WIGLE.
VVER-440 and VVER-1000 reactor dosimetry benchmark - BUGLE-96 versus ALPAN VII.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duo, J. I.
2011-07-01
Document available in abstract form only, full text of document follows: Analytical results of the vodo-vodyanoi energetichesky reactor-(VVER-) 440 and VVER-1000 reactor dosimetry benchmarks developed from engineering mockups at the Nuclear Research Inst. Rez LR-0 reactor are discussed. These benchmarks provide accurate determination of radiation field parameters in the vicinity and over the thickness of the reactor pressure vessel. Measurements are compared to calculated results with two sets of tools: TORT discrete ordinates code and BUGLE-96 cross-section library versus the newly Westinghouse-developed RAPTOR-M3G and ALPAN VII.0. The parallel code RAPTOR-M3G enables detailed neutron distributions in energy and space in reducedmore » computational time. ALPAN VII.0 cross-section library is based on ENDF/B-VII.0 and is designed for reactor dosimetry applications. It uses a unique broad group structure to enhance resolution in thermal-neutron-energy range compared to other analogous libraries. The comparison of fast neutron (E > 0.5 MeV) results shows good agreement (within 10%) between BUGLE-96 and ALPAN VII.O libraries. Furthermore, the results compare well with analogous results of participants of the REDOS program (2005). Finally, the analytical results for fast neutrons agree within 15% with the measurements, for most locations in all three mockups. In general, however, the analytical results underestimate the attenuation through the reactor pressure vessel thickness compared to the measurements. (authors)« less
Reliability of cause of death coding: an international comparison.
Antini, Carmen; Rajs, Danuta; Muñoz-Quezada, María Teresa; Mondaca, Boris Andrés Lucero; Heiss, Gerardo
2015-07-01
This study evaluates the agreement of nosologic coding of cardiovascular causes of death between a Chilean coder and one in the United States, in a stratified random sample of death certificates of persons aged ≥ 60, issued in 2008 in the Valparaíso and Metropolitan regions, Chile. All causes of death were converted to ICD-10 codes in parallel by both coders. Concordance was analyzed with inter-coder agreement and Cohen's kappa coefficient by level of specification ICD-10 code for the underlying cause and the total causes of death coding. Inter-coder agreement was 76.4% for all causes of death and 80.6% for the underlying cause (agreement at the four-digit level), with differences by the level of specification of the ICD-10 code, by line of the death certificate, and by number of causes of death per certificate. Cohen's kappa coefficient was 0.76 (95%CI: 0.68-0.84) for the underlying cause and 0.75 (95%CI: 0.74-0.77) for the total causes of death. In conclusion, causes of death coding and inter-coder agreement for cardiovascular diseases in two regions of Chile are comparable to an external benchmark and with reports from other countries.
: A Scalable and Transparent System for Simulating MPI Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2010-01-01
is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, andmore » MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.« less
Benchmarking Ada tasking on tightly coupled multiprocessor architectures
NASA Technical Reports Server (NTRS)
Collard, Philippe; Goforth, Andre; Marquardt, Matthew
1989-01-01
The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.
Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming
2017-06-16
Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limit computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the utility of Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles
2004-07-15
Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
NAS Parallel Benchmark Results 11-96. 1.0
NASA Technical Reports Server (NTRS)
Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific 3 systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
Benchmarking NWP Kernels on Multi- and Many-core Processors
NASA Astrophysics Data System (ADS)
Michalakes, J.; Vachharajani, M.
2008-12-01
Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling. PMID:28701946
SU-D-BRD-03: A Gateway for GPU Computing in Cancer Radiotherapy Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jia, X; Folkerts, M; Shi, F
Purpose: Graphics Processing Unit (GPU) has become increasingly important in radiotherapy. However, it is still difficult for general clinical researchers to access GPU codes developed by other researchers, and for developers to objectively benchmark their codes. Moreover, it is quite often to see repeated efforts spent on developing low-quality GPU codes. The goal of this project is to establish an infrastructure for testing GPU codes, cross comparing them, and facilitating code distributions in radiotherapy community. Methods: We developed a system called Gateway for GPU Computing in Cancer Radiotherapy Research (GCR2). A number of GPU codes developed by our group andmore » other developers can be accessed via a web interface. To use the services, researchers first upload their test data or use the standard data provided by our system. Then they can select the GPU device on which the code will be executed. Our system offers all mainstream GPU hardware for code benchmarking purpose. After the code running is complete, the system automatically summarizes and displays the computing results. We also released a SDK to allow the developers to build their own algorithm implementation and submit their binary codes to the system. The submitted code is then systematically benchmarked using a variety of GPU hardware and representative data provided by our system. The developers can also compare their codes with others and generate benchmarking reports. Results: It is found that the developed system is fully functioning. Through a user-friendly web interface, researchers are able to test various GPU codes. Developers also benefit from this platform by comprehensively benchmarking their codes on various GPU platforms and representative clinical data sets. Conclusion: We have developed an open platform allowing the clinical researchers and developers to access the GPUs and GPU codes. This development will facilitate the utilization of GPU in radiation therapy field.« less
Accelerating cardiac bidomain simulations using graphics processing units.
Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G
2012-08-01
Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
Accelerating Cardiac Bidomain Simulations Using Graphics Processing Units
Neic, Aurel; Liebmann, Manfred; Hoetzl, Elena; Mitchell, Lawrence; Vigmond, Edward J.; Haase, Gundolf
2013-01-01
Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6–20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20GPUs, 476 CPU cores were required on a national supercomputing facility. PMID:22692867
Injector Design Tool Improvements: User's manual for FDNS V.4.5
NASA Technical Reports Server (NTRS)
Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen
1998-01-01
The major emphasis of the current effort is in the development and validation of an efficient parallel machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purpose. Test cases may include shear coaxial, swirl coaxial and impinging injection systems with combinations LOXIH2 or LOXISP-1 propellant injector elements used in rocket engine designs. As a final goal of this project, a well tested parallel CFD performance methodology together with a user's operation description in a final technical report will be reported at the end of the proposed research effort.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Peiyuan; Brown, Timothy; Fullmer, William D.
Five benchmark problems are developed and simulated with the computational fluid dynamics and discrete element model code MFiX. The benchmark problems span dilute and dense regimes, consider statistically homogeneous and inhomogeneous (both clusters and bubbles) particle concentrations and a range of particle and fluid dynamic computational loads. Several variations of the benchmark problems are also discussed to extend the computational phase space to cover granular (particles only), bidisperse and heat transfer cases. A weak scaling analysis is performed for each benchmark problem and, in most cases, the scalability of the code appears reasonable up to approx. 103 cores. Profiling ofmore » the benchmark problems indicate that the most substantial computational time is being spent on particle-particle force calculations, drag force calculations and interpolating between discrete particle and continuum fields. Hardware performance analysis was also carried out showing significant Level 2 cache miss ratios and a rather low degree of vectorization. These results are intended to serve as a baseline for future developments to the code as well as a preliminary indicator of where to best focus performance optimizations.« less
Viriato: a Fourier-Hermite spectral code for strongly magnetised fluid-kinetic plasma dynamics
NASA Astrophysics Data System (ADS)
Loureiro, Nuno; Dorland, William; Fazendeiro, Luis; Kanekar, Anjor; Mallet, Alfred; Zocco, Alessandro
2015-11-01
We report on the algorithms and numerical methods used in Viriato, a novel fluid-kinetic code that solves two distinct sets of equations: (i) the Kinetic Reduced Electron Heating Model equations [Zocco & Schekochihin, 2011] and (ii) the kinetic reduced MHD (KRMHD) equations [Schekochihin et al., 2009]. Two main applications of these equations are magnetised (Alfvnénic) plasma turbulence and magnetic reconnection. Viriato uses operator splitting to separate the dynamics parallel and perpendicular to the ambient magnetic field (assumed strong). Along the magnetic field, Viriato allows for either a second-order accurate MacCormack method or, for higher accuracy, a spectral-like scheme. Perpendicular to the field Viriato is pseudo-spectral, and the time integration is performed by means of an iterative predictor-corrector scheme. In addition, a distinctive feature of Viriato is its spectral representation of the parallel velocity-space dependence, achieved by means of a Hermite representation of the perturbed distribution function. A series of linear and nonlinear benchmarks and tests are presented, with focus on 3D decaying kinetic turbulence. Work partially supported by Fundação para a Ciência e Tecnologia via Grants UID/FIS/50010/2013 and IF/00530/2013.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Busby, L.
This is an adaptation of the pre-existing Scimark benchmark code to a variety of Python and Lua implementations. It also measures performance of the Fparser expression parser and C and C++ code on a variety of simple scientific expressions.
Verification and benchmark testing of the NUFT computer code
NASA Astrophysics Data System (ADS)
Lee, K. H.; Nitao, J. J.; Kulshrestha, A.
1993-10-01
This interim report presents results of work completed in the ongoing verification and benchmark testing of the NUFT (Nonisothermal Unsaturated-saturated Flow and Transport) computer code. NUFT is a suite of multiphase, multicomponent models for numerical solution of thermal and isothermal flow and transport in porous media, with application to subsurface contaminant transport problems. The code simulates the coupled transport of heat, fluids, and chemical components, including volatile organic compounds. Grid systems may be cartesian or cylindrical, with one-, two-, or fully three-dimensional configurations possible. In this initial phase of testing, the NUFT code was used to solve seven one-dimensional unsaturated flow and heat transfer problems. Three verification and four benchmarking problems were solved. In the verification testing, excellent agreement was observed between NUFT results and the analytical or quasianalytical solutions. In the benchmark testing, results of code intercomparison were very satisfactory. From these testing results, it is concluded that the NUFT code is ready for application to field and laboratory problems similar to those addressed here. Multidimensional problems, including those dealing with chemical transport, will be addressed in a subsequent report.
Benchmarking a Visual-Basic based multi-component one-dimensional reactive transport modeling tool
NASA Astrophysics Data System (ADS)
Torlapati, Jagadish; Prabhakar Clement, T.
2013-01-01
We present the details of a comprehensive numerical modeling tool, RT1D, which can be used for simulating biochemical and geochemical reactive transport problems. The code can be run within the standard Microsoft EXCEL Visual Basic platform, and it does not require any additional software tools. The code can be easily adapted by others for simulating different types of laboratory-scale reactive transport experiments. We illustrate the capabilities of the tool by solving five benchmark problems with varying levels of reaction complexity. These literature-derived benchmarks are used to highlight the versatility of the code for solving a variety of practical reactive transport problems. The benchmarks are described in detail to provide a comprehensive database, which can be used by model developers to test other numerical codes. The VBA code presented in the study is a practical tool that can be used by laboratory researchers for analyzing both batch and column datasets within an EXCEL platform.
Fast quantum Monte Carlo on a GPU
NASA Astrophysics Data System (ADS)
Lutsyshyn, Y.
2015-02-01
We present a scheme for the parallelization of quantum Monte Carlo method on graphical processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including Fermi GTX560 and M2090, and the Kepler architecture K20 GPU. Special optimization was developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.
Benchmarking of Neutron Production of Heavy-Ion Transport Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Remec, Igor; Ronningen, Reginald M.; Heilbronn, Lawrence
Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in design and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondary neutron production. Results are encouraging; however, further improvements in models andmore » codes and additional benchmarking are required.« less
Benchmarking of Heavy Ion Transport Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Remec, Igor; Ronningen, Reginald M.; Heilbronn, Lawrence
Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in designing and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondary neutron production. Results are encouraging; however, further improvements in models andmore » codes and additional benchmarking are required.« less
A high performance scientific cloud computing environment for materials simulations
NASA Astrophysics Data System (ADS)
Jorissen, K.; Vila, F. D.; Rehr, J. J.
2012-09-01
We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.
Di Tommaso, Paolo; Orobitg, Miquel; Guirado, Fernando; Cores, Fernado; Espinosa, Toni; Notredame, Cedric
2010-08-01
We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html
NASA Astrophysics Data System (ADS)
Pierazzo, E.; Artemieva, N.; Asphaug, E.; Baldwin, E. C.; Cazamias, J.; Coker, R.; Collins, G. S.; Crawford, D. A.; Davison, T.; Elbeshausen, D.; Holsapple, K. A.; Housen, K. R.; Korycansky, D. G.; Wünnemann, K.
2008-12-01
Over the last few decades, rapid improvement of computer capabilities has allowed impact cratering to be modeled with increasing complexity and realism, and has paved the way for a new era of numerical modeling of the impact process, including full, three-dimensional (3D) simulations. When properly benchmarked and validated against observation, computer models offer a powerful tool for understanding the mechanics of impact crater formation. This work presents results from the first phase of a project to benchmark and validate shock codes. A variety of 2D and 3D codes were used in this study, from commercial products like AUTODYN, to codes developed within the scientific community like SOVA, SPH, ZEUS-MP, iSALE, and codes developed at U.S. National Laboratories like CTH, SAGE/RAGE, and ALE3D. Benchmark calculations of shock wave propagation in aluminum-on-aluminum impacts were performed to examine the agreement between codes for simple idealized problems. The benchmark simulations show that variability in code results is to be expected due to differences in the underlying solution algorithm of each code, artificial stability parameters, spatial and temporal resolution, and material models. Overall, the inter-code variability in peak shock pressure as a function of distance is around 10 to 20%. In general, if the impactor is resolved by at least 20 cells across its radius, the underestimation of peak shock pressure due to spatial resolution is less than 10%. In addition to the benchmark tests, three validation tests were performed to examine the ability of the codes to reproduce the time evolution of crater radius and depth observed in vertical laboratory impacts in water and two well-characterized aluminum alloys. Results from these calculations are in good agreement with experiments. There appears to be a general tendency of shock physics codes to underestimate the radius of the forming crater. Overall, the discrepancy between the model and experiment results is between 10 and 20%, similar to the inter-code variability.
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Chacón, L.; Cappello, S.
2010-08-01
With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacón, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code in cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonfiglio, Daniele; Chacon, Luis; Cappello, Susanna
2010-01-01
With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacon, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code inmore » cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.« less
Benchmarking of neutron production of heavy-ion transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Remec, I.; Ronningen, R. M.; Heilbronn, L.
Document available in abstract form only, full text of document follows: Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in design and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondarymore » neutron production. Results are encouraging; however, further improvements in models and codes and additional benchmarking are required. (authors)« less
Supercomputer simulations of structure formation in the Universe
NASA Astrophysics Data System (ADS)
Ishiyama, Tomoaki
2017-06-01
We describe the implementation and performance results of our massively parallel MPI†/OpenMP‡ hybrid TreePM code for large-scale cosmological N-body simulations. For domain decomposition, a recursive multi-section algorithm is used and the size of domains are automatically set so that the total calculation time is the same for all processes. We developed a highly-tuned gravity kernel for short-range forces, and a novel communication algorithm for long-range forces. For two trillion particles benchmark simulation, the average performance on the fullsystem of K computer (82,944 nodes, the total number of core is 663,552) is 5.8 Pflops, which corresponds to 55% of the peak speed.
Ray-tracing 3D dust radiative transfer with DART-Ray: code upgrade and public release
NASA Astrophysics Data System (ADS)
Natale, Giovanni; Popescu, Cristina C.; Tuffs, Richard J.; Clarke, Adam J.; Debattista, Victor P.; Fischera, Jörg; Pasetto, Stefano; Rushton, Mark; Thirlwall, Jordan J.
2017-11-01
We present an extensively updated version of the purely ray-tracing 3D dust radiation transfer code DART-Ray. The new version includes five major upgrades: 1) a series of optimizations for the ray-angular density and the scattered radiation source function; 2) the implementation of several data and task parallelizations using hybrid MPI+OpenMP schemes; 3) the inclusion of dust self-heating; 4) the ability to produce surface brightness maps for observers within the models in HEALPix format; 5) the possibility to set the expected numerical accuracy already at the start of the calculation. We tested the updated code with benchmark models where the dust self-heating is not negligible. Furthermore, we performed a study of the extent of the source influence volumes, using galaxy models, which are critical in determining the efficiency of the DART-Ray algorithm. The new code is publicly available, documented for both users and developers, and accompanied by several programmes to create input grids for different model geometries and to import the results of N-body and SPH simulations. These programmes can be easily adapted to different input geometries, and for different dust models or stellar emission libraries.
Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...
2017-10-04
The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overallmore » memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel R Xeon PhiTM supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.« less
PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules
NASA Astrophysics Data System (ADS)
Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.
2018-03-01
The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations
NASA Astrophysics Data System (ADS)
Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.
2012-09-01
Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors, and produce huge amount of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid based AMR, the Octree AMR has the advantage to fit very precisely the adaptive resolution of the grid to the local problem complexity. However, this specific octree data type need some specific software to be visualized, as generic visualization tools works on Cartesian grid data type. This is why the PYMSES software has been also developed by our team. It relies on the python scripting language to ensure a modular and easy access to explore those specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We would like to present with more details our PYMSES software with some performance benchmarks. PYMSES has currently two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques with the python multiprocessing library versus the use of MPI run. The load balancing strategy has to be smartly defined in order to achieve a good speed up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
NASA Astrophysics Data System (ADS)
Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John
The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic scale resolution. However, the governing equation of the XPFC model is an integral-partial-differential-equation (IPDE), which poses challenges in implementation onto high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. The performance benchmarking on the Stampede supercomputer indicates near linear strong and weak scaling for both multigrid and transfer time between multigrid and FFT modules up to 1024 cores. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for coming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.
RETRANO3 benchmarks for Beaver Valley plant transients and FSAR analyses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beaumont, E.T.; Feltus, M.A.
1993-01-01
Any best-estimate code (e.g., RETRANO3) results must be validated against plant data and final safety analysis report (FSAR) predictions. The need for two independent means of benchmarking is necessary to ensure that the results were not biased toward a particular data set and to have a certain degree of accuracy. The code results need to be compared with previous results and show improvements over previous code results. Ideally, the two best means of benchmarking a thermal hydraulics code are comparing results from previous versions of the same code along with actual plant data. This paper describes RETRAN03 benchmarks against RETRAN02more » results, actual plant data, and FSAR predictions. RETRAN03, the Electric Power Research Institute's latest version of the RETRAN thermal-hydraulic analysis codes, offers several upgrades over its predecessor, RETRAN02 Mod5. RETRAN03 can use either implicit or semi-implicit numerics, whereas RETRAN02 Mod5 uses only semi-implicit numerics. Another major upgrade deals with slip model options. RETRAN03 added several new models, including a five-equation model for more accurate modeling of two-phase flow. RETPAN02 Mod5 should give similar but slightly more conservative results than RETRAN03 when executed with RETRAN02 Mod5 options.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, althoughmore » the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.« less
A benchmark study of scoring methods for non-coding mutations.
Drubay, Damien; Gautheret, Daniel; Michiels, Stefan
2018-05-15
Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been realized to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 genomes project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Benchmarking the Multidimensional Stellar Implicit Code MUSIC
NASA Astrophysics Data System (ADS)
Goffrey, T.; Pratt, J.; Viallet, M.; Baraffe, I.; Popov, M. V.; Walder, R.; Folini, D.; Geroux, C.; Constantino, T.
2017-04-01
We present the results of a numerical benchmark study for the MUltidimensional Stellar Implicit Code (MUSIC) based on widely applicable two- and three-dimensional compressible hydrodynamics problems relevant to stellar interiors. MUSIC is an implicit large eddy simulation code that uses implicit time integration, implemented as a Jacobian-free Newton Krylov method. A physics based preconditioning technique which can be adjusted to target varying physics is used to improve the performance of the solver. The problems used for this benchmark study include the Rayleigh-Taylor and Kelvin-Helmholtz instabilities, and the decay of the Taylor-Green vortex. Additionally we show a test of hydrostatic equilibrium, in a stellar environment which is dominated by radiative effects. In this setting the flexibility of the preconditioning technique is demonstrated. This work aims to bridge the gap between the hydrodynamic test problems typically used during development of numerical methods and the complex flows of stellar interiors. A series of multidimensional tests were performed and analysed. Each of these test cases was analysed with a simple, scalar diagnostic, with the aim of enabling direct code comparisons. As the tests performed do not have analytic solutions, we verify MUSIC by comparing it to established codes including ATHENA and the PENCIL code. MUSIC is able to both reproduce behaviour from established and widely-used codes as well as results expected from theoretical predictions. This benchmarking study concludes a series of papers describing the development of the MUSIC code and provides confidence in future applications.
Porting AMG2013 to Heterogeneous CPU+GPU Nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samfass, Philipp
LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM PowerV9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: While GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in ful lling its national mission. Onemore » of these libraries, called HYPRE, provides parallel solvers and precondi- tioners for large, sparse linear systems of equations. In the context of this intern- ship project, I consider AMG2013 which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving di erent systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for di erent experiments that demonstrate a successful parallel implementation on the heterogeneous ma- chines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).« less
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
Validation of the WIMSD4M cross-section generation code with benchmark results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leal, L.C.; Deen, J.R.; Woodruff, W.L.
1995-02-01
The WIMSD4 code has been adopted for cross-section generation in support of the Reduced Enrichment for Research and Test (RERTR) program at Argonne National Laboratory (ANL). Subsequently, the code has undergone several updates, and significant improvements have been achieved. The capability of generating group-collapsed micro- or macroscopic cross sections from the ENDF/B-V library and the more recent evaluation, ENDF/B-VI, in the ISOTXS format makes the modified version of the WIMSD4 code, WIMSD4M, very attractive, not only for the RERTR program, but also for the reactor physics community. The intent of the present paper is to validate the procedure to generatemore » cross-section libraries for reactor analyses and calculations utilizing the WIMSD4M code. To do so, the results of calculations performed with group cross-section data generated with the WIMSD4M code will be compared against experimental results. These results correspond to calculations carried out with thermal reactor benchmarks of the Oak Ridge National Laboratory(ORNL) unreflected critical spheres, the TRX critical experiments, and calculations of a modified Los Alamos highly-enriched heavy-water moderated benchmark critical system. The benchmark calculations were performed with the discrete-ordinates transport code, TWODANT, using WIMSD4M cross-section data. Transport calculations using the XSDRNPM module of the SCALE code system are also included. In addition to transport calculations, diffusion calculations with the DIF3D code were also carried out, since the DIF3D code is used in the RERTR program for reactor analysis and design. For completeness, Monte Carlo results of calculations performed with the VIM and MCNP codes are also presented.« less
Renner, Franziska
2016-09-01
Monte Carlo simulations are regarded as the most accurate method of solving complex problems in the field of dosimetry and radiation transport. In (external) radiation therapy they are increasingly used for the calculation of dose distributions during treatment planning. In comparison to other algorithms for the calculation of dose distributions, Monte Carlo methods have the capability of improving the accuracy of dose calculations - especially under complex circumstances (e.g. consideration of inhomogeneities). However, there is a lack of knowledge of how accurate the results of Monte Carlo calculations are on an absolute basis. A practical verification of the calculations can be performed by direct comparison with the results of a benchmark experiment. This work presents such a benchmark experiment and compares its results (with detailed consideration of measurement uncertainty) with the results of Monte Carlo calculations using the well-established Monte Carlo code EGSnrc. The experiment was designed to have parallels to external beam radiation therapy with respect to the type and energy of the radiation, the materials used and the kind of dose measurement. Because the properties of the beam have to be well known in order to compare the results of the experiment and the simulation on an absolute basis, the benchmark experiment was performed using the research electron accelerator of the Physikalisch-Technische Bundesanstalt (PTB), whose beam was accurately characterized in advance. The benchmark experiment and the corresponding Monte Carlo simulations were carried out for two different types of ionization chambers and the results were compared. Considering the uncertainty, which is about 0.7 % for the experimental values and about 1.0 % for the Monte Carlo simulation, the results of the simulation and the experiment coincide. Copyright © 2015. Published by Elsevier GmbH.
Development of a Benchmark Example for Delamination Fatigue Growth Prediction
NASA Technical Reports Server (NTRS)
Krueger, Ronald
2010-01-01
The development of a benchmark example for cyclic delamination growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of a Double Cantilever Beam (DCB) specimen, which is independent of the analysis software used and allows the assessment of the delamination growth prediction capabilities in commercial finite element codes. First, the benchmark result was created for the specimen. Second, starting from an initially straight front, the delamination was allowed to grow under cyclic loading in a finite element model of a commercial code. The number of cycles to delamination onset and the number of cycles during stable delamination growth for each growth increment were obtained from the analysis. In general, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. Overall, the results are encouraging but further assessment for mixed-mode delamination is required
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
Static analysis techniques for semiautomatic synthesis of message passing software skeletons
Sottile, Matthew; Dagit, Jason; Zhang, Deli; ...
2015-06-29
The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed formore » the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.« less
Performance Metrics for Monitoring Parallel Program Executions
NASA Technical Reports Server (NTRS)
Sarukkai, Sekkar R.; Gotwais, Jacob K.; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
Existing tools for debugging performance of parallel programs either provide graphical representations of program execution or profiles of program executions. However, for performance debugging tools to be useful, such information has to be augmented with information that highlights the cause of poor program performance. Identifying the cause of poor performance necessitates the need for not only determining the significance of various performance problems on the execution time of the program, but also needs to consider the effect of interprocessor communications of individual source level data structures. In this paper, we present a suite of normalized indices which provide a convenient mechanism for focusing on a region of code with poor performance and highlights the cause of the problem in terms of processors, procedures and data structure interactions. All the indices are generated from trace files augmented with data structure information.. Further, we show with the help of examples from the NAS benchmark suite that the indices help in detecting potential cause of poor performance, based on augmented execution traces obtained by monitoring the program.
Subgroup Benchmark Calculations for the Intra-Pellet Nonuniform Temperature Cases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Kang Seog; Jung, Yeon Sang; Liu, Yuxuan
A benchmark suite has been developed by Seoul National University (SNU) for intrapellet nonuniform temperature distribution cases based on the practical temperature profiles according to the thermal power levels. Though a new subgroup capability for nonuniform temperature distribution was implemented in MPACT, no validation calculation has been performed for the new capability. This study focuses on bench-marking the new capability through a code-to-code comparison. Two continuous-energy Monte Carlo codes, McCARD and CE-KENO, are engaged in obtaining reference solutions, and the MPACT results are compared to the SNU nTRACER using a similar cross section library and subgroup method to obtain self-shieldedmore » cross sections.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karl Anderson, Steve Plimpton
2015-01-27
The FireHose Streaming Benchmarks are a suite of stream-processing benchmarks defined to enable comparison of streaming software and hardware, both quantitatively vis-a-vis the rate at which they can process data, and qualitatively by judging the effort involved to implement and run the benchmarks. Each benchmark has two parts. The first is a generator which produces and outputs datums at a high rate in a specific format. The second is an analytic which reads the stream of datums and is required to perform a well-defined calculation on the collection of datums, typically to find anomalous datums that have been created inmore » the stream by the generator. The FireHose suite provides code for the generators, sample code for the analytics (which users are free to re-implement in their own custom frameworks), and a precise definition of each benchmark calculation.« less
Benchmarking and tuning the MILC code on clusters and supercomputers
NASA Astrophysics Data System (ADS)
Gottlieb, Steven
2002-03-01
Recently, we have benchmarked and tuned the MILC code on a number of architectures including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as PIV, IBM SP and Alpha.
Benchmarking and tuning the MILC code on clusters and supercomputers
NASA Astrophysics Data System (ADS)
Gottlieb, Steven
Recently, we have benchmarked and tuned the MILC code on a number of architectures including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as PIV, IBM SP and Alpha.
Benchmarking study of the MCNP code against cold critical experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, S.
1991-01-01
The purpose of this study was to benchmark the widely used Monte Carlo code MCNP against a set of cold critical experiments with a view to using the code as a means of independently verifying the performance of faster but less accurate Monte Carlo and deterministic codes. The experiments simulated consisted of both fast and thermal criticals as well as fuel in a variety of chemical forms. A standard set of benchmark cold critical experiments was modeled. These included the two fast experiments, GODIVA and JEZEBEL, the TRX metallic uranium thermal experiments, the Babcock and Wilcox oxide and mixed oxidemore » experiments, and the Oak Ridge National Laboratory (ORNL) and Pacific Northwest Laboratory (PNL) nitrate solution experiments. The principal case studied was a small critical experiment that was performed with boiling water reactor bundles.« less
NASA Technical Reports Server (NTRS)
Padovan, J.; Adams, M.; Fertis, J.; Zeid, I.; Lam, P.
1982-01-01
Finite element codes are used in modelling rotor-bearing-stator structure common to the turbine industry. Engine dynamic simulation is used by developing strategies which enable the use of available finite element codes. benchmarking the elements developed are benchmarked by incorporation into a general purpose code (ADINA); the numerical characteristics of finite element type rotor-bearing-stator simulations are evaluated through the use of various types of explicit/implicit numerical integration operators. Improving the overall numerical efficiency of the procedure is improved.
First benchmark of the Unstructured Grid Adaptation Working Group
NASA Technical Reports Server (NTRS)
Ibanez, Daniel; Barral, Nicolas; Krakos, Joshua; Loseille, Adrien; Michal, Todd; Park, Mike
2017-01-01
Unstructured grid adaptation is a technology that holds the potential to improve the automation and accuracy of computational fluid dynamics and other computational disciplines. Difficulty producing the highly anisotropic elements necessary for simulation on complex curved geometries that satisfies a resolution request has limited this technology's widespread adoption. The Unstructured Grid Adaptation Working Group is an open gathering of researchers working on adapting simplicial meshes to conform to a metric field. Current members span a wide range of institutions including academia, industry, and national laboratories. The purpose of this group is to create a common basis for understanding and improving mesh adaptation. We present our first major contribution: a common set of benchmark cases, including input meshes and analytic metric specifications, that are publicly available to be used for evaluating any mesh adaptation code. We also present the results of several existing codes on these benchmark cases, to illustrate their utility in identifying key challenges common to all codes and important differences between available codes. Future directions are defined to expand this benchmark to mature the technology necessary to impact practical simulation workflows.
NASA Astrophysics Data System (ADS)
Fraser, Ryan; Gross, Lutz; Wyborn, Lesley; Evans, Ben; Klump, Jens
2015-04-01
Recent investments in HPC, cloud and Petascale data stores, have dramatically increased the scale and resolution that earth science challenges can now be tackled. These new infrastructures are highly parallelised and to fully utilise them and access the large volumes of earth science data now available, a new approach to software stack engineering needs to be developed. The size, complexity and cost of the new infrastructures mean any software deployed has to be reliable, trusted and reusable. Increasingly software is available via open source repositories, but these usually only enable code to be discovered and downloaded. As a user it is hard for a scientist to judge the suitability and quality of individual codes: rarely is there information on how and where codes can be run, what the critical dependencies are, and in particular, on the version requirements and licensing of the underlying software stack. A trusted software framework is proposed to enable reliable software to be discovered, accessed and then deployed on multiple hardware environments. More specifically, this framework will enable those who generate the software, and those who fund the development of software, to gain credit for the effort, IP, time and dollars spent, and facilitate quantification of the impact of individual codes. For scientific users, the framework delivers reviewed and benchmarked scientific software with mechanisms to reproduce results. The trusted framework will have five separate, but connected components: Register, Review, Reference, Run, and Repeat. 1) The Register component will facilitate discovery of relevant software from multiple open source code repositories. The registration process of the code should include information about licensing, hardware environments it can be run on, define appropriate validation (testing) procedures and list the critical dependencies. 2) The Review component is targeting on the verification of the software typically against a set of benchmark cases. This will be achieved by linking the code in the software framework to peer review forums such as Mozilla Science or appropriate Journals (e.g. Geoscientific Model Development Journal) to assist users to know which codes to trust. 3) Referencing will be accomplished by linking the Software Framework to groups such as Figshare or ImpactStory that help disseminate and measure the impact of scientific research, including program code. 4) The Run component will draw on information supplied in the registration process, benchmark cases described in the review and relevant information to instantiate the scientific code on the selected environment. 5) The Repeat component will tap into existing Provenance Workflow engines that will automatically capture information that relate to a particular run of that software, including identification of all input and output artefacts, and all elements and transactions within that workflow. The proposed trusted software framework will enable users to rapidly discover and access reliable code, reduce the time to deploy it and greatly facilitate sharing, reuse and reinstallation of code. Properly designed it could enable an ability to scale out to massively parallel systems and be accessed nationally/ internationally for multiple use cases, including Supercomputer centres, cloud facilities, and local computers.
NASA Astrophysics Data System (ADS)
Lee, Yi-Kang
2017-09-01
Nuclear decommissioning takes place in several stages due to the radioactivity in the reactor structure materials. A good estimation of the neutron activation products distributed in the reactor structure materials impacts obviously on the decommissioning planning and the low-level radioactive waste management. Continuous energy Monte-Carlo radiation transport code TRIPOLI-4 has been applied on radiation protection and shielding analyses. To enhance the TRIPOLI-4 application in nuclear decommissioning activities, both experimental and computational benchmarks are being performed. To calculate the neutron activation of the shielding and structure materials of nuclear facilities, the knowledge of 3D neutron flux map and energy spectra must be first investigated. To perform this type of neutron deep penetration calculations with the Monte Carlo transport code, variance reduction techniques are necessary in order to reduce the uncertainty of the neutron activation estimation. In this study, variance reduction options of the TRIPOLI-4 code were used on the NAIADE 1 light water shielding benchmark. This benchmark document is available from the OECD/NEA SINBAD shielding benchmark database. From this benchmark database, a simplified NAIADE 1 water shielding model was first proposed in this work in order to make the code validation easier. Determination of the fission neutron transport was performed in light water for penetration up to 50 cm for fast neutrons and up to about 180 cm for thermal neutrons. Measurement and calculation results were benchmarked. Variance reduction options and their performance were discussed and compared.
Arthur, Jennifer; Bahran, Rian; Hutchinson, Jesson; ...
2018-06-14
Historically, radiation transport codes have uncorrelated fission emissions. In reality, the particles emitted by both spontaneous and induced fissions are correlated in time, energy, angle, and multiplicity. This work validates the performance of various current Monte Carlo codes that take into account the underlying correlated physics of fission neutrons, specifically neutron multiplicity distributions. The performance of 4 Monte Carlo codes - MCNP®6.2, MCNP®6.2/FREYA, MCNP®6.2/CGMF, and PoliMi - was assessed using neutron multiplicity benchmark experiments. In addition, MCNP®6.2 simulations were run using JEFF-3.2 and JENDL-4.0, rather than ENDF/B-VII.1, data for 239Pu and 240Pu. The sensitive benchmark parameters that in this workmore » represent the performance of each correlated fission multiplicity Monte Carlo code include the singles rate, the doubles rate, leakage multiplication, and Feynman histograms. Although it is difficult to determine which radiation transport code shows the best overall performance in simulating subcritical neutron multiplication inference benchmark measurements, it is clear that correlations exist between the underlying nuclear data utilized by (or generated by) the various codes, and the correlated neutron observables of interest. This could prove useful in nuclear data validation and evaluation applications, in which a particular moment of the neutron multiplicity distribution is of more interest than the other moments. It is also quite clear that, because transport is handled by MCNP®6.2 in 3 of the 4 codes, with the 4th code (PoliMi) being based on an older version of MCNP®, the differences in correlated neutron observables of interest are most likely due to the treatment of fission event generation in each of the different codes, as opposed to the radiation transport.« less
Validation of the WIMSD4M cross-section generation code with benchmark results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deen, J.R.; Woodruff, W.L.; Leal, L.E.
1995-01-01
The WIMSD4 code has been adopted for cross-section generation in support of the Reduced Enrichment Research and Test Reactor (RERTR) program at Argonne National Laboratory (ANL). Subsequently, the code has undergone several updates, and significant improvements have been achieved. The capability of generating group-collapsed micro- or macroscopic cross sections from the ENDF/B-V library and the more recent evaluation, ENDF/B-VI, in the ISOTXS format makes the modified version of the WIMSD4 code, WIMSD4M, very attractive, not only for the RERTR program, but also for the reactor physics community. The intent of the present paper is to validate the WIMSD4M cross-section librariesmore » for reactor modeling of fresh water moderated cores. The results of calculations performed with multigroup cross-section data generated with the WIMSD4M code will be compared against experimental results. These results correspond to calculations carried out with thermal reactor benchmarks of the Oak Ridge National Laboratory (ORNL) unreflected HEU critical spheres, the TRX LEU critical experiments, and calculations of a modified Los Alamos HEU D{sub 2}O moderated benchmark critical system. The benchmark calculations were performed with the discrete-ordinates transport code, TWODANT, using WIMSD4M cross-section data. Transport calculations using the XSDRNPM module of the SCALE code system are also included. In addition to transport calculations, diffusion calculations with the DIF3D code were also carried out, since the DIF3D code is used in the RERTR program for reactor analysis and design. For completeness, Monte Carlo results of calculations performed with the VIM and MCNP codes are also presented.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arthur, Jennifer; Bahran, Rian; Hutchinson, Jesson
Historically, radiation transport codes have uncorrelated fission emissions. In reality, the particles emitted by both spontaneous and induced fissions are correlated in time, energy, angle, and multiplicity. This work validates the performance of various current Monte Carlo codes that take into account the underlying correlated physics of fission neutrons, specifically neutron multiplicity distributions. The performance of 4 Monte Carlo codes - MCNP®6.2, MCNP®6.2/FREYA, MCNP®6.2/CGMF, and PoliMi - was assessed using neutron multiplicity benchmark experiments. In addition, MCNP®6.2 simulations were run using JEFF-3.2 and JENDL-4.0, rather than ENDF/B-VII.1, data for 239Pu and 240Pu. The sensitive benchmark parameters that in this workmore » represent the performance of each correlated fission multiplicity Monte Carlo code include the singles rate, the doubles rate, leakage multiplication, and Feynman histograms. Although it is difficult to determine which radiation transport code shows the best overall performance in simulating subcritical neutron multiplication inference benchmark measurements, it is clear that correlations exist between the underlying nuclear data utilized by (or generated by) the various codes, and the correlated neutron observables of interest. This could prove useful in nuclear data validation and evaluation applications, in which a particular moment of the neutron multiplicity distribution is of more interest than the other moments. It is also quite clear that, because transport is handled by MCNP®6.2 in 3 of the 4 codes, with the 4th code (PoliMi) being based on an older version of MCNP®, the differences in correlated neutron observables of interest are most likely due to the treatment of fission event generation in each of the different codes, as opposed to the radiation transport.« less
SIGACE Code for Generating High-Temperature ACE Files; Validation and Benchmarking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharma, Amit R.; Ganesan, S.; Trkov, A.
2005-05-24
A code named SIGACE has been developed as a tool for MCNP users within the scope of a research contract awarded by the Nuclear Data Section of the International Atomic Energy Agency (IAEA) (Ref: 302-F4-IND-11566 B5-IND-29641). A new recipe has been evolved for generating high-temperature ACE files for use with the MCNP code. Under this scheme the low-temperature ACE file is first converted to an ENDF formatted file using the ACELST code and then Doppler broadened, essentially limited to the data in the resolved resonance region, to any desired higher temperature using SIGMA1. The SIGACE code then generates a high-temperaturemore » ACE file for use with the MCNP code. A thinning routine has also been introduced in the SIGACE code for reducing the size of the ACE files. The SIGACE code and the recipe for generating ACE files at higher temperatures has been applied to the SEFOR fast reactor benchmark problem (sodium-cooled fast reactor benchmark described in ENDF-202/BNL-19302, 1974 document). The calculated Doppler coefficient is in good agreement with the experimental value. A similar calculation using ACE files generated directly with the NJOY system also agrees with our SIGACE computed results. The SIGACE code and the recipe is further applied to study the numerical benchmark configuration of selected idealized PWR pin cell configurations with five different fuel enrichments as reported by Mosteller and Eisenhart. The SIGACE code that has been tested with several FENDL/MC files will be available, free of cost, upon request, from the Nuclear Data Section of the IAEA.« less
A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator
Engelmann, Christian; Naughton, III, Thomas J.
2016-03-22
Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1)~a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2)~a new simulated MPI message matchingmore » algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ganapol, B.D.; Kornreich, D.E.
Because of the requirement of accountability and quality control in the scientific world, a demand for high-quality analytical benchmark calculations has arisen in the neutron transport community. The intent of these benchmarks is to provide a numerical standard to which production neutron transport codes may be compared in order to verify proper operation. The overall investigation as modified in the second year renewal application includes the following three primary tasks. Task 1 on two dimensional neutron transport is divided into (a) single medium searchlight problem (SLP) and (b) two-adjacent half-space SLP. Task 2 on three-dimensional neutron transport covers (a) pointmore » source in arbitrary geometry, (b) single medium SLP, and (c) two-adjacent half-space SLP. Task 3 on code verification, includes deterministic and probabilistic codes. The primary aim of the proposed investigation was to provide a suite of comprehensive two- and three-dimensional analytical benchmarks for neutron transport theory applications. This objective has been achieved. The suite of benchmarks in infinite media and the three-dimensional SLP are a relatively comprehensive set of one-group benchmarks for isotropically scattering media. Because of time and resource limitations, the extensions of the benchmarks to include multi-group and anisotropic scattering are not included here. Presently, however, enormous advances in the solution for the planar Green`s function in an anisotropically scattering medium have been made and will eventually be implemented in the two- and three-dimensional solutions considered under this grant. Of particular note in this work are the numerical results for the three-dimensional SLP, which have never before been presented. The results presented were made possible only because of the tremendous advances in computing power that have occurred during the past decade.« less
TRUST. I. A 3D externally illuminated slab benchmark for dust radiative transfer
NASA Astrophysics Data System (ADS)
Gordon, K. D.; Baes, M.; Bianchi, S.; Camps, P.; Juvela, M.; Kuiper, R.; Lunttila, T.; Misselt, K. A.; Natale, G.; Robitaille, T.; Steinacker, J.
2017-07-01
Context. The radiative transport of photons through arbitrary three-dimensional (3D) structures of dust is a challenging problem due to the anisotropic scattering of dust grains and strong coupling between different spatial regions. The radiative transfer problem in 3D is solved using Monte Carlo or Ray Tracing techniques as no full analytic solution exists for the true 3D structures. Aims: We provide the first 3D dust radiative transfer benchmark composed of a slab of dust with uniform density externally illuminated by a star. This simple 3D benchmark is explicitly formulated to provide tests of the different components of the radiative transfer problem including dust absorption, scattering, and emission. Methods: The details of the external star, the slab itself, and the dust properties are provided. This benchmark includes models with a range of dust optical depths fully probing cases that are optically thin at all wavelengths to optically thick at most wavelengths. The dust properties adopted are characteristic of the diffuse Milky Way interstellar medium. This benchmark includes solutions for the full dust emission including single photon (stochastic) heating as well as two simplifying approximations: One where all grains are considered in equilibrium with the radiation field and one where the emission is from a single effective grain with size-distribution-averaged properties. A total of six Monte Carlo codes and one Ray Tracing code provide solutions to this benchmark. Results: The solution to this benchmark is given as global spectral energy distributions (SEDs) and images at select diagnostic wavelengths from the ultraviolet through the infrared. Comparison of the results revealed that the global SEDs are consistent on average to a few percent for all but the scattered stellar flux at very high optical depths. The image results are consistent within 10%, again except for the stellar scattered flux at very high optical depths. The lack of agreement between different codes of the scattered flux at high optical depths is quantified for the first time. Convergence tests using one of the Monte Carlo codes illustrate the sensitivity of the solutions to various model parameters. Conclusions: We provide the first 3D dust radiative transfer benchmark and validate the accuracy of this benchmark through comparisons between multiple independent codes and detailed convergence tests.
An Object-Oriented Serial DSMC Simulation Package
NASA Astrophysics Data System (ADS)
Liu, Hongli; Cai, Chunpei
2011-05-01
A newly developed three-dimensional direct simulation Monte Carlo (DSMC) simulation package, named GRASP ("Generalized Rarefied gAs Simulation Package"), is reported in this paper. This package utilizes the concept of simulation engine, many C++ features and software design patterns. The package has an open architecture which can benefit further development and maintenance of the code. In order to reduce the engineering time for three-dimensional models, a hybrid grid scheme, combined with a flexible data structure compiled by C++ language, are implemented in this package. This scheme utilizes a local data structure based on the computational cell to achieve high performance on workstation processors. This data structure allows the DSMC algorithm to be very efficiently parallelized with domain decomposition and it provides much flexibility in terms of grid types. This package can utilize traditional structured, unstructured or hybrid grids within the framework of a single code to model arbitrarily complex geometries and to simulate rarefied gas flows. Benchmark test cases indicate that this package has satisfactory accuracy for complex rarefied gas flows.
Use Computer-Aided Tools to Parallelize Large CFD Applications
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Yan, J.
2000-01-01
Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progressing in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more sever. Today, scalability and high performance are mostly involving handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, show good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, it can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often failed on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence model in multiple zones. Each one comprises of from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved 18 fold speed up on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outer- most parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level. Future work will emphasize on the development of methodology to work in a multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformation is also needed.
Reference Solutions for Benchmark Turbulent Flows in Three Dimensions
NASA Technical Reports Server (NTRS)
Diskin, Boris; Thomas, James L.; Pandya, Mohagna J.; Rumsey, Christopher L.
2016-01-01
A grid convergence study is performed to establish benchmark solutions for turbulent flows in three dimensions (3D) in support of turbulence-model verification campaign at the Turbulence Modeling Resource (TMR) website. The three benchmark cases are subsonic flows around a 3D bump and a hemisphere-cylinder configuration and a supersonic internal flow through a square duct. Reference solutions are computed for Reynolds Averaged Navier Stokes equations with the Spalart-Allmaras turbulence model using a linear eddy-viscosity model for the external flows and a nonlinear eddy-viscosity model based on a quadratic constitutive relation for the internal flow. The study involves three widely-used practical computational fluid dynamics codes developed and supported at NASA Langley Research Center: FUN3D, USM3D, and CFL3D. Reference steady-state solutions computed with these three codes on families of consistently refined grids are presented. Grid-to-grid and code-to-code variations are described in detail.
Verification of TEMPEST with neoclassical transport theory
NASA Astrophysics Data System (ADS)
Xiong, Z.; Cohen, B. I.; Cohen, R. H.; Dorr, M.; Hittinger, J.; Kerbel, G.; Nevins, W. M.; Rognlien, T.; Umansky, M.; Xu, X.
2006-10-01
TEMPEST is an edge gyro-kinetic continuum code developed to study boundary plasma transport over the region extending from the H-mode pedestal across the separatrix to the divertor plates. For benchmark purposes, we present results from the 4D (2r,2v) TEMPEST for both steady-state transport and time-dependent Geodesic Acoustic Modes (GAMs). We focus on an annular region inside the separatrix of a circular cross-section tokamak where analytical and numerical results are available. The parallel flow velocity and radial particle flux are obtained for different collisional regimes and compared with previous neoclassical results. The effect of radial electric field and the transition to steep edge gradients is emphasized. The dynamical response of GAMs is also shown and compared to recent theory.
NASA Astrophysics Data System (ADS)
KIM, Jong Woon; LEE, Young-Ouk
2017-09-01
As computing power gets better and better, computer codes that use a deterministic method seem to be less useful than those using the Monte Carlo method. In addition, users do not like to think about space, angles, and energy discretization for deterministic codes. However, a deterministic method is still powerful in that we can obtain a solution of the flux throughout the problem, particularly as when particles can barely penetrate, such as in a deep penetration problem with small detection volumes. Recently, a new state-of-the-art discrete-ordinates code, ATTILA, was developed and has been widely used in several applications. ATTILA provides the capabilities to solve geometrically complex 3-D transport problems by using an unstructured tetrahedral mesh. Since 2009, we have been developing our own code by benchmarking ATTILA. AETIUS is a discrete ordinates code that uses an unstructured tetrahedral mesh such as ATTILA. For pre- and post- processing, Gmsh is used to generate an unstructured tetrahedral mesh by importing a CAD file (*.step) and visualizing the calculation results of AETIUS. Using a CAD tool, the geometry can be modeled very easily. In this paper, we describe a brief overview of AETIUS and provide numerical results from both AETIUS and a Monte Carlo code, MCNP5, in a deep penetration problem with small detection volumes. The results demonstrate the effectiveness and efficiency of AETIUS for such calculations.
Simulation Studies for Inspection of the Benchmark Test with PATRASH
NASA Astrophysics Data System (ADS)
Shimosaki, Y.; Igarashi, S.; Machida, S.; Shirakata, M.; Takayama, K.; Noda, F.; Shigaki, K.
2002-12-01
In order to delineate the halo-formation mechanisms in a typical FODO lattice, a 2-D simulation code PATRASH (PArticle TRAcking in a Synchrotron for Halo analysis) has been developed. The electric field originating from the space charge is calculated by the Hybrid Tree code method. Benchmark tests utilizing three simulation codes of ACCSIM, PATRASH and SIMPSONS were carried out. These results have been confirmed to be fairly in agreement with each other. The details of PATRASH simulation are discussed with some examples.
1991-05-31
benchmarks ............ .... . .. .. . . .. 220 Appendix G : Source code of the Aquarius Prolog compiler ........ . 224 Chapter I Introduction "You’re given...notation, a tool that is used throughout the compiler’s implementation. Appendix F lists the source code of the C and Prolog benchmarks. Appendix G lists the...source code of the compilcr. 5 "- standard form Prolog / a-sfomadon / head umrvln Convert to tmeikernel Prol g vrans~fonaon 1symbolic execution
Benchmark of the local drift-kinetic models for neoclassical transport simulation in helical plasmas
NASA Astrophysics Data System (ADS)
Huang, B.; Satake, S.; Kanno, R.; Sugama, H.; Matsuoka, S.
2017-02-01
The benchmarks of the neoclassical transport codes based on the several local drift-kinetic models are reported here. Here, the drift-kinetic models are zero orbit width (ZOW), zero magnetic drift, DKES-like, and global, as classified in Matsuoka et al. [Phys. Plasmas 22, 072511 (2015)]. The magnetic geometries of Helically Symmetric Experiment, Large Helical Device (LHD), and Wendelstein 7-X are employed in the benchmarks. It is found that the assumption of E ×B incompressibility causes discrepancy of neoclassical radial flux and parallel flow among the models when E ×B is sufficiently large compared to the magnetic drift velocities. For example, Mp≤0.4 where Mp is the poloidal Mach number. On the other hand, when E ×B and the magnetic drift velocities are comparable, the tangential magnetic drift, which is included in both the global and ZOW models, fills the role of suppressing unphysical peaking of neoclassical radial-fluxes found in the other local models at Er≃0 . In low collisionality plasmas, in particular, the tangential drift effect works well to suppress such unphysical behavior of the radial transport caused in the simulations. It is demonstrated that the ZOW model has the advantage of mitigating the unphysical behavior in the several magnetic geometries, and that it also implements the evaluation of bootstrap current in LHD with the low computation cost compared to the global model.
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware?s architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in the ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number ofmore » existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.« less
NASA Astrophysics Data System (ADS)
Jacques, Diederik
2017-04-01
As soil functions are governed by a multitude of interacting hydrological, geochemical and biological processes, simulation tools coupling mathematical models for interacting processes are needed. Coupled reactive transport models are a typical example of such coupled tools mainly focusing on hydrological and geochemical coupling (see e.g. Steefel et al., 2015). Mathematical and numerical complexity for both the tool itself or of the specific conceptual model can increase rapidly. Therefore, numerical verification of such type of models is a prerequisite for guaranteeing reliability and confidence and qualifying simulation tools and approaches for any further model application. In 2011, a first SeSBench -Subsurface Environmental Simulation Benchmarking- workshop was held in Berkeley (USA) followed by four other ones. The objective is to benchmark subsurface environmental simulation models and methods with a current focus on reactive transport processes. The final outcome was a special issue in Computational Geosciences (2015, issue 3 - Reactive transport benchmarks for subsurface environmental simulation) with a collection of 11 benchmarks. Benchmarks, proposed by the participants of the workshops, should be relevant for environmental or geo-engineering applications; the latter were mostly related to radioactive waste disposal issues - excluding benchmarks defined for pure mathematical reasons. Another important feature is the tiered approach within a benchmark with the definition of a single principle problem and different sub problems. The latter typically benchmarked individual or simplified processes (e.g. inert solute transport, simplified geochemical conceptual model) or geometries (e.g. batch or one-dimensional, homogeneous). Finally, three codes should be involved into a benchmark. The SeSBench initiative contributes to confidence building for applying reactive transport codes. Furthermore, it illustrates the use of those type of models for different environmental and geo-engineering applications. SeSBench will organize new workshops to add new benchmarks in a new special issue. Steefel, C. I., et al. (2015). "Reactive transport codes for subsurface environmental simulation." Computational Geosciences 19: 445-478.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burke, B. J.; Kruger, S. E.; Hegna, C. C.
A linear benchmark between the linear ideal MHD stability codes ELITE [H. R. Wilson et al., Phys. Plasmas 9, 1277 (2002)], GATO [L. Bernard et al., Comput. Phys. Commun. 24, 377 (1981)], and the extended nonlinear magnetohydrodynamic (MHD) code, NIMROD [C. R. Sovinec et al.., J. Comput. Phys. 195, 355 (2004)] is undertaken for edge-localized (MHD) instabilities. Two ballooning-unstable, shifted-circle tokamak equilibria are compared where the stability characteristics are varied by changing the equilibrium plasma profiles. The equilibria model an H-mode plasma with a pedestal pressure profile and parallel edge currents. For both equilibria, NIMROD accurately reproduces the transition tomore » instability (the marginally unstable mode), as well as the ideal growth spectrum for a large range of toroidal modes (n=1-20). The results use the compressible MHD model and depend on a precise representation of 'ideal-like' and 'vacuumlike' or 'halo' regions within the code. The halo region is modeled by the introduction of a Lundquist-value profile that transitions from a large to a small value at a flux surface location outside of the pedestal region. To model an ideal-like MHD response in the core and a vacuumlike response outside the transition, separate criteria on the plasma and halo Lundquist values are required. For the benchmarked equilibria the critical Lundquist values are 10{sup 8} and 10{sup 3} for the ideal-like and halo regions, respectively. Notably, this gives a ratio on the order of 10{sup 5}, which is much larger than experimentally measured values using T{sub e} values associated with the top of the pedestal and separatrix. Excellent agreement with ELITE and GATO calculations are made when sharp boundary transitions in the resistivity are used and a small amount of physical dissipation is added for conditions very near and below marginal ideal stability.« less
DRG benchmarking study establishes national coding norms.
Vaul, J H
1998-05-01
With the increase in fraud and abuse investigations, healthcare financial managers should examine their organization's medical record coding procedures. The Federal government and third-party payers are looking specifically for improper billing of outpatient services, unbundling of procedures to increase payment, assigning higher-paying DRG codes for inpatient claims, and other abuses. A recent benchmarking study of Medicare Provider Analysis and Review (MEDPAR) data has established national norms for hospital coding and case mix based on DRGs and has revealed the majority of atypical coding cases fall into six DRG pairs. Organizations with a greater percentage of atypical cases--those more likely to be scrutinized by Federal investigators--will want to conduct suitable review and be sure appropriate documentation exists to justify the coding.
Solution of the neutronics code dynamic benchmark by finite element method
NASA Astrophysics Data System (ADS)
Avvakumov, A. V.; Vabishchevich, P. N.; Vasilev, A. O.; Strizhov, V. F.
2016-10-01
The objective is to analyze the dynamic benchmark developed by Atomic Energy Research for the verification of best-estimate neutronics codes. The benchmark scenario includes asymmetrical ejection of a control rod in a water-type hexagonal reactor at hot zero power. A simple Doppler feedback mechanism assuming adiabatic fuel temperature heating is proposed. The finite element method on triangular calculation grids is used to solve the three-dimensional neutron kinetics problem. The software has been developed using the engineering and scientific calculation library FEniCS. The matrix spectral problem is solved using the scalable and flexible toolkit SLEPc. The solution accuracy of the dynamic benchmark is analyzed by condensing calculation grid and varying degree of finite elements.
CHORUS code for solar and planetary convection
NASA Astrophysics Data System (ADS)
Wang, Junfeng
Turbulent, density stratified convection is ubiquitous in stars and planets. Numerical simulation has become an indispensable tool for understanding it. A primary contribution of this dissertation work is the creation of the Compressible High-ORder Unstructured Spectral-difference (CHORUS) code for simulating the convection and related fluid dynamics in the interiors of stars and planets. In this work, the CHORUS code is verified by using two newly defined benchmark cases and demonstrates excellent parallel performance. It has unique potential to simulate challenging physical phenomena such as multi-scale solar convection, core convection, and convection in oblate, rapidly-rotating stars. In order to exploit its unique capabilities, the CHORUS code has been extended to perform the first 3D simulations of convection in oblate, rapidly rotating solar-type stars. New insights are obtained with respect to the influence of oblateness on the convective structure and heat flux transport. With the presence of oblateness resulting from the centrifugal force effect, the convective structure in the polar regions decouples from the main convective modes in the equatorial regions. Our convection simulations predict that heat flux peaks in both the polar and equatorial regions, contrary to previous theoretical results that predict darker equators. High latitudinal zonal jets are also observed in the simulations.
A Parallel Processing Algorithm for Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
Accelerating execution of the integrated TIGER series Monte Carlo radiation transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, L.M.; Hochstedler, R.D.
1997-02-01
Execution of the integrated TIGER series (ITS) of coupled electron/photon Monte Carlo radiation transport codes has been accelerated by modifying the FORTRAN source code for more efficient computation. Each member code of ITS was benchmarked and profiled with a specific test case that directed the acceleration effort toward the most computationally intensive subroutines. Techniques for accelerating these subroutines included replacing linear search algorithms with binary versions, replacing the pseudo-random number generator, reducing program memory allocation, and proofing the input files for geometrical redundancies. All techniques produced identical or statistically similar results to the original code. Final benchmark timing of themore » accelerated code resulted in speed-up factors of 2.00 for TIGER (the one-dimensional slab geometry code), 1.74 for CYLTRAN (the two-dimensional cylindrical geometry code), and 1.90 for ACCEPT (the arbitrary three-dimensional geometry code).« less
Spacecraft charging analysis with the implicit particle-in-cell code iPic3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deca, J.; Lapenta, G.; Marchand, R.
2013-10-15
We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter which creates the possibility to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation resultsmore » with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.« less
Nonlinear 3D MHD verification study: SpeCyl and PIXIE3D codes for RFP and Tokamak plasmas
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Cappello, S.; Chacon, L.
2010-11-01
A strong emphasis is presently placed in the fusion community on reaching predictive capability of computational models. An essential requirement of such endeavor is the process of assessing the mathematical correctness of computational tools, termed verification [1]. We present here a successful nonlinear cross-benchmark verification study between the 3D nonlinear MHD codes SpeCyl [2] and PIXIE3D [3]. Excellent quantitative agreement is obtained in both 2D and 3D nonlinear visco-resistive dynamics for reversed-field pinch (RFP) and tokamak configurations [4]. RFP dynamics, in particular, lends itself as an ideal non trivial test-bed for 3D nonlinear verification. Perspectives for future application of the fully-implicit parallel code PIXIE3D to RFP physics, in particular to address open issues on RFP helical self-organization, will be provided. [4pt] [1] M. Greenwald, Phys. Plasmas 17, 058101 (2010) [0pt] [2] S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996) [0pt] [3] L. Chac'on, Phys. Plasmas 15, 056103 (2008) [0pt] [4] D. Bonfiglio, L. Chac'on and S. Cappello, Phys. Plasmas 17 (2010)
Predicting Cost/Performance Trade-Offs for Whitney: A Commodity Computing Cluster
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Nitzberg, Bill; VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)
1997-01-01
Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. We develop closed form expressions detailing the number and size of messages sent by each benchmark. Coupling these with measured single processor performance, network latency, and network bandwidth, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)
1997-01-01
We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of ethernet hubs and switches. An MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS parallel benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
Benchmarking hypercube hardware and software
NASA Technical Reports Server (NTRS)
Grunwald, Dirk C.; Reed, Daniel A.
1986-01-01
It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Ali, F; Waker, A J; Waller, E J
2014-10-01
Tissue-equivalent proportional counters (TEPC) can potentially be used as a portable and personal dosemeter in mixed neutron and gamma-ray fields, but what hinders this use is their typically large physical size. To formulate compact TEPC designs, the use of a Monte Carlo transport code is necessary to predict the performance of compact designs in these fields. To perform this modelling, three candidate codes were assessed: MCNPX 2.7.E, FLUKA 2011.2 and PHITS 2.24. In each code, benchmark simulations were performed involving the irradiation of a 5-in. TEPC with monoenergetic neutron fields and a 4-in. wall-less TEPC with monoenergetic gamma-ray fields. The frequency and dose mean lineal energies and dose distributions calculated from each code were compared with experimentally determined data. For the neutron benchmark simulations, PHITS produces data closest to the experimental values and for the gamma-ray benchmark simulations, FLUKA yields data closest to the experimentally determined quantities. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard Strydom; Cristian Rabiti; Andrea Alfonsi
2012-10-01
PHISICS is a neutronics code system currently under development at the Idaho National Laboratory (INL). Its goal is to provide state of the art simulation capability to reactor designers. The different modules for PHISICS currently under development are a nodal and semi-structured transport core solver (INSTANT), a depletion module (MRTAU) and a cross section interpolation (MIXER) module. The INSTANT module is the most developed of the mentioned above. Basic functionalities are ready to use, but the code is still in continuous development to extend its capabilities. This paper reports on the effort of coupling the nodal kinetics code package PHISICSmore » (INSTANT/MRTAU/MIXER) to the thermal hydraulics system code RELAP5-3D, to enable full core and system modeling. This will enable the possibility to model coupled (thermal-hydraulics and neutronics) problems with more options for 3D neutron kinetics, compared to the existing diffusion theory neutron kinetics module in RELAP5-3D (NESTLE). In the second part of the paper, an overview of the OECD/NEA MHTGR-350 MW benchmark is given. This benchmark has been approved by the OECD, and is based on the General Atomics 350 MW Modular High Temperature Gas Reactor (MHTGR) design. The benchmark includes coupled neutronics thermal hydraulics exercises that require more capabilities than RELAP5-3D with NESTLE offers. Therefore, the MHTGR benchmark makes extensive use of the new PHISICS/RELAP5-3D coupling capabilities. The paper presents the preliminary results of the three steady state exercises specified in Phase I of the benchmark using PHISICS/RELAP5-3D.« less
NASA Astrophysics Data System (ADS)
Steefel, C. I.
2015-12-01
Over the last 20 years, we have seen the evolution of multicomponent reactive transport modeling and the expanding range and increasing complexity of subsurface environmental applications it is being used to address. Reactive transport modeling is being asked to provide accurate assessments of engineering performance and risk for important issues with far-reaching consequences. As a result, the complexity and detail of subsurface processes, properties, and conditions that can be simulated have significantly expanded. Closed form solutions are necessary and useful, but limited to situations that are far simpler than typical applications that combine many physical and chemical processes, in many cases in coupled form. In the absence of closed form and yet realistic solutions for complex applications, numerical benchmark problems with an accepted set of results will be indispensable to qualifying codes for various environmental applications. The intent of this benchmarking exercise, now underway for more than five years, is to develop and publish a set of well-described benchmark problems that can be used to demonstrate simulator conformance with norms established by the subsurface science and engineering community. The objective is not to verify this or that specific code--the reactive transport codes play a supporting role in this regard—but rather to use the codes to verify that a common solution of the problem can be achieved. Thus, the objective of each of the manuscripts is to present an environmentally-relevant benchmark problem that tests the conceptual model capabilities, numerical implementation, process coupling, and accuracy. The benchmark problems developed to date include 1) microbially-mediated reactions, 2) isotopes, 3) multi-component diffusion, 4) uranium fate and transport, 5) metal mobility in mining affected systems, and 6) waste repositories and related aspects.
High Performance Computing at NASA
NASA Technical Reports Server (NTRS)
Bailey, David H.; Cooper, D. M. (Technical Monitor)
1994-01-01
The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (National Aerospace Standards) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
A suite of exercises for verifying dynamic earthquake rupture codes
Harris, Ruth A.; Barall, Michael; Aagaard, Brad T.; Ma, Shuo; Roten, Daniel; Olsen, Kim B.; Duan, Benchun; Liu, Dunyu; Luo, Bin; Bai, Kangchen; Ampuero, Jean-Paul; Kaneko, Yoshihiro; Gabriel, Alice-Agnes; Duru, Kenneth; Ulrich, Thomas; Wollherr, Stephanie; Shi, Zheqiang; Dunham, Eric; Bydlon, Sam; Zhang, Zhenguo; Chen, Xiaofei; Somala, Surendra N.; Pelties, Christian; Tago, Josue; Cruz-Atienza, Victor Manuel; Kozdon, Jeremy; Daub, Eric; Aslam, Khurram; Kase, Yuko; Withers, Kyle; Dalguer, Luis
2018-01-01
We describe a set of benchmark exercises that are designed to test if computer codes that simulate dynamic earthquake rupture are working as intended. These types of computer codes are often used to understand how earthquakes operate, and they produce simulation results that include earthquake size, amounts of fault slip, and the patterns of ground shaking and crustal deformation. The benchmark exercises examine a range of features that scientists incorporate in their dynamic earthquake rupture simulations. These include implementations of simple or complex fault geometry, off‐fault rock response to an earthquake, stress conditions, and a variety of formulations for fault friction. Many of the benchmarks were designed to investigate scientific problems at the forefronts of earthquake physics and strong ground motions research. The exercises are freely available on our website for use by the scientific community.
Parallel Ada benchmarks for the SVMS
NASA Technical Reports Server (NTRS)
Collard, Philippe E.
1990-01-01
The use of parallel processing paradigm to design and develop faster and more reliable computers appear to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with the version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
GEN-IV Benchmarking of Triso Fuel Performance Models under accident conditions modeling input data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collin, Blaise Paul
This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: • The modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release. • The modeling of the AGR-1 and HFR-EU1bis safety testing experiments. •more » The comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, thereafter named NCC (Numerical Calculation Case), is derived from “Case 5” of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. “Case 5” of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to “effects of the numerical calculation method rather than the physical model” [IAEA 2012]. The NCC is therefore intended to check if these numerical effects subsist. The first two steps imply the involvement of the benchmark participants with a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary. 09/2016: Tables 6 and 8 updated. AGR-2 input data added« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collin, Blaise P.
2014-09-01
This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: the modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release; the modeling of the AGR-1 and HFR-EU1bis safety testing experiments; and, the comparisonmore » of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, thereafter named NCC (Numerical Calculation Case), is derived from ''Case 5'' of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. ''Case 5'' of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to ''effects of the numerical calculation method rather than the physical model''[IAEA 2012]. The NCC is therefore intended to check if these numerical effects subsist. The first two steps imply the involvement of the benchmark participants with a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary.« less
MARC calculations for the second WIPP structural benchmark problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morgan, H.S.
1981-05-01
This report describes calculations made with the MARC structural finite element code for the second WIPP structural benchmark problem. Specific aspects of problem implementation such as element choice, slip line modeling, creep law implementation, and thermal-mechanical coupling are discussed in detail. Also included are the computational results specified in the benchmark problem formulation.
Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing
NASA Technical Reports Server (NTRS)
Ragharan, Bharathi; Galant, David
1992-01-01
The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.
Benchmark Testing of a New 56Fe Evaluation for Criticality Safety Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leal, Luiz C; Ivanov, E.
2015-01-01
The SAMMY code was used to evaluate resonance parameters of the 56Fe cross section in the resolved resonance energy range of 0–2 MeV using transmission data, capture, elastic, inelastic, and double differential elastic cross sections. The resonance analysis was performed with the code SAMMY that fits R-matrix resonance parameters using the generalized least-squares technique (Bayes’ theory). The evaluation yielded a set of resonance parameters that reproduced the experimental data very well, along with a resonance parameter covariance matrix for data uncertainty calculations. Benchmark tests were conducted to assess the evaluation performance in benchmark calculations.
3D streamers simulation in a pin to plane configuration using massively parallel computing
NASA Astrophysics Data System (ADS)
Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.
2018-03-01
This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million of mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns-1, regardless of the number of streamers.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
2D imaging of helium ion velocity in the DIII-D divertor
NASA Astrophysics Data System (ADS)
Samuell, C. M.; Porter, G. D.; Meyer, W. H.; Rognlien, T. D.; Allen, S. L.; Briesemeister, A.; Mclean, A. G.; Zeng, L.; Jaervinen, A. E.; Howard, J.
2018-05-01
Two-dimensional imaging of parallel ion velocities is compared to fluid modeling simulations to understand the role of ions in determining divertor conditions and benchmark the UEDGE fluid modeling code. Pure helium discharges are used so that spectroscopic He+ measurements represent the main-ion population at small electron temperatures. Electron temperatures and densities in the divertor match simulated values to within about 20%-30%, establishing the experiment/model match as being at least as good as those normally obtained in the more regularly simulated deuterium plasmas. He+ brightness (HeII) comparison indicates that the degree of detachment is captured well by UEDGE, principally due to the inclusion of E ×B drifts. Tomographically inverted Coherence Imaging Spectroscopy measurements are used to determine the He+ parallel velocities which display excellent agreement between the model and the experiment near the divertor target where He+ is predicted to be the main-ion species and where electron-dominated physics dictates the parallel momentum balance. Upstream near the X-point where He+ is a minority species and ion-dominated physics plays a more important role, there is an underestimation of the flow velocity magnitude by a factor of 2-3. These results indicate that more effort is required to be able to correctly predict ion momentum in these challenging regimes.
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
TRIPOLI-4® - MCNP5 ITER A-lite neutronic model benchmarking
NASA Astrophysics Data System (ADS)
Jaboulay, J.-C.; Cayla, P.-Y.; Fausser, C.; Lee, Y.-K.; Trama, J.-C.; Li-Puma, A.
2014-06-01
The aim of this paper is to present the capability of TRIPOLI-4®, the CEA Monte Carlo code, to model a large-scale fusion reactor with complex neutron source and geometry. In the past, numerous benchmarks were conducted for TRIPOLI-4® assessment on fusion applications. Experiments (KANT, OKTAVIAN, FNG) analysis and numerical benchmarks (between TRIPOLI-4® and MCNP5) on the HCLL DEMO2007 and ITER models were carried out successively. In this previous ITER benchmark, nevertheless, only the neutron wall loading was analyzed, its main purpose was to present MCAM (the FDS Team CAD import tool) extension for TRIPOLI-4®. Starting from this work a more extended benchmark has been performed about the estimation of neutron flux, nuclear heating in the shielding blankets and tritium production rate in the European TBMs (HCLL and HCPB) and it is presented in this paper. The methodology to build the TRIPOLI-4® A-lite model is based on MCAM and the MCNP A-lite model (version 4.1). Simplified TBMs (from KIT) have been integrated in the equatorial-port. Comparisons of neutron wall loading, flux, nuclear heating and tritium production rate show a good agreement between the two codes. Discrepancies are mainly included in the Monte Carlo codes statistical error.
ELEFANT: a user-friendly multipurpose geodynamics code
NASA Astrophysics Data System (ADS)
Thieulot, C.
2014-07-01
A new finite element code for the solution of the Stokes and heat transport equations is presented. It has purposely been designed to address geological flow problems in two and three dimensions at crustal and lithospheric scales. The code relies on the Marker-in-Cell technique and Lagrangian markers are used to track materials in the simulation domain which allows recording of the integrated history of deformation; their (number) density is variable and dynamically adapted. A variety of rheologies has been implemented including nonlinear thermally activated dislocation and diffusion creep and brittle (or plastic) frictional models. The code is built on the Arbitrary Lagrangian Eulerian kinematic description: the computational grid deforms vertically and allows for a true free surface while the computational domain remains of constant width in the horizontal direction. The solution to the large system of algebraic equations resulting from the finite element discretisation and linearisation of the set of coupled partial differential equations to be solved is obtained by means of the efficient parallel direct solver MUMPS whose performance is thoroughly tested, or by means of the WISMP and AGMG iterative solvers. The code accuracy is assessed by means of many geodynamically relevant benchmark experiments which highlight specific features or algorithms, e.g., the implementation of the free surface stabilisation algorithm, the (visco-)plastic rheology implementation, the temperature advection, the capacity of the code to handle large viscosity contrasts. A two-dimensional application to salt tectonics presented as case study illustrates the potential of the code to model large scale high resolution thermo-mechanically coupled free surface flows.
Development and verification of NRC`s single-rod fuel performance codes FRAPCON-3 AND FRAPTRAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beyer, C.E.; Cunningham, M.E.; Lanning, D.D.
1998-03-01
The FRAPCON and FRAP-T code series, developed in the 1970s and early 1980s, are used by the US Nuclear Regulatory Commission (NRC) to predict fuel performance during steady-state and transient power conditions, respectively. Both code series are now being updated by Pacific Northwest National Laboratory to improve their predictive capabilities at high burnup levels. The newest versions of the codes are called FRAPCON-3 and FRAPTRAN. The updates to fuel property and behavior models are focusing on providing best estimate predictions under steady-state and fast transient power conditions up to extended fuel burnups (> 55 GWd/MTU). Both codes will be assessedmore » against a data base independent of the data base used for code benchmarking and an estimate of code predictive uncertainties will be made based on comparisons to the benchmark and independent data bases.« less
Computer aided design of extrusion forming tools for complex geometry profiles
NASA Astrophysics Data System (ADS)
Goncalves, Nelson Daniel Ferreira
In the profile extrusion, the experience of the die designer is crucial for obtaining good results. In industry, it is quite usual the need of several experimental trials for a specific extrusion die before a balanced flow distribution is obtained. This experimental based trial-and-error procedure is time and money consuming, but, it works, and most of the profile extrusion companies rely on such method. However, the competition is forcing the industry to look for more effective procedures and the design of profile extrusion dies is not an exception. For this purpose, computer aided design seems to be a good route. Nowadays, the available computational rheology numerical codes allow the simulation of complex fluid flows. This permits the die designer to evaluate and to optimize the flow channel, without the need to have a physical die and to perform real extrusion trials. In this work, a finite volume based numerical code was developed, for the simulation of non-Newtonian (inelastic) fluid and non-isothermal flows using unstructured meshes. The developed code is able to model the forming and cooling stages of profile extrusion, and can be used to aid the design of forming tools used in the production of complex profiles. For the code verification three benchmark problems were tested: flow between parallel plates, flow around a cylinder, and the lid driven cavity flow. The code was employed to design two extrusion dies to produce complex cross section profiles: a medical catheter die and a wood plastic composite profile for decking applications. The last was experimentally validated. Simple extrusion dies used to produced L and T shaped profiles were studied in detail, allowing a better understanding of the effect of the main geometry parameters on the flow distribution. To model the cooling stage a new implicit formulation was devised, which allowed the achievement of better convergence rates and thus the reduction of the computation times. Having in mind the solution of large dimension problems, the code was parallelized using graphics processing units (GPUs). Speedups of ten times could be obtained, drastically decreasing the time required to obtain results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Rui; Sumner, Tyler S.
2016-04-17
An advanced system analysis tool SAM is being developed for fast-running, improved-fidelity, and whole-plant transient analyses at Argonne National Laboratory under DOE-NE’s Nuclear Energy Advanced Modeling and Simulation (NEAMS) program. As an important part of code development, companion validation activities are being conducted to ensure the performance and validity of the SAM code. This paper presents the benchmark simulations of two EBR-II tests, SHRT-45R and BOP-302R, whose data are available through the support of DOE-NE’s Advanced Reactor Technology (ART) program. The code predictions of major primary coolant system parameter are compared with the test results. Additionally, the SAS4A/SASSYS-1 code simulationmore » results are also included for a code-to-code comparison.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly tar- get compute intensive kernels typical of digital signal pro- cessing applications. We are developing techniques and ar- chitectural templates to enable HLS of data analytics appli- cations. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dy- namic task parallelism. We discuss an architectural tem- plate based around a distributed controller to efficiently ex- ploit thread level parallelism. We present a memory in- terface that supports parallel memory subsystems and en- ables implementing atomic memory operations. We intro- duce a dynamic task scheduling approach to efficiently ex- ecute heavilymore » unbalanced workload. The templates are val- idated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well know SPARQL benchmark.« less
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Applications Performance Under MPL and MPI on NAS IBM SP2
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes, was installed at the Numerical Aerodynamic Simulation (NAS) Facility Each node of NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz which can perform four floating point operations per clock with a peak performance of 266 Mflop/s. By the end of 1994, 64 nodes of IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. The basic understanding of architectural details of RS 6000/590 will help application scientists the porting, optimizing, and tuning of codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of two floating point units, and data cache optimization of RS 6000/590 is illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using native Message Passing Library (MPL) and the Message Passing Interface (NMI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of Fast Fourier Transform (FFT) kernel from NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of Fortran BLAS 2 and BLAS 3 routines, e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
COMPUTERIZED TRAINING OF CRYOSURGERY – A SYSTEM APPROACH
Keelan, Robert; Yamakawa, Soji; Shimada, Kenji; Rabin, Yoed
2014-01-01
The objective of the current study is to provide the foundation for a computerized training platform for cryosurgery. Consistent with clinical practice, the training process targets the correlation of the frozen region contour with the target region shape, using medical imaging and accepted criteria for clinical success. The current study focuses on system design considerations, including a bioheat transfer model, simulation techniques, optimal cryoprobe layout strategy, and a simulation core framework. Two fundamentally different approaches were considered for the development of a cryosurgery simulator, based on a finite-elements (FE) commercial code (ANSYS) and a proprietary finite-difference (FD) code. Results of this study demonstrate that the FE simulator is superior in terms of geometric modeling, while the FD simulator is superior in terms of runtime. Benchmarking results further indicate that the FD simulator is superior in terms of usage of memory resources, pre-processing, parallel processing, and post-processing. It is envisioned that future integration of a human-interface module and clinical data into the proposed computer framework will make computerized training of cryosurgery a practical reality. PMID:23995400
Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy
2017-10-06
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to Github, (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h/mm) using 1 core, and 1:04 (h/mm) hours using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
GPU-BASED MONTE CARLO DUST RADIATIVE TRANSFER SCHEME APPLIED TO ACTIVE GALACTIC NUCLEI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heymann, Frank; Siebenmorgen, Ralf, E-mail: fheymann@pa.uky.edu
2012-05-20
A three-dimensional parallel Monte Carlo (MC) dust radiative transfer code is presented. To overcome the huge computing-time requirements of MC treatments, the computational power of vectorized hardware is used, utilizing either multi-core computer power or graphics processing units. The approach is a self-consistent way to solve the radiative transfer equation in arbitrary dust configurations. The code calculates the equilibrium temperatures of two populations of large grains and stochastic heated polycyclic aromatic hydrocarbons. Anisotropic scattering is treated applying the Heney-Greenstein phase function. The spectral energy distribution (SED) of the object is derived at low spatial resolution by a photon counting proceduremore » and at high spatial resolution by a vectorized ray tracer. The latter allows computation of high signal-to-noise images of the objects at any frequencies and arbitrary viewing angles. We test the robustness of our approach against other radiative transfer codes. The SED and dust temperatures of one- and two-dimensional benchmarks are reproduced at high precision. The parallelization capability of various MC algorithms is analyzed and included in our treatment. We utilize the Lucy algorithm for the optical thin case where the Poisson noise is high, the iteration-free Bjorkman and Wood method to reduce the calculation time, and the Fleck and Canfield diffusion approximation for extreme optical thick cells. The code is applied to model the appearance of active galactic nuclei (AGNs) at optical and infrared wavelengths. The AGN torus is clumpy and includes fluffy composite grains of various sizes made up of silicates and carbon. The dependence of the SED on the number of clumps in the torus and the viewing angle is studied. The appearance of the 10 {mu}m silicate features in absorption or emission is discussed. The SED of the radio-loud quasar 3C 249.1 is fit by the AGN model and a cirrus component to account for the far-infrared emission.« less
Benchmark problems for numerical implementations of phase field models
Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...
2016-10-01
Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verifymore » new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.« less
Benchmarking Heavy Ion Transport Codes FLUKA, HETC-HEDS MARS15, MCNPX, and PHITS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ronningen, Reginald Martin; Remec, Igor; Heilbronn, Lawrence H.
Powerful accelerators such as spallation neutron sources, muon-collider/neutrino facilities, and rare isotope beam facilities must be designed with the consideration that they handle the beam power reliably and safely, and they must be optimized to yield maximum performance relative to their design requirements. The simulation codes used for design purposes must produce reliable results. If not, component and facility designs can become costly, have limited lifetime and usefulness, and could even be unsafe. The objective of this proposal is to assess the performance of the currently available codes PHITS, FLUKA, MARS15, MCNPX, and HETC-HEDS that could be used for designmore » simulations involving heavy ion transport. We plan to access their performance by performing simulations and comparing results against experimental data of benchmark quality. Quantitative knowledge of the biases and the uncertainties of the simulations is essential as this potentially impacts the safe, reliable and cost effective design of any future radioactive ion beam facility. Further benchmarking of heavy-ion transport codes was one of the actions recommended in the Report of the 2003 RIA R&D Workshop".« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shumaker, Dana E.; Steefel, Carl I.
The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL, (UCRL-CODE-200063). Crunch is a general purpose reactive transport code developed by Carl Steefel and Yabusake (Steefel Yabsaki 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.
Translating an AI application from Lisp to Ada: A case study
NASA Technical Reports Server (NTRS)
Davis, Gloria J.
1991-01-01
A set of benchmarks was developed to test the performance of a newly designed computer executing both Lisp and Ada. Among these was AutoClassII -- a large Artificial Intelligence (AI) application written in Common Lisp. The extraction of a representative subset of this complex application was aided by a Lisp Code Analyzer (LCA). The LCA enabled rapid analysis of the code, putting it in a concise and functionally readable form. An equivalent benchmark was created in Ada through manual translation of the Lisp version. A comparison of the execution results of both programs across a variety of compiler-machine combinations indicate that line-by-line translation coupled with analysis of the initial code can produce relatively efficient and reusable target code.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mkhabela, P.; Han, J.; Tyobeka, B.
2006-07-01
The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has accepted, through the Nuclear Science Committee (NSC), the inclusion of the Pebble-Bed Modular Reactor 400 MW design (PBMR-400) coupled neutronics/thermal hydraulics transient benchmark problem as part of their official activities. The scope of the benchmark is to establish a well-defined problem, based on a common given library of cross sections, to compare methods and tools in core simulation and thermal hydraulics analysis with a specific focus on transient events through a set of multi-dimensional computational test problems. The benchmark includes three steady state exercises andmore » six transient exercises. This paper describes the first two steady state exercises, their objectives and the international participation in terms of organization, country and computer code utilized. This description is followed by a comparison and analysis of the participants' results submitted for these two exercises. The comparison of results from different codes allows for an assessment of the sensitivity of a result to the method employed and can thus help to focus the development efforts on the most critical areas. The two first exercises also allow for removing of user-related modeling errors and prepare core neutronics and thermal-hydraulics models of the different codes for the rest of the exercises in the benchmark. (authors)« less
Rethinking key–value store for parallel I/O optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He
2015-01-26
Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application.more » We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.« less
Unstructured Adaptive Meshes: Bad for Your Memory?
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob
2003-01-01
This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash
1997-01-01
Compilers supporting High Performance Form (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI) combinations will be compared, based on latest NAS Parallel Benchmark results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we would also present NPB, (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu CAPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We would also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...
2010-01-01
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the threemore » programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.« less
Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers
NASA Technical Reports Server (NTRS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak
1996-01-01
This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamics and aeroacoustics properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state of-the-art audio and video rendering of the propagating acoustic signals.
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers
NASA Astrophysics Data System (ADS)
Ogino, T.
High Performance Fortran (HPF) is one of modern and common techniques to achieve high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5 VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
NASA Astrophysics Data System (ADS)
Pei, Youbin; Xiang, Nong; Shen, Wei; Hu, Youjun; Todo, Y.; Zhou, Deng; Huang, Juan
2018-05-01
Kinetic-MagnetoHydroDynamic (MHD) hybrid simulations are carried out to study fast ion driven toroidal Alfvén eigenmodes (TAEs) on the Experimental Advanced Superconducting Tokamak (EAST). The first part of this article presents the linear benchmark between two kinetic-MHD codes, namely MEGA and M3D-K, based on a realistic EAST equilibrium. Parameter scans show that the frequency and the growth rate of the TAE given by the two codes agree with each other. The second part of this article discusses the resonance interaction between the TAE and fast ions simulated by the MEGA code. The results show that the TAE exchanges energy with the co-current passing particles with the parallel velocity |v∥ | ≈VA 0/3 or |v∥ | ≈VA 0/5 , where VA 0 is the Alfvén speed on the magnetic axis. The TAE destabilized by the counter-current passing ions is also analyzed and found to have a much smaller growth rate than the co-current ions driven TAE. One of the reasons for this is found to be that the overlapping region of the TAE spatial location and the counter-current ion orbits is narrow, and thus the wave-particle energy exchange is not efficient.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faillace, E.R.; Cheng, J.J.; Yu, C.
A series of benchmarking runs were conducted so that results obtained with the RESRAD code could be compared against those obtained with six pathway analysis models used to determine the radiation dose to an individual living on a radiologically contaminated site. The RESRAD computer code was benchmarked against five other computer codes - GENII-S, GENII, DECOM, PRESTO-EPA-CPG, and PATHRAE-EPA - and the uncodified methodology presented in the NUREG/CR-5512 report. Estimated doses for the external gamma pathway; the dust inhalation pathway; and the soil, food, and water ingestion pathways were calculated for each methodology by matching, to the extent possible, inputmore » parameters such as occupancy, shielding, and consumption factors.« less
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
HELIOS: A new open-source radiative transfer code
NASA Astrophysics Data System (ADS)
Malik, Matej; Grosheintz, Luc; Lukas Grimm, Simon; Mendonça, João; Kitzmann, Daniel; Heng, Kevin
2015-12-01
I present the new open-source code HELIOS, developed to accurately describe radiative transfer in a wide variety of irradiated atmospheres. We employ a one-dimensional multi-wavelength two-stream approach with scattering. Written in Cuda C++, HELIOS uses the GPU’s potential of massive parallelization and is able to compute the TP-profile of an atmosphere in radiative equilibrium and the subsequent emission spectrum in a few minutes on a single computer (for 60 layers and 1000 wavelength bins).The required molecular opacities are obtained with the recently published code HELIOS-K [1], which calculates the line shapes from an input line list and resamples the numerous line-by-line data into a manageable k-distribution format. Based on simple equilibrium chemistry theory [2] we combine the k-distribution functions of the molecules H2O, CO2, CO & CH4 to generate a k-table, which we then employ in HELIOS.I present our results of the following: (i) Various numerical tests, e.g. isothermal vs. non-isothermal treatment of layers. (ii) Comparison of iteratively determined TP-profiles with their analytical parametric prescriptions [3] and of the corresponding spectra. (iii) Benchmarks of TP-profiles & spectra for various elemental abundances. (iv) Benchmarks of averaged TP-profiles & spectra for the exoplanets GJ1214b, HD189733b & HD209458b. (v) Comparison with secondary eclipse data for HD189733b, XO-1b & Corot-2b.HELIOS is being developed, together with the dynamical core THOR and the chemistry solver VULCAN, in the group of Kevin Heng at the University of Bern as part of the Exoclimes Simulation Platform (ESP) [4], which is an open-source project aimed to provide community tools to model exoplanetary atmospheres.-----------------------------[1] Grimm & Heng 2015, ArXiv, 1503.03806[2] Heng, Lyons & Tsai, Arxiv, 1506.05501Heng & Lyons, ArXiv, 1507.01944[3] e.g. Heng, Mendonca & Lee, 2014, ApJS, 215, 4H[4] exoclime.net
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suzuoka, Daiki; Takahashi, Hideaki, E-mail: hideaki@m.tohoku.ac.jp; Morita, Akihiro
2014-04-07
We developed a perturbation approach to compute solvation free energy Δμ within the framework of QM (quantum mechanical)/MM (molecular mechanical) method combined with a theory of energy representation (QM/MM-ER). The energy shift η of the whole system due to the electronic polarization of the solute is evaluated using the second-order perturbation theory (PT2), where the electric field formed by surrounding solvent molecules is treated as the perturbation to the electronic Hamiltonian of the isolated solute. The point of our approach is that the energy shift η, thus obtained, is to be adopted for a novel energy coordinate of the distributionmore » functions which serve as fundamental variables in the free energy functional developed in our previous work. The most time-consuming part in the QM/MM-ER simulation can be, thus, avoided without serious loss of accuracy. For our benchmark set of molecules, it is demonstrated that the PT2 approach coupled with QM/MM-ER gives hydration free energies in excellent agreements with those given by the conventional method utilizing the Kohn-Sham SCF procedure except for a few molecules in the benchmark set. A variant of the approach is also proposed to deal with such difficulties associated with the problematic systems. The present approach is also advantageous to parallel implementations. We examined the parallel efficiency of our PT2 code on multi-core processors and found that the speedup increases almost linearly with respect to the number of cores. Thus, it was demonstrated that QM/MM-ER coupled with PT2 deserves practical applications to systems of interest.« less
Issues and opportunities: beam simulations for heavy ion fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedman, A
1999-07-15
UCRL- JC- 134975 PREPRINT code offering 3- D, axisymmetric, and ''transverse slice'' (steady flow) geometries, with a hierarchy of models for the ''lattice'' of focusing, bending, and accelerating elements. Interactive and script- driven code steering is afforded through an interpreter interface. The code runs with good parallel scaling on the T3E. Detailed simulations of machine segments and of complete small experiments, as well as simplified full- system runs, have been carried out, partially benchmarking the code. A magnetoinductive model, with module impedance and multi- beam effects, is under study. experiments, including an injector scalable to multi- beam arrays, a high-more » current beam transport and acceleration experiment, and a scaled final- focusing experiment. These ''phase I'' projects are laying the groundwork for the next major step in HIF development, the Integrated Research Experiment (IRE). Simulations aimed directly at the IRE must enable us to: design a facility with maximum power on target at minimal cost; set requirements for hardware tolerances, beam steering, etc.; and evaluate proposed chamber propagation modes. Finally, simulations must enable us to study all issues which arise in the context of a fusion driver, and must facilitate the assessment of driver options. In all of this, maximum advantage must be taken of emerging terascale computer architectures, requiring an aggressive code development effort. An organizing principle should be pursuit of the goal of integrated and detailed source- to- target simulation. methods for analysis of the beam dynamics in the various machine concepts, using moment- based methods for purposes of design, waveform synthesis, steering algorithm synthesis, etc. Three classes of discrete- particle models should be coupled: (1) electrostatic/ magnetoinductive PIC simulations should track the beams from the source through the final- focusing optics, passing details of the time- dependent distribution function to (2) electromagnetic or magnetoinductive PIC or hybrid PIG/ fluid simulations in the fusion chamber (which would finally pass their particle trajectory information to the radiation- hydrodynamics codes used for target design); in parallel, (3) detailed PIC, delta- f, core/ test- particle, and perhaps continuum Vlasov codes should be used to study individual sections of the driver and chamber very carefully; consistency may be assured by linking data from the PIC sequence, and knowledge gained may feed back into that sequence.« less
Utilizing GPUs to Accelerate Turbomachinery CFD Codes
NASA Technical Reports Server (NTRS)
MacCalla, Weylin; Kulkarni, Sameer
2016-01-01
GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niita, K.; Matsuda, N.; Iwamoto, Y.
The paper presents a brief description of the models incorporated in PHITS and the present status of the code, showing some benchmarking tests of the PHITS code for accelerator facilities and space radiation.
National Combustion Code Parallel Performance Enhancements
NASA Technical Reports Server (NTRS)
Quealy, Angela; Benyo, Theresa (Technical Monitor)
2002-01-01
The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.
Experimental benchmarking of a Monte Carlo dose simulation code for pediatric CT
NASA Astrophysics Data System (ADS)
Li, Xiang; Samei, Ehsan; Yoshizumi, Terry; Colsher, James G.; Jones, Robert P.; Frush, Donald P.
2007-03-01
In recent years, there has been a desire to reduce CT radiation dose to children because of their susceptibility and prolonged risk for cancer induction. Concerns arise, however, as to the impact of dose reduction on image quality and thus potentially on diagnostic accuracy. To study the dose and image quality relationship, we are developing a simulation code to calculate organ dose in pediatric CT patients. To benchmark this code, a cylindrical phantom was built to represent a pediatric torso, which allows measurements of dose distributions from its center to its periphery. Dose distributions for axial CT scans were measured on a 64-slice multidetector CT (MDCT) scanner (GE Healthcare, Chalfont St. Giles, UK). The same measurements were simulated using a Monte Carlo code (PENELOPE, Universitat de Barcelona) with the applicable CT geometry including bowtie filter. The deviations between simulated and measured dose values were generally within 5%. To our knowledge, this work is one of the first attempts to compare measured radial dose distributions on a cylindrical phantom with Monte Carlo simulated results. It provides a simple and effective method for benchmarking organ dose simulation codes and demonstrates the potential of Monte Carlo simulation for investigating the relationship between dose and image quality for pediatric CT patients.
Development of Benchmark Examples for Static Delamination Propagation and Fatigue Growth Predictions
NASA Technical Reports Server (NTRS)
Kruger, Ronald
2011-01-01
The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during stable delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall, the results are encouraging but further assessment for mixed-mode delamination is required.
NASA Technical Reports Server (NTRS)
Krueger, Ronald
2011-01-01
The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall the results are encouraging, but further assessment for mixed-mode delamination is required.
Benchmarking Defmod, an open source FEM code for modeling episodic fault rupture
NASA Astrophysics Data System (ADS)
Meng, Chunfang
2017-03-01
We present Defmod, an open source (linear) finite element code that enables us to efficiently model the crustal deformation due to (quasi-)static and dynamic loadings, poroelastic flow, viscoelastic flow and frictional fault slip. Ali (2015) provides the original code introducing an implicit solver for (quasi-)static problem, and an explicit solver for dynamic problem. The fault constraint is implemented via Lagrange Multiplier. Meng (2015) combines these two solvers into a hybrid solver that uses failure criteria and friction laws to adaptively switch between the (quasi-)static state and dynamic state. The code is capable of modeling episodic fault rupture driven by quasi-static loadings, e.g. due to reservoir fluid withdraw or injection. Here, we focus on benchmarking the Defmod results against some establish results.
Benchmarking of Improved DPAC Transient Deflagration Analysis Code
Laurinat, James E.; Hensel, Steve J.
2017-09-27
The deflagration pressure analysis code (DPAC) has been upgraded for use in modeling hydrogen deflagration transients. The upgraded code is benchmarked using data from vented hydrogen deflagration tests conducted at the HYDRO-SC Test Facility at the University of Pisa. DPAC originally was written to calculate peak pressures for deflagrations in radioactive waste storage tanks and process facilities at the Savannah River Site. Upgrades include the addition of a laminar flame speed correlation for hydrogen deflagrations and a mechanistic model for turbulent flame propagation, incorporation of inertial effects during venting, and inclusion of the effect of water vapor condensation on vesselmore » walls. In addition, DPAC has been coupled with chemical equilibrium with applications (CEA), a NASA combustion chemistry code. The deflagration tests are modeled as end-to-end deflagrations. As a result, the improved DPAC code successfully predicts both the peak pressures during the deflagration tests and the times at which the pressure peaks.« less
Benchmarking of Improved DPAC Transient Deflagration Analysis Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laurinat, James E.; Hensel, Steve J.
The deflagration pressure analysis code (DPAC) has been upgraded for use in modeling hydrogen deflagration transients. The upgraded code is benchmarked using data from vented hydrogen deflagration tests conducted at the HYDRO-SC Test Facility at the University of Pisa. DPAC originally was written to calculate peak pressures for deflagrations in radioactive waste storage tanks and process facilities at the Savannah River Site. Upgrades include the addition of a laminar flame speed correlation for hydrogen deflagrations and a mechanistic model for turbulent flame propagation, incorporation of inertial effects during venting, and inclusion of the effect of water vapor condensation on vesselmore » walls. In addition, DPAC has been coupled with chemical equilibrium with applications (CEA), a NASA combustion chemistry code. The deflagration tests are modeled as end-to-end deflagrations. As a result, the improved DPAC code successfully predicts both the peak pressures during the deflagration tests and the times at which the pressure peaks.« less
Parallelization of ARC3D with Computer-Aided Tools
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SCSI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, Cesspools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably well performance, for example, having a speedup of 30 for 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts for the automated parallelization process although user intervention in many of these parts are still necessary. Nevertheless, development and improvement of useful software tools, such as Cesspools, can help trim down many tedious parallelization details and improve the processing efficiency.
Support for Debugging Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele
2001-01-01
This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit
NASA Technical Reports Server (NTRS)
MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles
1999-01-01
In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.
IPRT polarized radiative transfer model intercomparison project - Phase A
NASA Astrophysics Data System (ADS)
Emde, Claudia; Barlakas, Vasileios; Cornet, Céline; Evans, Frank; Korkin, Sergey; Ota, Yoshifumi; Labonnote, Laurent C.; Lyapustin, Alexei; Macke, Andreas; Mayer, Bernhard; Wendisch, Manfred
2015-10-01
The polarization state of electromagnetic radiation scattered by atmospheric particles such as aerosols, cloud droplets, or ice crystals contains much more information about the optical and microphysical properties than the total intensity alone. For this reason an increasing number of polarimetric observations are performed from space, from the ground and from aircraft. Polarized radiative transfer models are required to interpret and analyse these measurements and to develop retrieval algorithms exploiting polarimetric observations. In the last years a large number of new codes have been developed, mostly for specific applications. Benchmark results are available for specific cases, but not for more sophisticated scenarios including polarized surface reflection and multi-layer atmospheres. The International Polarized Radiative Transfer (IPRT) working group of the International Radiation Commission (IRC) has initiated a model intercomparison project in order to fill this gap. This paper presents the results of the first phase A of the IPRT project which includes ten test cases, from simple setups with only one layer and Rayleigh scattering to rather sophisticated setups with a cloud embedded in a standard atmosphere above an ocean surface. All scenarios in the first phase A of the intercomparison project are for a one-dimensional plane-parallel model geometry. The commonly established benchmark results are available at the IPRT website.
Acoustic 3D modeling by the method of integral equations
NASA Astrophysics Data System (ADS)
Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.
2018-02-01
This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that, the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that, the method could accurately calculate the seismic field for the models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.
A Monte-Carlo Benchmark of TRIPOLI-4® and MCNP on ITER neutronics
NASA Astrophysics Data System (ADS)
Blanchet, David; Pénéliau, Yannick; Eschbach, Romain; Fontaine, Bruno; Cantone, Bruno; Ferlet, Marc; Gauthier, Eric; Guillon, Christophe; Letellier, Laurent; Proust, Maxime; Mota, Fernando; Palermo, Iole; Rios, Luis; Guern, Frédéric Le; Kocan, Martin; Reichle, Roger
2017-09-01
Radiation protection and shielding studies are often based on the extensive use of 3D Monte-Carlo neutron and photon transport simulations. ITER organization hence recommends the use of MCNP-5 code (version 1.60), in association with the FENDL-2.1 neutron cross section data library, specifically dedicated to fusion applications. The MCNP reference model of the ITER tokamak, the `C-lite', is being continuously developed and improved. This article proposes to develop an alternative model, equivalent to the 'C-lite', but for the Monte-Carlo code TRIPOLI-4®. A benchmark study is defined to test this new model. Since one of the most critical areas for ITER neutronics analysis concerns the assessment of radiation levels and Shutdown Dose Rates (SDDR) behind the Equatorial Port Plugs (EPP), the benchmark is conducted to compare the neutron flux through the EPP. This problem is quite challenging with regard to the complex geometry and considering the important neutron flux attenuation ranging from 1014 down to 108 n•cm-2•s-1. Such code-to-code comparison provides independent validation of the Monte-Carlo simulations, improving the confidence in neutronic results.
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
The Army Pollution Prevention Program: Improving Performance Through Benchmarking.
1995-06-01
Washington, DC 20503. 1. AGENCY USE ONLY (Leave Blank) 2. REPORT DATE June 1995 3. REPORT TYPE AND DATES COVERED Final 4. TITLE AND SUBTITLE...unlimited 12b. DISTRIBUTION CODE 13. ABSTRACT (Maximum 200 words) This report investigates the feasibility of using benchmarking as a method for...could use to determine to what degree it should integrate benchmarking with other quality management tools to support the pollution prevention program
NASA Technical Reports Server (NTRS)
Krueger, Ronald
2012-01-01
The development of benchmark examples for quasi-static delamination propagation prediction is presented and demonstrated for a commercial code. The examples are based on finite element models of the Mixed-Mode Bending (MMB) specimen. The examples are independent of the analysis software used and allow the assessment of the automated delamination propagation prediction capability in commercial finite element codes based on the virtual crack closure technique (VCCT). First, quasi-static benchmark examples were created for the specimen. Second, starting from an initially straight front, the delamination was allowed to propagate under quasi-static loading. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Good agreement between the results obtained from the automated propagation analysis and the benchmark results could be achieved by selecting input parameters that had previously been determined during analyses of mode I Double Cantilever Beam and mode II End Notched Flexure specimens. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Overall the results are encouraging, but further assessment for mixed-mode delamination fatigue onset and growth is required.
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1996-01-01
This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor tor parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability tor the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1196 Computational Aeroscience meeting.
Development of a new EMP code at LANL
NASA Astrophysics Data System (ADS)
Colman, J. J.; Roussel-Dupré, R. A.; Symbalisty, E. M.; Triplett, L. A.; Travis, B. J.
2006-05-01
A new code for modeling the generation of an electromagnetic pulse (EMP) by a nuclear explosion in the atmosphere is being developed. The source of the EMP is the Compton current produced by the prompt radiation (γ-rays, X-rays, and neutrons) of the detonation. As a first step in building a multi- dimensional EMP code we have written three kinetic codes, Plume, Swarm, and Rad. Plume models the transport of energetic electrons in air. The Plume code solves the relativistic Fokker-Planck equation over a specified energy range that can include ~ 3 keV to 50 MeV and computes the resulting electron distribution function at each cell in a two dimensional spatial grid. The energetic electrons are allowed to transport, scatter, and experience Coulombic drag. Swarm models the transport of lower energy electrons in air, spanning 0.005 eV to 30 keV. The swarm code performs a full 2-D solution to the Boltzmann equation for electrons in the presence of an applied electric field. Over this energy range the relevant processes to be tracked are elastic scattering, three body attachment, two body attachment, rotational excitation, vibrational excitation, electronic excitation, and ionization. All of these occur due to collisions between the electrons and neutral bodies in air. The Rad code solves the full radiation transfer equation in the energy range of 1 keV to 100 MeV. It includes effects of photo-absorption, Compton scattering, and pair-production. All of these codes employ a spherical coordinate system in momentum space and a cylindrical coordinate system in configuration space. The "z" axis of the momentum and configuration spaces is assumed to be parallel and we are currently also assuming complete spatial symmetry around the "z" axis. Benchmarking for each of these codes will be discussed as well as the way forward towards an integrated modern EMP code.
Benchmarking: your performance measurement and improvement tool.
Senn, G F
2000-01-01
Many respected professional healthcare organizations and societies today are seeking to establish data-driven performance measurement strategies such as benchmarking. Clinicians are, however, resistant to "benchmarking" that is based on financial data alone, concerned that it may be adverse to the patients' best interests. Benchmarking of clinical procedures that uses physician's codes such as Current Procedural Terminology (CPTs) has greater credibility with practitioners. Better Performers, organizations that can perform procedures successfully at lower cost and in less time, become the "benchmark" against which other organizations can measure themselves. The Better Performers' strategies can be adopted by other facilities to save time or money while maintaining quality patient care.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
Performance Analysis of the ARL Linux Networx Cluster
2004-06-01
OVERFLOW, used processors selected by SGE. All benchmarks on the GAMESS, COBALT, LSDYNA and FLUENT. Each code Origin 3800 were executed using IRIX cpusets...scheduler. for these benchmarks defines a missile with grid fins consisting of seventeen million cells [31. 4. Application Performance Results and
Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.
Palkowski, Marek; Bielecki, Wlodzimierz
2018-01-15
RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithms is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques are within the iteration space slicing framework - the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem at generating parallel tiled code is defining a proper tile size and tile dimension which impact parallelism degree and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). With this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace in this code expressions, including integer factors, with those including parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover in a given search space the best tile size and tile dimension maximizing target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, received on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is bigger than 2500.
NASA Astrophysics Data System (ADS)
Lin, Yi-Chun; Liu, Yuan-Hao; Nievaart, Sander; Chen, Yen-Fu; Wu, Shu-Wei; Chou, Wen-Tsae; Jiang, Shiang-Huei
2011-10-01
High energy photon (over 10 MeV) and neutron beams adopted in radiobiology and radiotherapy always produce mixed neutron/gamma-ray fields. The Mg(Ar) ionization chambers are commonly applied to determine the gamma-ray dose because of its neutron insensitive characteristic. Nowadays, many perturbation corrections for accurate dose estimation and lots of treatment planning systems are based on Monte Carlo technique. The Monte Carlo codes EGSnrc, FLUKA, GEANT4, MCNP5, and MCNPX were used to evaluate energy dependent response functions of the Exradin M2 Mg(Ar) ionization chamber to a parallel photon beam with mono-energies from 20 keV to 20 MeV. For the sake of validation, measurements were carefully performed in well-defined (a) primary M-100 X-ray calibration field, (b) primary 60Co calibration beam, (c) 6-MV, and (d) 10-MV therapeutic beams in hospital. At energy region below 100 keV, MCNP5 and MCNPX both had lower responses than other codes. For energies above 1 MeV, the MCNP ITS-mode greatly resembled other three codes and the differences were within 5%. Comparing to the measured currents, MCNP5 and MCNPX using ITS-mode had perfect agreement with the 60Co, and 10-MV beams. But at X-ray energy region, the derivations reached 17%. This work shows us a better insight into the performance of different Monte Carlo codes in photon-electron transport calculation. Regarding the application of the mixed field dosimetry like BNCT, MCNP with ITS-mode is recognized as the most suitable tool by this work.
GRILLIX: a 3D turbulence code based on the flux-coordinate independent approach
NASA Astrophysics Data System (ADS)
Stegmeir, Andreas; Coster, David; Ross, Alexander; Maj, Omar; Lackner, Karl; Poli, Emanuele
2018-03-01
The GRILLIX code is presented with which plasma turbulence/transport in various geometries can be simulated in 3D. The distinguishing feature of the code is that it is based on the flux-coordinate independent approach (FCI) (Hariri and Ottaviani 2013 Comput. Phys. Commun. 184 2419; Stegmeir et al 2016 Comput. Phys. Commun. 198 139). Cylindrical or Cartesian grids are used on which perpendicular operators are discretised via standard finite difference methods and parallel operators via a field line tracing and interpolation procedure (field line map). This offers a very high flexibility with respect to geometry, especially a separatrix with X-point(s) or a magnetic axis can be treated easily in contrast to approaches which are based on field aligned coordinates and suffer from coordinate singularities. Aiming finally for simulation of edge and scrape-off layer (SOL) turbulence, an isothermal electrostatic drift-reduced Braginskii model (Zeiler et al 1997 Phys. Plasmas 4 2134) has been implemented in GRILLIX. We present the numerical approach, which is based on a toroidally staggered formulation of the FCI, we show verification of the code with the method of manufactured solutions and show a benchmark based on a TORPEX blob experiment, previously performed by several edge/SOL codes (Riva et al 2016 Plasma Phys. Control. Fusion 58 044005). Examples for slab, circular, limiter and diverted geometry are presented. Finally, the results show that the FCI approach in general and GRILLIX in particular are viable approaches in order to tackle simulation of edge/SOL turbulence in diverted geometry.
Implementation of a 3D mixing layer code on parallel computers
NASA Technical Reports Server (NTRS)
Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.
1995-01-01
This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dokhane, A.; Canepa, S.; Ferroukhi, H.
For stability analyses of the Swiss operating Boiling-Water-Reactors (BWRs), the methodology employed and validated so far at the Paul Scherrer Inst. (PSI) was based on the RAMONA-3 code with a hybrid upstream static lattice/core analysis approach using CASMO-4 and PRESTO-2. More recently, steps were undertaken towards a new methodology based on the SIMULATE-3K (S3K) code for the dynamical analyses combined with the CMSYS system relying on the CASMO/SIMULATE-3 suite of codes and which was established at PSI to serve as framework for the development and validation of reference core models of all the Swiss reactors and operated cycles. This papermore » presents a first validation of the new methodology on the basis of a benchmark recently organised by a Swiss utility and including the participation of several international organisations with various codes/methods. Now in parallel, a transition from CASMO-4E (C4E) to CASMO-5M (C5M) as basis for the CMSYS core models was also recently initiated at PSI. Consequently, it was considered adequate to address the impact of this transition both for the steady-state core analyses as well as for the stability calculations and to achieve thereby, an integral approach for the validation of the new S3K methodology. Therefore, a comparative assessment of C4 versus C5M is also presented in this paper with particular emphasis on the void coefficients and their impact on the downstream stability analysis results. (authors)« less
Shift Verification and Validation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandya, Tara M.; Evans, Thomas M.; Davidson, Gregory G
2016-09-07
This documentation outlines the verification and validation of Shift for the Consortium for Advanced Simulation of Light Water Reactors (CASL). Five main types of problems were used for validation: small criticality benchmark problems; full-core reactor benchmarks for light water reactors; fixed-source coupled neutron-photon dosimetry benchmarks; depletion/burnup benchmarks; and full-core reactor performance benchmarks. We compared Shift results to measured data and other simulated Monte Carlo radiation transport code results, and found very good agreement in a variety of comparison measures. These include prediction of critical eigenvalue, radial and axial pin power distributions, rod worth, leakage spectra, and nuclide inventories over amore » burn cycle. Based on this validation of Shift, we are confident in Shift to provide reference results for CASL benchmarking.« less
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less
The rotating movement of three immiscible fluids - A benchmark problem
Bakker, M.; Oude, Essink G.H.P.; Langevin, C.D.
2004-01-01
A benchmark problem involving the rotating movement of three immiscible fluids is proposed for verifying the density-dependent flow component of groundwater flow codes. The problem consists of a two-dimensional strip in the vertical plane filled with three fluids of different densities separated by interfaces. Initially, the interfaces between the fluids make a 45??angle with the horizontal. Over time, the fluids rotate to the stable position whereby the interfaces are horizontal; all flow is caused by density differences. Two cases of the problem are presented, one resulting in a symmetric flow field and one resulting in an asymmetric flow field. An exact analytical solution for the initial flow field is presented by application of the vortex theory and complex variables. Numerical results are obtained using three variable-density groundwater flow codes (SWI, MOCDENS3D, and SEAWAT). Initial horizontal velocities of the interfaces, as simulated by the three codes, compare well with the exact solution. The three codes are used to simulate the positions of the interfaces at two times; the three codes produce nearly identical results. The agreement between the results is evidence that the specific rotational behavior predicted by the models is correct. It also shows that the proposed problem may be used to benchmark variable-density codes. It is concluded that the three models can be used to model accurately the movement of interfaces between immiscible fluids, and have little or no numerical dispersion. ?? 2003 Elsevier B.V. All rights reserved.
a Proposed Benchmark Problem for Scatter Calculations in Radiographic Modelling
NASA Astrophysics Data System (ADS)
Jaenisch, G.-R.; Bellon, C.; Schumm, A.; Tabary, J.; Duvauchelle, Ph.
2009-03-01
Code Validation is a permanent concern in computer modelling, and has been addressed repeatedly in eddy current and ultrasonic modeling. A good benchmark problem is sufficiently simple to be taken into account by various codes without strong requirements on geometry representation capabilities, focuses on few or even a single aspect of the problem at hand to facilitate interpretation and to avoid that compound errors compensate themselves, yields a quantitative result and is experimentally accessible. In this paper we attempt to address code validation for one aspect of radiographic modeling, the scattered radiation prediction. Many NDT applications can not neglect scattered radiation, and the scatter calculation thus is important to faithfully simulate the inspection situation. Our benchmark problem covers the wall thickness range of 10 to 50 mm for single wall inspections, with energies ranging from 100 to 500 keV in the first stage, and up to 1 MeV with wall thicknesses up to 70 mm in the extended stage. A simple plate geometry is sufficient for this purpose, and the scatter data is compared on a photon level, without a film model, which allows for comparisons with reference codes like MCNP. We compare results of three Monte Carlo codes (McRay, Sindbad and Moderato) as well as an analytical first order scattering code (VXI), and confront them to results obtained with MCNP. The comparison with an analytical scatter model provides insights into the application domain where this kind of approach can successfully replace Monte-Carlo calculations.
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
Assessment of the MPACT Resonance Data Generation Procedure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Kang Seog; Williams, Mark L.
Currently, heterogeneous models are being used to generate resonance self-shielded cross-section tables as a function of background cross sections for important nuclides such as 235U and 238U by performing the CENTRM (Continuous Energy Transport Model) slowing down calculation with the MOC (Method of Characteristics) spatial discretization and ESSM (Embedded Self-Shielding Method) calculations to obtain background cross sections. And then the resonance self-shielded cross section tables are converted into subgroup data which are to be used in estimating problem-dependent self-shielded cross sections in MPACT (Michigan Parallel Characteristics Transport Code). Although this procedure has been developed and thus resonance data have beenmore » generated and validated by benchmark calculations, assessment has never been performed to review if the resonance data are properly generated by the procedure and utilized in MPACT. This study focuses on assessing the procedure and a proper use in MPACT.« less
Tezaur, I. K.; Perego, M.; Salinger, A. G.; ...
2015-04-27
This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, alongmore » with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems is demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.« less
National Combustion Code: Parallel Implementation and Performance
NASA Technical Reports Server (NTRS)
Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.
2000-01-01
The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
NASA Astrophysics Data System (ADS)
Nardi, Albert; Idiart, Andrés; Trinchero, Paolo; de Vries, Luis Manuel; Molinero, Jorge
2014-08-01
This paper presents the development, verification and application of an efficient interface, denoted as iCP, which couples two standalone simulation programs: the general purpose Finite Element framework COMSOL Multiphysics® and the geochemical simulator PHREEQC. The main goal of the interface is to maximize the synergies between the aforementioned codes, providing a numerical platform that can efficiently simulate a wide number of multiphysics problems coupled with geochemistry. iCP is written in Java and uses the IPhreeqc C++ dynamic library and the COMSOL Java-API. Given the large computational requirements of the aforementioned coupled models, special emphasis has been placed on numerical robustness and efficiency. To this end, the geochemical reactions are solved in parallel by balancing the computational load over multiple threads. First, a benchmark exercise is used to test the reliability of iCP regarding flow and reactive transport. Then, a large scale thermo-hydro-chemical (THC) problem is solved to show the code capabilities. The results of the verification exercise are successfully compared with those obtained using PHREEQC and the application case demonstrates the scalability of a large scale model, at least up to 32 threads.
Benchmark studies of thermal jet mixing in SFRs using a two-jet model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omotowa, O. A.; Skifton, R.; Tokuhiro, A.
To guide the modeling, simulations and design of Sodium Fast Reactors (SFRs), we explore and compare the predictive capabilities of two numerical solvers COMSOL and OpenFOAM in the thermal jet mixing of two buoyant jets typical of the outlet flow from a SFR tube bundle. This process will help optimize on-going experimental efforts at obtaining high resolution data for V and V of CFD codes as anticipated in next generation nuclear systems. Using the k-{epsilon} turbulence models of both codes as reference, their ability to simulate the turbulence behavior in similar environments was first validated for single jet experimental datamore » reported in literature. This study investigates the thermal mixing of two parallel jets having a temperature difference (hot-to-cold) {Delta}T{sub hc}= 5 deg. C, 10 deg. C and velocity ratios U{sub c}/U{sub h} = 0.5, 1. Results of the computed turbulent quantities due to convective mixing and the variations in flow field along the axial position are presented. In addition, this study also evaluates the effect of spacing ratio between jets in predicting the flow field and jet behavior in near and far fields. (authors)« less
The PARTRAC code: Status and recent developments
NASA Astrophysics Data System (ADS)
Friedland, Werner; Kundrat, Pavel
Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after (60) Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3]. Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable accounting for variations in DNA damage yields, complexity and repair between these regions. Second, the applicability of the code to low-energy ions has been extended to full stopping by using a modified Barkas scaling of proton cross sections for ions heavier than helium. Third, ongoing studies aim at hitherto unprecedented benchmarking of the code against experiments with sub-muµm focused bunches of low-LET ions mimicking single high-LET ion tracks [5] which separate effects of damage clustering on a sub-mum scale from DNA damage complexity on a nanometer scale. Fourth, motivated by implications for the involvement of mitochondria in intercellular signaling and radiation-induced bystander effects, ongoing work extends the range of PARTRAC DNA models to radiation effects on mitochondrial DNA. The contribution will discuss the PARTRAC modules, benchmarks to experimental data, recent and ongoing developments of the code, with special attention to its implications and potential applications in radiation protection and space research. Acknowledgement. This work was partially funded by the EU (Contract FP7-249689 ‘DoReMi’). References 1. Friedland et al., Mutat. Res. 711, 28 (2011) 2. Friedland et al., Int. J. Radiat. Biol. 88, 129 (2012) 3. Friedland et al., Mutat. Res. 756, 213 (2013) 4. Alloni et al., Radiat. Res. 179, 690 (2013) 5. Schmid et al., Phys. Med. Biol. 57, 5889 (2012)
Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip
2018-01-28
We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed.
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Global MHD simulation of magnetosphere using HPF
NASA Astrophysics Data System (ADS)
Ogino, T.
We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of Fujitsu VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Parallel community climate model: Description and user`s guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drake, J.B.; Flanery, R.E.; Semeraro, B.D.
This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
Radiation Coupling with the FUN3D Unstructured-Grid CFD Code
NASA Technical Reports Server (NTRS)
Wood, William A.
2012-01-01
The HARA radiation code is fully-coupled to the FUN3D unstructured-grid CFD code for the purpose of simulating high-energy hypersonic flows. The radiation energy source terms and surface heat transfer, under the tangent slab approximation, are included within the fluid dynamic ow solver. The Fire II flight test, at the Mach-31 1643-second trajectory point, is used as a demonstration case. Comparisons are made with an existing structured-grid capability, the LAURA/HARA coupling. The radiative surface heat transfer rates from the present approach match the benchmark values within 6%. Although radiation coupling is the focus of the present work, convective surface heat transfer rates are also reported, and are seen to vary depending upon the choice of mesh connectivity and FUN3D ux reconstruction algorithm. On a tetrahedral-element mesh the convective heating matches the benchmark at the stagnation point, but under-predicts by 15% on the Fire II shoulder. Conversely, on a mixed-element mesh the convective heating over-predicts at the stagnation point by 20%, but matches the benchmark away from the stagnation region.
FDNS CFD Code Benchmark for RBCC Ejector Mode Operation
NASA Technical Reports Server (NTRS)
Holt, James B.; Ruf, Joe
1999-01-01
Computational Fluid Dynamics (CFD) analysis results are compared with benchmark quality test data from the Propulsion Engineering Research Center's (PERC) Rocket Based Combined Cycle (RBCC) experiments to verify fluid dynamic code and application procedures. RBCC engine flowpath development will rely on CFD applications to capture the multi-dimensional fluid dynamic interactions and to quantify their effect on the RBCC system performance. Therefore, the accuracy of these CFD codes must be determined through detailed comparisons with test data. The PERC experiments build upon the well-known 1968 rocket-ejector experiments of Odegaard and Stroup by employing advanced optical and laser based diagnostics to evaluate mixing and secondary combustion. The Finite Difference Navier Stokes (FDNS) code was used to model the fluid dynamics of the PERC RBCC ejector mode configuration. Analyses were performed for both Diffusion and Afterburning (DAB) and Simultaneous Mixing and Combustion (SMC) test conditions. Results from both the 2D and the 3D models are presented.
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization,3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts 3. Development of a code generator for performance prediction 4. Automated partitioning 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.
Tough2{_}MP: A parallel version of TOUGH2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris
2003-04-09
TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
Nadkarni, P M; Miller, P L
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
NASA Astrophysics Data System (ADS)
Solano-Altamirano, J. M.; Hernández-Pérez, Julio M.
2015-11-01
DensToolKit is a suite of cross-platform, optionally parallelized, programs for analyzing the molecular electron density (ρ) and several fields derived from it. Scalar and vector fields, such as the gradient of the electron density (∇ρ), electron localization function (ELF) and its gradient, localized orbital locator (LOL), region of slow electrons (RoSE), reduced density gradient, localized electrons detector (LED), information entropy, molecular electrostatic potential, kinetic energy densities K and G, among others, can be evaluated on zero, one, two, and three dimensional grids. The suite includes a program for searching critical points and bond paths of the electron density, under the framework of Quantum Theory of Atoms in Molecules. DensToolKit also evaluates the momentum space electron density on spatial grids, and the reduced density matrix of order one along lines joining two arbitrary atoms of a molecule. The source code is distributed under the GNU-GPLv3 license, and we release the code with the intent of establishing an open-source collaborative project. The style of DensToolKit's code follows some of the guidelines of an object-oriented program. This allows us to supply the user with a simple manner for easily implement new scalar or vector fields, provided they are derived from any of the fields already implemented in the code. In this paper, we present some of the most salient features of the programs contained in the suite, some examples of how to run them, and the mathematical definitions of the implemented fields along with hints of how we optimized their evaluation. We benchmarked our suite against both a freely-available program and a commercial package. Speed-ups of ˜2×, and up to 12× were obtained using a non-parallel compilation of DensToolKit for the evaluation of fields. DensToolKit takes similar times for finding critical points, compared to a commercial package. Finally, we present some perspectives for the future development and growth of the suite.
Multitasking TORT under UNICOS: Parallel performance models and measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnett, A.; Azmy, Y.Y.
1999-09-27
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Haliasos, N; Rezajooi, K; O'neill, K S; Van Dellen, J; Hudovsky, Anita; Nouraei, Sar
2010-04-01
Clinical coding is the translation of documented clinical activities during an admission to a codified language. Healthcare Resource Groupings (HRGs) are derived from coding data and are used to calculate payment to hospitals in England, Wales and Scotland and to conduct national audit and benchmarking exercises. Coding is an error-prone process and an understanding of its accuracy within neurosurgery is critical for financial, organizational and clinical governance purposes. We undertook a multidisciplinary audit of neurosurgical clinical coding accuracy. Neurosurgeons trained in coding assessed the accuracy of 386 patient episodes. Where clinicians felt a coding error was present, the case was discussed with an experienced clinical coder. Concordance between the initial coder-only clinical coding and the final clinician-coder multidisciplinary coding was assessed. At least one coding error occurred in 71/386 patients (18.4%). There were 36 diagnosis and 93 procedure errors and in 40 cases, the initial HRG changed (10.4%). Financially, this translated to pound111 revenue-loss per patient episode and projected to pound171,452 of annual loss to the department. 85% of all coding errors were due to accumulation of coding changes that occurred only once in the whole data set. Neurosurgical clinical coding is error-prone. This is financially disadvantageous and with the coding data being the source of comparisons within and between departments, coding inaccuracies paint a distorted picture of departmental activity and subspecialism in audit and benchmarking. Clinical engagement improves accuracy and is encouraged within a clinical governance framework.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2014-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Tiao, J; Moore, L; Porgo, T V; Belcaid, A
2016-06-01
To assess whether the definition of an IHF used as an exclusion criterion influences the results of trauma center benchmarking. We conducted a multicenter retrospective cohort study with data from an integrated Canadian trauma system. The study population included all patients admitted between 1999 and 2010 to any of the 57 adult trauma centers. Seven definitions of IHF based on diagnostic codes, age, mechanism of injury, and secondary injuries, identified in a systematic review, were used. Trauma centers were benchmarked using risk-adjusted mortality estimates generated using the Trauma Risk Adjustment Model. The agreement between benchmarking results generated under different IHF definitions was evaluated with correlation coefficients on adjusted mortality estimates. Correlation coefficients >0.95 were considered to convey acceptable agreement. The study population consisted of 172,872 patients before exclusion of IHF and between 128,094 and 139,588 patients after exclusion. Correlation coefficients between risk-adjusted mortality estimates generated in populations including and excluding IHF varied between 0.86 and 0.90. Correlation coefficients of estimates generated under different definitions of IHF varied between 0.97 and 0.99, even when analyses were restricted to patients aged ≥65 years. Although the exclusion of patients with IHF has an influence on the results of trauma center benchmarking based on mortality, the definition of IHF in terms of diagnostic codes, age, mechanism of injury and secondary injury has no significant impact on benchmarking results. Results suggest that there is no need to obtain formal consensus on the definition of IHF for benchmarking activities.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Scale/TSUNAMI Sensitivity Data for ICSBEP Evaluations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rearden, Bradley T; Reed, Davis Allan; Lefebvre, Robert A
2011-01-01
The Tools for Sensitivity and Uncertainty Analysis Methodology Implementation (TSUNAMI) software developed at Oak Ridge National Laboratory (ORNL) as part of the Scale code system provide unique methods for code validation, gap analysis, and experiment design. For TSUNAMI analysis, sensitivity data are generated for each application and each existing or proposed experiment used in the assessment. The validation of diverse sets of applications requires potentially thousands of data files to be maintained and organized by the user, and a growing number of these files are available through the International Handbook of Evaluated Criticality Safety Benchmark Experiments (IHECSBE) distributed through themore » International Criticality Safety Benchmark Evaluation Program (ICSBEP). To facilitate the use of the IHECSBE benchmarks in rigorous TSUNAMI validation and gap analysis techniques, ORNL generated SCALE/TSUNAMI sensitivity data files (SDFs) for several hundred benchmarks for distribution with the IHECSBE. For the 2010 edition of IHECSBE, the sensitivity data were generated using 238-group cross-section data based on ENDF/B-VII.0 for 494 benchmark experiments. Additionally, ORNL has developed a quality assurance procedure to guide the generation of Scale inputs and sensitivity data, as well as a graphical user interface to facilitate the use of sensitivity data in identifying experiments and applying them in validation studies.« less
An integrity measure to benchmark quantum error correcting memories
NASA Astrophysics Data System (ADS)
Xu, Xiaosi; de Beaudrap, Niel; O'Gorman, Joe; Benjamin, Simon C.
2018-02-01
Rapidly developing experiments across multiple platforms now aim to realise small quantum codes, and so demonstrate a memory within which a logical qubit can be protected from noise. There is a need to benchmark the achievements in these diverse systems, and to compare the inherent power of the codes they rely upon. We describe a recently introduced performance measure called integrity, which relates to the probability that an ideal agent will successfully ‘guess’ the state of a logical qubit after a period of storage in the memory. Integrity is straightforward to evaluate experimentally without state tomography and it can be related to various established metrics such as the logical fidelity and the pseudo-threshold. We offer a set of experimental milestones that are steps towards demonstrating unconditionally superior encoded memories. Using intensive numerical simulations we compare memories based on the five-qubit code, the seven-qubit Steane code, and a nine-qubit code which is the smallest instance of a surface code; we assess both the simple and fault-tolerant implementations of each. While the ‘best’ code upon which to base a memory does vary according to the nature and severity of the noise, nevertheless certain trends emerge.
NASA Technical Reports Server (NTRS)
Logan, Terry G.
1994-01-01
The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.
Parallel Scaling Characteristics of Selected NERSC User ProjectCodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skinner, David; Verdier, Francesca; Anand, Harsh
This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems.more » An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji
A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA andmore » MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
GPU-Accelerated Stony-Brook University 5-class Microphysics Scheme in WRF
NASA Astrophysics Data System (ADS)
Mielikainen, J.; Huang, B.; Huang, A.
2011-12-01
The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. Microphysics plays an important role in weather and climate prediction. Several bulk water microphysics schemes are available within the WRF, with different numbers of simulated hydrometeor classes and methods for estimating their size fall speeds, distributions and densities. Stony-Brook University scheme (SBU-YLIN) is a 5-class scheme with riming intensity predicted to account for mixed-phase processes. In the past few years, co-processing on Graphics Processing Units (GPUs) has been a disruptive technology in High Performance Computing (HPC). GPUs use the ever increasing transistor count for adding more processor cores. Therefore, GPUs are well suited for massively data parallel processing with high floating point arithmetic intensity. Thus, it is imperative to update legacy scientific applications to take advantage of this unprecedented increase in computing power. CUDA is an extension to the C programming language offering programming GPU's directly. It is designed so that its constructs allow for natural expression of data-level parallelism. A CUDA program is organized into two parts: a serial program running on the CPU and a CUDA kernel running on the GPU. The CUDA code consists of three computational phases: transmission of data into the global memory of the GPU, execution of the CUDA kernel, and transmission of results from the GPU into the memory of CPU. CUDA takes a bottom-up point of view of parallelism is which thread is an atomic unit of parallelism. Individual threads are part of groups called warps, within which every thread executes exactly the same sequence of instructions. To test SBU-YLIN, we used a CONtinental United States (CONUS) benchmark data set for 12 km resolution domain for October 24, 2001. A WRF domain is a geographic region of interest discretized into a 2-dimensional grid parallel to the ground. Each grid point has multiple levels, which correspond to various vertical heights in the atmosphere. The size of the CONUS 12 km domain is 433 x 308 horizontal grid points with 35 vertical levels. First, the entire SBU-YLIN Fortran code was rewritten in C in preparation of GPU accelerated version. After that, C code was verified against Fortran code for identical outputs. Default compiler options from WRF were used for gfortran and gcc compilers. The processing time for the original Fortran code is 12274 ms and 12893 ms for C version. The processing times for GPU implementation of SBU-YLIN microphysics scheme with I/O are 57.7 ms and 37.2 ms for 1 and 2 GPUs, respectively. The corresponding speedups are 213x and 330x compared to a Fortran implementation. Without I/O the speedup is 896x on 1 GPU. Obviously, ignoring I/O time speedup scales linearly with GPUs. Thus, 2 GPUs have a speedup of 1788x without I/O. Microphysics computation is just a small part of the whole WRF model. After having completely implemented WRF on GPU, the inputs for SBU-YLIN do not have to be transferred from CPU. Instead they are results of previous WRF modules. Therefore, the role of I/O is greatly diminished once all of WRF have been converted to run on GPUs. In the near future, we expect to have a WRF running completely on GPUs for a superior performance.
HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Mark; Brown, Jed; Shalf, John
2014-05-05
This document provides an overview of the benchmark ? HPGMG ? for ranking large scale general purpose computers for use on the Top500 list [8]. We provide a rationale for the need for a replacement for the current metric HPL, some background of the Top500 list and the challenges of developing such a metric; we discuss our design philosophy and methodology, and an overview of the specification of the benchmark. The primary documentation with maintained details on the specification can be found at hpgmg.org and the Wiki and benchmark code itself can be found in the repository https://bitbucket.org/hpgmg/hpgmg.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1
NASA Technical Reports Server (NTRS)
Wright, Michael J.; White, Todd; Mangini, Nancy
2009-01-01
Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.
Constant time worker thread allocation via configuration caching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; O'Brien, John K. P.
Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, CPU frequency scaling are some of the hardware features in the modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller which is part of common NGS workflows, that consume more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer (iv) the default 'on-demand' mode of CPU frequency is over-clocked by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
Grid Computing Environment using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Alanis, Fransisco; Mahmood, Akhtar
2003-10-01
Custom-made Beowulf clusters using PCs are currently replacing expensive supercomputers to carry out complex scientific computations. At the University of Texas - Pan American, we built a 8 Gflops Beowulf Cluster for doing HEP research using RedHat Linux 7.3 and the LAM-MPI middleware. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes that were compiled in C on the cluster using the LAM-XMPI graphics user environment. We will demonstrate a "simple" prototype grid environment, where we will submit and run parallel jobs remotely across multiple cluster nodes over the internet from the presentation room at Texas Tech. University. The Sphinx Beowulf Cluster will be used for monte-carlo grid test-bed studies for the LHC-ATLAS high energy physics experiment. Grid is a new IT concept for the next generation of the "Super Internet" for high-performance computing. The Grid will allow scientist worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
New Parallel computing framework for radiation transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.
A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility canmore » be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
Soft-output decoding algorithms in iterative decoding of turbo codes
NASA Technical Reports Server (NTRS)
Benedetto, S.; Montorsi, G.; Divsalar, D.; Pollara, F.
1996-01-01
In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.
Parallel tempering for the traveling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Percus, Allon; Wang, Richard; Hyman, Jeffrey
We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulationsmore » are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.« less
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Effect of strong elastic contrasts on the propagation of seismic wave in hard-rock environments
NASA Astrophysics Data System (ADS)
Saleh, R.; Zheng, L.; Liu, Q.; Milkereit, B.
2013-12-01
Understanding the propagation of seismic waves in a presence of strong elastic contrasts, such as topography, tunnels and ore-bodies is still a challenge. Safety in mining is a major concern and seismic monitoring is the main tool here. For engineering purposes, amplitudes (peak particle velocity/acceleration) and travel times of seismic events (mostly blasts or microseismic events) are critical parameters that have to be determined at various locations in a mine. These parameters are useful in preparing risk maps or to better understand the process of spatial and temporal stress distributions in a mine. Simple constant velocity models used for monitoring studies in mining, cannot explain the observed complexities in scattered seismic waves. In hard-rock environments modeling of elastic seismic wavefield require detailed 3D petrophysical, infrastructure and topographical data to simulate the propagation of seismic wave with a frequencies up to few kilohertz. With the development of efficient numerical techniques, and parallel computation facilities, a solution for such a problem is achievable. In this study, the effects of strong elastic contrasts such as ore-bodies, rough topography and tunnels will be illustrated using 3D modeling method. The main tools here are finite difference code (SOFI3D)[1] that has been benchmarked for engineering studies, and spectral element code (SPECFEM) [2], which was, developed for global seismology problems. The modeling results show locally enhanced peak particle velocity due to presence of strong elastic contrast and topography in models. [1] Bohlen, T. Parallel 3-D viscoelastic finite difference seismic modeling. Computers & Geosciences 28 (2002) 887-899 [2] Komatitsch, D., and J. Tromp, Introduction to the spectral-element method for 3-D seismic wave propagation, Geophys. J. Int., 139, 806-822, 1999.
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace -collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility, that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation -based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, that convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
Analysis of the influence of the heat transfer phenomena on the late phase of the ThAI Iod-12 test
NASA Astrophysics Data System (ADS)
Gonfiotti, B.; Paci, S.
2014-11-01
Iodine is one of the major contributors to the source term during a severe accident in a Nuclear Power Plant for its volatility and high radiological consequences. Therefore, large efforts have been made to describe the Iodine behaviour during an accident, especially in the containment system. Due to the lack of experimental data, in the last years many attempts were carried out to fill the gaps on the knowledge of Iodine behaviour. In this framework, two tests (ThAI Iod-11 and Iod-12) were carried out inside a multi-compartment steel vessel. A quite complex transient characterizes these two tests; therefore they are also suitable for thermal- hydraulic benchmarks. The two tests were originally released for a benchmark exercise during the SARNET2 EU Project. At the end of this benchmark a report covering the main findings was issued, stating that the common codes employed in SA studies were able to simulate the tests but with large discrepancies. The present work is then related to the application of the new versions of ASTEC and MELCOR codes with the aim of carry out a new code-to-code comparison vs. ThAI Iod-12 experimental data, focusing on the influence of the heat exchanges with the outer environment, which seems to be one of the most challenging issues to cope with.
A CPU benchmark for protein crystallographic refinement.
Bourne, P E; Hendrickson, W A
1990-01-01
The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ are reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run-time when coding for a specific hardware architecture considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.
TRACE/PARCS analysis of the OECD/NEA Oskarshamn-2 BWR stability benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozlowski, T.; Downar, T.; Xu, Y.
2012-07-01
On February 25, 1999, the Oskarshamn-2 NPP experienced a stability event which culminated in diverging power oscillations with a decay ratio of about 1.4. The event was successfully modeled by the TRACE/PARCS coupled code system, and further analysis of the event is described in this paper. The results show very good agreement with the plant data, capturing the entire behavior of the transient including the onset of instability, growth of the oscillations (decay ratio) and oscillation frequency. This provides confidence in the prediction of other parameters which are not available from the plant records. The event provides coupled code validationmore » for a challenging BWR stability event, which involves the accurate simulation of neutron kinetics (NK), thermal-hydraulics (TH), and TH/NK. coupling. The success of this work has demonstrated the ability of the 3-D coupled systems code TRACE/PARCS to capture the complex behavior of BWR stability events. The problem was released as an international OECD/NEA benchmark, and it is the first benchmark based on measured plant data for a stability event with a DR greater than one. Interested participants are invited to contact authors for more information. (authors)« less
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Capabilities of Fully Parallelized MHD Stability Code MARS
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2016-10-01
Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2015-11-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
2015-06-01
cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, B.C.J.; Sha, W.T.; Doria, M.L.
1980-11-01
The governing equations, i.e., conservation equations for mass, momentum, and energy, are solved as a boundary-value problem in space and an initial-value problem in time. BODYFIT-1FE code uses the technique of boundary-fitted coordinate systems where all the physical boundaries are transformed to be coincident with constant coordinate lines in the transformed space. By using this technique, one can prescribe boundary conditions accurately without interpolation. The transformed governing equations in terms of the boundary-fitted coordinates are then solved by using implicit cell-by-cell procedure with a choice of either central or upwind convective derivatives. It is a true benchmark rod-bundle code withoutmore » invoking any assumptions in the case of laminar flow. However, for turbulent flow, some empiricism must be employed due to the closure problem of turbulence modeling. The detailed velocity and temperature distributions calculated from the code can be used to benchmark and calibrate empirical coefficients employed in subchannel codes and porous-medium analyses.« less
NASA Technical Reports Server (NTRS)
Agrawal, Gagan; Sussman, Alan; Saltz, Joel
1993-01-01
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPK-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems
NASA Astrophysics Data System (ADS)
Ierotheou, C. S.; Forsey, C. R.; Leatham, M.
2000-04-01
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright
Using SPARK as a Solver for Modelica
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wetter, Michael; Wetter, Michael; Haves, Philip
Modelica is an object-oriented acausal modeling language that is well positioned to become a de-facto standard for expressing models of complex physical systems. To simulate a model expressed in Modelica, it needs to be translated into executable code. For generating run-time efficient code, such a translation needs to employ algebraic formula manipulations. As the SPARK solver has been shown to be competitive for generating such code but currently cannot be used with the Modelica language, we report in this paper how SPARK's symbolic and numerical algorithms can be implemented in OpenModelica, an open-source implementation of a Modelica modeling and simulationmore » environment. We also report benchmark results that show that for our air flow network simulation benchmark, the SPARK solver is competitive with Dymola, which is believed to provide the best solver for Modelica.« less
The grout/glass performance assessment code system (GPACS) with verification and benchmarking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piepho, M.G.; Sutherland, W.H.; Rittmann, P.D.
1994-12-01
GPACS is a computer code system for calculating water flow (unsaturated or saturated), solute transport, and human doses due to the slow release of contaminants from a waste form (in particular grout or glass) through an engineered system and through a vadose zone to an aquifer, well and river. This dual-purpose document is intended to serve as a user`s guide and verification/benchmark document for the Grout/Glass Performance Assessment Code system (GPACS). GPACS can be used for low-level-waste (LLW) Glass Performance Assessment and many other applications including other low-level-waste performance assessments and risk assessments. Based on all the cses presented, GPACSmore » is adequate (verified) for calculating water flow and contaminant transport in unsaturated-zone sediments and for calculating human doses via the groundwater pathway.« less
NASA Technical Reports Server (NTRS)
Padovan, J.; Adams, M.; Lam, P.; Fertis, D.; Zeid, I.
1982-01-01
Second-year efforts within a three-year study to develop and extend finite element (FE) methodology to efficiently handle the transient/steady state response of rotor-bearing-stator structure associated with gas turbine engines are outlined. The two main areas aim at (1) implanting the squeeze film damper element into a general purpose FE code for testing and evaluation; and (2) determining the numerical characteristics of the FE-generated rotor-bearing-stator simulation scheme. The governing FE field equations are set out and the solution methodology is presented. The choice of ADINA as the general-purpose FE code is explained, and the numerical operational characteristics of the direct integration approach of FE-generated rotor-bearing-stator simulations is determined, including benchmarking, comparison of explicit vs. implicit methodologies of direct integration, and demonstration problems.
CHIC - Coupling Habitability, Interior and Crust
NASA Astrophysics Data System (ADS)
Noack, Lena; Labbe, Francois; Boiveau, Thomas; Rivoldini, Attilio; Van Hoolst, Tim
2014-05-01
We present a new code developed for simulating convection in terrestrial planets and icy moons. The code CHIC is written in Fortran and employs the finite volume method and finite difference method for solving energy, mass and momentum equations in either silicate or icy mantles. The code uses either Cartesian (2D and 3D box) or spherical coordinates (2D cylinder or annulus). It furthermore contains a 1D parametrised model to obtain temperature profiles in specific regions, for example in the iron core or in the silicate mantle (solving only the energy equation). The 2D/3D convection model uses the same input parameters as the 1D model, which allows for comparison of the different models and adaptation of the 1D model, if needed. The code has already been benchmarked for the following aspects: - viscosity-dependent rheology (Blankenbach et al., 1989) - pseudo-plastic deformation (Tosi et al., in preparation phase) - subduction mechanism and plastic deformation (Quinquis et al., in preparation phase) New features that are currently developed and benchmarked include: - compressibility (following King et al., 2009 and Leng and Zhong, 2008) - different melt modules (Plesa et al., in preparation phase) - freezing of an inner core (comparison with GAIA code, Huettig and Stemmer, 2008) - build-up of oceanic and continental crust (Noack et al., in preparation phase) The code represents a useful tool to couple the interior with the surface of a planet (e.g. via build-up and erosion of crust) and it's atmosphere (via outgassing on the one hand and subduction of hydrated crust and carbonates back into the mantle). It will be applied to investigate several factors that might influence the habitability of a terrestrial planet, and will also be used to simulate icy bodies with high-pressure ice phases. References: Blankenbach et al. (1989). A benchmark comparison for mantle convection codes. GJI 98, 23-38. Huettig and Stemmer (2008). Finite volume discretization for dynamic viscosities on Voronoi grids. PEPI 171(1-4), 137-146. King et al. (2009). A Community Benchmark for 2D Cartesian Compressible Convection in the Earth's Mantle. GJI 179, 1-11. Leng and Zhong (2008). Viscous heating, adiabatic heating and energetic consistency in compressible mantle convection. GJI 173, 693-702.
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
NASA Technical Reports Server (NTRS)
Orifici, Adrian C.; Krueger, Ronald
2010-01-01
With capabilities for simulating delamination growth in composite materials becoming available, the need for benchmarking and assessing these capabilities is critical. In this study, benchmark analyses were performed to assess the delamination propagation simulation capabilities of the VCCT implementations in Marc TM and MD NastranTM. Benchmark delamination growth results for Double Cantilever Beam, Single Leg Bending and End Notched Flexure specimens were generated using a numerical approach. This numerical approach was developed previously, and involves comparing results from a series of analyses at different delamination lengths to a single analysis with automatic crack propagation. Specimens were analyzed with three-dimensional and two-dimensional models, and compared with previous analyses using Abaqus . The results demonstrated that the VCCT implementation in Marc TM and MD Nastran(TradeMark) was capable of accurately replicating the benchmark delamination growth results and that the use of the numerical benchmarks offers advantages over benchmarking using experimental and analytical results.
FDNS CFD Code Benchmark for RBCC Ejector Mode Operation: Continuing Toward Dual Rocket Effects
NASA Technical Reports Server (NTRS)
West, Jeff; Ruf, Joseph H.; Turner, James E. (Technical Monitor)
2000-01-01
Computational Fluid Dynamics (CFD) analysis results are compared with benchmark quality test data from the Propulsion Engineering Research Center's (PERC) Rocket Based Combined Cycle (RBCC) experiments to verify fluid dynamic code and application procedures. RBCC engine flowpath development will rely on CFD applications to capture the multi -dimensional fluid dynamic interactions and to quantify their effect on the RBCC system performance. Therefore, the accuracy of these CFD codes must be determined through detailed comparisons with test data. The PERC experiments build upon the well-known 1968 rocket-ejector experiments of Odegaard and Stroup by employing advanced optical and laser based diagnostics to evaluate mixing and secondary combustion. The Finite Difference Navier Stokes (FDNS) code [2] was used to model the fluid dynamics of the PERC RBCC ejector mode configuration. Analyses were performed for the Diffusion and Afterburning (DAB) test conditions at the 200-psia thruster operation point, Results with and without downstream fuel injection are presented.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earth Sciences Division; Zhang, Keni; Zhang, Keni
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
Hybrid discrete ordinates and characteristics method for solving the linear Boltzmann equation
NASA Astrophysics Data System (ADS)
Yi, Ce
With the ability of computer hardware and software increasing rapidly, deterministic methods to solve the linear Boltzmann equation (LBE) have attracted some attention for computational applications in both the nuclear engineering and medical physics fields. Among various deterministic methods, the discrete ordinates method (SN) and the method of characteristics (MOC) are two of the most widely used methods. The SN method is the traditional approach to solve the LBE for its stability and efficiency. While the MOC has some advantages in treating complicated geometries. However, in 3-D problems requiring a dense discretization grid in phase space (i.e., a large number of spatial meshes, directions, or energy groups), both methods could suffer from the need for large amounts of memory and computation time. In our study, we developed a new hybrid algorithm by combing the two methods into one code, TITAN. The hybrid approach is specifically designed for application to problems containing low scattering regions. A new serial 3-D time-independent transport code has been developed. Under the hybrid approach, the preferred method can be applied in different regions (blocks) within the same problem model. Since the characteristics method is numerically more efficient in low scattering media, the hybrid approach uses a block-oriented characteristics solver in low scattering regions, and a block-oriented SN solver in the remainder of the physical model. In the TITAN code, a physical problem model is divided into a number of coarse meshes (blocks) in Cartesian geometry. Either the characteristics solver or the SN solver can be chosen to solve the LBE within a coarse mesh. A coarse mesh can be filled with fine meshes or characteristic rays depending on the solver assigned to the coarse mesh. Furthermore, with its object-oriented programming paradigm and layered code structure, TITAN allows different individual spatial meshing schemes and angular quadrature sets for each coarse mesh. Two quadrature types (level-symmetric and Legendre-Chebyshev quadrature) along with the ordinate splitting techniques (rectangular splitting and PN-TN splitting) are implemented. In the S N solver, we apply a memory-efficient 'front-line' style paradigm to handle the fine mesh interface fluxes. In the characteristics solver, we have developed a novel 'backward' ray-tracing approach, in which a bi-linear interpolation procedure is used on the incoming boundaries of a coarse mesh. A CPU-efficient scattering kernel is shared in both solvers within the source iteration scheme. Angular and spatial projection techniques are developed to transfer the angular fluxes on the interfaces of coarse meshes with different discretization grids. The performance of the hybrid algorithm is tested in a number of benchmark problems in both nuclear engineering and medical physics fields. Among them are the Kobayashi benchmark problems and a computational tomography (CT) device model. We also developed an extra sweep procedure with the fictitious quadrature technique to calculate angular fluxes along directions of interest. The technique is applied in a single photon emission computed tomography (SPECT) phantom model to simulate the SPECT projection images. The accuracy and efficiency of the TITAN code are demonstrated in these benchmarks along with its scalability. A modified version of the characteristics solver is integrated in the PENTRAN code and tested within the parallel engine of PENTRAN. The limitations on the hybrid algorithm are also studied.
A Data Parallel Multizone Navier-Stokes Code
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)
1995-01-01
We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.
NASA Astrophysics Data System (ADS)
Velioǧlu, Deniz; Cevdet Yalçıner, Ahmet; Zaytsev, Andrey
2016-04-01
Tsunamis are huge waves with long wave periods and wave lengths that can cause great devastation and loss of life when they strike a coast. The interest in experimental and numerical modeling of tsunami propagation and inundation increased considerably after the 2011 Great East Japan earthquake. In this study, two numerical codes, FLOW 3D and NAMI DANCE, that analyze tsunami propagation and inundation patterns are considered. Flow 3D simulates linear and nonlinear propagating surface waves as well as long waves by solving three-dimensional Navier-Stokes (3D-NS) equations. NAMI DANCE uses finite difference computational method to solve 2D depth-averaged linear and nonlinear forms of shallow water equations (NSWE) in long wave problems, specifically tsunamis. In order to validate these two codes and analyze the differences between 3D-NS and 2D depth-averaged NSWE equations, two benchmark problems are applied. One benchmark problem investigates the runup of long waves over a complex 3D beach. The experimental setup is a 1:400 scale model of Monai Valley located on the west coast of Okushiri Island, Japan. Other benchmark problem is discussed in 2015 National Tsunami Hazard Mitigation Program (NTHMP) Annual meeting in Portland, USA. It is a field dataset, recording the Japan 2011 tsunami in Hilo Harbor, Hawaii. The computed water surface elevation and velocity data are compared with the measured data. The comparisons showed that both codes are in fairly good agreement with each other and benchmark data. The differences between 3D-NS and 2D depth-averaged NSWE equations are highlighted. All results are presented with discussions and comparisons. Acknowledgements: Partial support by Japan-Turkey Joint Research Project by JICA on earthquakes and tsunamis in Marmara Region (JICA SATREPS - MarDiM Project), 603839 ASTARTE Project of EU, UDAP-C-12-14 project of AFAD Turkey, 108Y227, 113M556 and 213M534 projects of TUBITAK Turkey, RAPSODI (CONCERT_Dis-021) of CONCERT-Japan Joint Call and Istanbul Metropolitan Municipality are all acknowledged.
NASA Technical Reports Server (NTRS)
Weed, Richard Allen; Sankar, L. N.
1994-01-01
An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to development procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
THE PLUTO CODE FOR ADAPTIVE MESH COMPUTATIONS IN ASTROPHYSICAL FLUID DYNAMICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mignone, A.; Tzeferacos, P.; Zanni, C.
We present a description of the adaptive mesh refinement (AMR) implementation of the PLUTO code for solving the equations of classical and special relativistic magnetohydrodynamics (MHD and RMHD). The current release exploits, in addition to the static grid version of the code, the distributed infrastructure of the CHOMBO library for multidimensional parallel computations over block-structured, adaptively refined grids. We employ a conservative finite-volume approach where primary flow quantities are discretized at the cell center in a dimensionally unsplit fashion using the Corner Transport Upwind method. Time stepping relies on a characteristic tracing step where piecewise parabolic method, weighted essentially non-oscillatory,more » or slope-limited linear interpolation schemes can be handily adopted. A characteristic decomposition-free version of the scheme is also illustrated. The solenoidal condition of the magnetic field is enforced by augmenting the equations with a generalized Lagrange multiplier providing propagation and damping of divergence errors through a mixed hyperbolic/parabolic explicit cleaning step. Among the novel features, we describe an extension of the scheme to include non-ideal dissipative processes, such as viscosity, resistivity, and anisotropic thermal conduction without operator splitting. Finally, we illustrate an efficient treatment of point-local, potentially stiff source terms over hierarchical nested grids by taking advantage of the adaptivity in time. Several multidimensional benchmarks and applications to problems of astrophysical relevance assess the potentiality of the AMR version of PLUTO in resolving flow features separated by large spatial and temporal disparities.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sublet, J.-Ch.; Koning, A.J.; Forrest, R.A.
The reasons for the conversion of the European Activation File, EAF into ENDF-6 format are threefold. First, it significantly enhances the JEFF-3.0 release by the addition of an activation file. Second, to considerably increase its usage by using a recognized, official file format, allowing existing plug-in processes to be effective; and third, to move towards a universal nuclear data file in contrast to the current separate general and special-purpose files. The format chosen for the JEFF-3.0/A file uses reaction cross sections (MF-3), cross sections (MF-10), and multiplicities (MF-9). Having the data in ENDF-6 format allows the ENDF suite of utilitiesmore » and checker codes to be used alongside many other utility, visualizing, and processing codes. It is based on the EAF activation file used for many applications from fission to fusion, including dosimetry, inventories, depletion-transmutation, and geophysics. JEFF-3.0/A takes advantage of four generations of EAF files. Extensive benchmarking activities on these files provide feedback and validation with integral measurements. These, in parallel with a detailed graphical analysis based on EXFOR, have been applied stimulating new measurements, significantly increasing the quality of this activation file. The next step is to include the EAF uncertainty data for all channels into JEFF-3.0/A.« less
Performance of a parallel code for the Euler equations on hypercube computers
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.
1990-01-01
The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. Having this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying on benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
A proposed benchmark problem for cargo nuclear threat monitoring
NASA Astrophysics Data System (ADS)
Wesley Holmes, Thomas; Calderon, Adan; Peeples, Cody R.; Gardner, Robin P.
2011-10-01
There is currently a great deal of technical and political effort focused on reducing the risk of potential attacks on the United States involving radiological dispersal devices or nuclear weapons. This paper proposes a benchmark problem for gamma-ray and X-ray cargo monitoring with results calculated using MCNP5, v1.51. The primary goal is to provide a benchmark problem that will allow researchers in this area to evaluate Monte Carlo models for both speed and accuracy in both forward and inverse calculational codes and approaches for nuclear security applications. A previous benchmark problem was developed by one of the authors (RPG) for two similar oil well logging problems (Gardner and Verghese, 1991, [1]). One of those benchmarks has recently been used by at least two researchers in the nuclear threat area to evaluate the speed and accuracy of Monte Carlo codes combined with variance reduction techniques. This apparent need has prompted us to design this benchmark problem specifically for the nuclear threat researcher. This benchmark consists of conceptual design and preliminary calculational results using gamma-ray interactions on a system containing three thicknesses of three different shielding materials. A point source is placed inside the three materials lead, aluminum, and plywood. The first two materials are in right circular cylindrical form while the third is a cube. The entire system rests on a sufficiently thick lead base so as to reduce undesired scattering events. The configuration was arranged in such a manner that as gamma-ray moves from the source outward it first passes through the lead circular cylinder, then the aluminum circular cylinder, and finally the wooden cube before reaching the detector. A 2 in.×4 in.×16 in. box style NaI (Tl) detector was placed 1 m from the point source located in the center with the 4 in.×16 in. side facing the system. The two sources used in the benchmark are 137Cs and 235U.
Center for Extended Magnetohydrodynamic Modeling Cooperative Agreement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carl R. Sovinec
The Center for Extended Magnetohydrodynamic Modeling (CEMM) is developing computer simulation models for predicting the behavior of magnetically confined plasmas. Over the first phase of support from the Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC) initiative, the focus has been on macroscopic dynamics that alter the confinement properties of magnetic field configurations. The ultimate objective is to provide computational capabilities to predict plasma behavior—not unlike computational weather prediction—to optimize performance and to increase the reliability of magnetic confinement for fusion energy. Numerical modeling aids theoretical research by solving complicated mathematical models of plasma behavior including strong nonlinear effectsmore » and the influences of geometrical shaping of actual experiments. The numerical modeling itself remains an area of active research, due to challenges associated with simulating multiple temporal and spatial scales. The research summarized in this report spans computational and physical topics associated with state of the art simulation of magnetized plasmas. The tasks performed for this grant are categorized according to whether they are primarily computational, algorithmic, or application-oriented in nature. All involve the development and use of the Non-Ideal Magnetohydrodynamics with Rotation, Open Discussion (NIMROD) code, which is described at http://nimrodteam.org. With respect to computation, we have tested and refined methods for solving the large algebraic systems of equations that result from our numerical approximations of the physical model. Collaboration with the Terascale Optimal PDE Solvers (TOPS) SciDAC center led us to the SuperLU_DIST software library [http://crd.lbl.gov/~xiaoye/SuperLU/] for solving large sparse matrices using direct methods on parallel computers. Switching to this solver library boosted NIMROD’s performance by a factor of five in typical large nonlinear simulations, which has been publicized as a success story of SciDAC-fostered collaboration. Furthermore, the SuperLU software does not assume any mathematical symmetry, and its generality provides an important capability for extending the physical model beyond magnetohydrodynamics (MHD). With respect to algorithmic and model development, our most significant accomplishment is the development of a new method for solving plasma models that treat electrons as an independent plasma component. These ‘two-fluid’ models encompass MHD and add temporal and spatial scales that are beyond the response of the ion species. Implementation and testing of a previously published algorithm did not prove successful for NIMROD, and the new algorithm has since been devised, analyzed, and implemented. Two-fluid modeling, an important objective of the original NIMROD project, is now routine in 2D applications. Algorithmic components for 3D modeling are in place and tested; though, further computational work is still needed for efficiency. Other algorithmic work extends the ion-fluid stress tensor to include models for parallel and gyroviscous stresses. In addition, our hot-particle simulation capability received important refinements that permitted completion of a benchmark with the M3D code. A highlight of our applications work is the edge-localized mode (ELM) modeling, which was part of the first-ever computational Performance Target for the DOE Office of Fusion Energy Science, see http://www.science.doe.gov/ofes/performancetargets.shtml. Our efforts allowed MHD simulations to progress late into the nonlinear stage, where energy is conducted to the wall location. They also produced a two-fluid ELM simulation starting from experimental information and demonstrating critical drift effects that are characteristic of two-fluid physics. Another important application is the internal kink mode in a tokamak. Here, the primary purpose of the study has been to benchmark the two main code development lines of CEMM, NIMROD and M3D, on a relevant nonlinear problem. Results from the two codes show repeating nonlinear relaxation events driven by the kink mode over quantitatively comparable timescales. The work has launched a more comprehensive nonlinear benchmarking exercise, where realistic transport effects have an important role.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balkey, K.; Witt, F.J.; Bishop, B.A.
1995-06-01
Significant attention has been focused on the issue of reactor vessel pressurized thermal shock (PTS) for many years. Pressurized thermal shock transient events are characterized by a rapid cooldown at potentially high pressure levels that could lead to a reactor vessel integrity concern for some pressurized water reactors. As a result of regulatory and industry efforts in the early 1980`s, a probabilistic risk assessment methodology has been established to address this concern. Probabilistic fracture mechanics analyses are performed as part of this methodology to determine conditional probability of significant flaw extension for given pressurized thermal shock events. While recent industrymore » efforts are underway to benchmark probabilistic fracture mechanics computer codes that are currently used by the nuclear industry, Part I of this report describes the comparison of two independent computer codes used at the time of the development of the original U.S. Nuclear Regulatory Commission (NRC) pressurized thermal shock rule. The work that was originally performed in 1982 and 1983 to compare the U.S. NRC - VISA and Westinghouse (W) - PFM computer codes has been documented and is provided in Part I of this report. Part II of this report describes the results of more recent industry efforts to benchmark PFM computer codes used by the nuclear industry. This study was conducted as part of the USNRC-EPRI Coordinated Research Program for reviewing the technical basis for pressurized thermal shock (PTS) analyses of the reactor pressure vessel. The work focused on the probabilistic fracture mechanics (PFM) analysis codes and methods used to perform the PTS calculations. An in-depth review of the methodologies was performed to verify the accuracy and adequacy of the various different codes. The review was structured around a series of benchmark sample problems to provide a specific context for discussion and examination of the fracture mechanics methodology.« less
Parallel processes: using motivational interviewing as an implementation coaching strategy.
Hettema, Jennifer E; Ernst, Denise; Williams, Jessica Roberts; Miller, Kristin J
2014-07-01
In addition to its clinical efficacy as a communication style for strengthening motivation and commitment to change, motivational interviewing (MI) has been hypothesized to be a potential tool for facilitating evidence-based practice adoption decisions. This paper reports on the rationale and content of MI-based implementation coaching Webinars that, as part of a larger active dissemination strategy, were found to be more effective than passive dissemination strategies at promoting adoption decisions among behavioral health and health providers and administrators. The Motivational Interviewing Treatment Integrity scale (MITI 3.1.1) was used to rate coaching Webinars from 17 community behavioral health organizations and 17 community health centers. The MITI coding system was found to be applicable to the coaching Webinars, and raters achieved high levels of agreement on global and behavior count measurements of fidelity to MI. Results revealed that implementation coaches maintained fidelity to the MI model, exceeding competency benchmarks for almost all measures. Findings suggest that it is feasible to implement MI as a coaching tool.
Performance Studies on Distributed Virtual Screening
Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.
2014-01-01
Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures binding possibly to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting maximizing the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
Affinity-aware checkpoint restart
Saini, Ajay; Rezaei, Arash; Mueller, Frank; ...
2014-12-08
Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may be restarted on different cores, and in case of non-uniform memory architectures (NUMA), quite common today, memory pages associated with tasks on a NUMA node may be associated with a different NUMA node after restart. Here, this work contributes a novel design technique for C/R mechanisms to preserve task-to-core mapsmore » and NUMA node specific page affinities across restarts. Experimental results with BLCR, a C/R mechanism, enhanced with affinity awareness demonstrate significant performance benefits of 37%-73% for the NAS Parallel Benchmark codes and 6-12% for NAMD with negligible overheads instead of up to nearly four times longer an execution times without affinity-aware restarts on 16 cores.« less
Affinity-aware checkpoint restart
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saini, Ajay; Rezaei, Arash; Mueller, Frank
Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may be restarted on different cores, and in case of non-uniform memory architectures (NUMA), quite common today, memory pages associated with tasks on a NUMA node may be associated with a different NUMA node after restart. Here, this work contributes a novel design technique for C/R mechanisms to preserve task-to-core mapsmore » and NUMA node specific page affinities across restarts. Experimental results with BLCR, a C/R mechanism, enhanced with affinity awareness demonstrate significant performance benefits of 37%-73% for the NAS Parallel Benchmark codes and 6-12% for NAMD with negligible overheads instead of up to nearly four times longer an execution times without affinity-aware restarts on 16 cores.« less
Spectral Element Method for the Simulation of Unsteady Compressible Flows
NASA Technical Reports Server (NTRS)
Diosady, Laslo Tibor; Murman, Scott M.
2013-01-01
This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1{3]. The inviscid ux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6Ghz and 2GB per core memory. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6s.
Palkowski, Marek; Bielecki, Wlodzimierz
2017-06-02
RNA secondary structure prediction is a compute intensive task that lies at the core of several search algorithms in bioinformatics. Fortunately, the RNA folding approaches, such as the Nussinov base pair maximization, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. Polyhedral compilation techniques have proven to be a powerful tool for optimization of dense array codes. However, classical affine loop nest transformations used with these techniques do not optimize effectively codes of dynamic programming of RNA structure predictions. The purpose of this paper is to present a novel approach allowing for generation of a parallel tiled Nussinov RNA loop nest exposing significantly higher performance than that of known related code. This effect is achieved due to improving code locality and calculation parallelization. In order to improve code locality, we apply our previously published technique of automatic loop nest tiling to all the three loops of the Nussinov loop nest. This approach first forms original rectangular 3D tiles and then corrects them to establish their validity by means of applying the transitive closure of a dependence graph. To produce parallel code, we apply the loop skewing technique to a tiled Nussinov loop nest. The technique is implemented as a part of the publicly available polyhedral source-to-source TRACO compiler. Generated code was run on modern Intel multi-core processors and coprocessors. We present the speed-up factor of generated Nussinov RNA parallel code and demonstrate that it is considerably faster than related codes in which only the two outer loops of the Nussinov loop nest are tiled.
Importance of inlet boundary conditions for numerical simulation of combustor flows
NASA Technical Reports Server (NTRS)
Sturgess, G. J.; Syed, S. A.; Mcmanus, K. R.
1983-01-01
Fluid dynamic computer codes for the mathematical simulation of problems in gas turbine engine combustion systems are required as design and diagnostic tools. To eventually achieve a performance standard with these codes of more than qualitative accuracy it is desirable to use benchmark experiments for validation studies. Typical of the fluid dynamic computer codes being developed for combustor simulations is the TEACH (Teaching Elliptic Axisymmetric Characteristics Heuristically) solution procedure. It is difficult to find suitable experiments which satisfy the present definition of benchmark quality. For the majority of the available experiments there is a lack of information concerning the boundary conditions. A standard TEACH-type numerical technique is applied to a number of test-case experiments. It is found that numerical simulations of gas turbine combustor-relevant flows can be sensitive to the plane at which the calculations start and the spatial distributions of inlet quantities for swirling flows.
FY16 Status Report on NEAMS Neutronics Activities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, C. H.; Shemon, E. R.; Smith, M. A.
2016-09-30
The goal of the NEAMS neutronics effort is to develop a neutronics toolkit for use on sodium-cooled fast reactors (SFRs) which can be extended to other reactor types. The neutronics toolkit includes the high-fidelity deterministic neutron transport code PROTEUS and many supporting tools such as a cross section generation code MC 2-3, a cross section library generation code, alternative cross section generation tools, mesh generation and conversion utilities, and an automated regression test tool. The FY16 effort for NEAMS neutronics focused on supporting the release of the SHARP toolkit and existing and new users, continuing to develop PROTEUS functions necessarymore » for performance improvement as well as the SHARP release, verifying PROTEUS against available existing benchmark problems, and developing new benchmark problems as needed. The FY16 research effort was focused on further updates of PROTEUS-SN and PROTEUS-MOCEX and cross section generation capabilities as needed.« less
GW100: Benchmarking G0W0 for Molecular Systems.
van Setten, Michiel J; Caruso, Fabio; Sharifzadeh, Sahar; Ren, Xinguo; Scheffler, Matthias; Liu, Fang; Lischner, Johannes; Lin, Lin; Deslippe, Jack R; Louie, Steven G; Yang, Chao; Weigend, Florian; Neaton, Jeffrey B; Evers, Ferdinand; Rinke, Patrick
2015-12-08
We present the GW100 set. GW100 is a benchmark set of the ionization potentials and electron affinities of 100 molecules computed with the GW method using three independent GW codes and different GW methodologies. The quasi-particle energies of the highest-occupied molecular orbitals (HOMO) and lowest-unoccupied molecular orbitals (LUMO) are calculated for the GW100 set at the G0W0@PBE level using the software packages TURBOMOLE, FHI-aims, and BerkeleyGW. The use of these three codes allows for a quantitative comparison of the type of basis set (plane wave or local orbital) and handling of unoccupied states, the treatment of core and valence electrons (all electron or pseudopotentials), the treatment of the frequency dependence of the self-energy (full frequency or more approximate plasmon-pole models), and the algorithm for solving the quasi-particle equation. Primary results include reference values for future benchmarks, best practices for convergence within a particular approach, and average error bars for the most common approximations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with themore » associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.« less
Multiresolution molecular mechanics: Implementation and efficiency
NASA Astrophysics Data System (ADS)
Biyikli, Emre; To, Albert C.
2017-01-01
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janjusic, Tommy; Kartsaklis, Christos
Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an applicationmore » s memory footprint as a function of scalability, which we coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application s overall memory foot-print (application data- structures, libraries, etc.).« less
A Benchmark Study of Large Contract Supplier Monitoring Within DOD and Private Industry
1994-03-01
83 2. Long Term Supplier Relationships ...... .. 84 3. Global Sourcing . . . . . . . . . . . . .. 85 4. Refocusing on Customer Quality...monitoring and recognition, reduced number of suppliers, global sourcing, and long term contractor relationships . These initiatives were then compared to DCMC...on customer quality. 14. suBJE.C TERMS Benchmark Study of Large Contract Supplier Monitoring. 15. NUMBER OF PAGES108 16. PRICE CODE 17. SECURITY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Der Marck, S. C.
Three nuclear data libraries have been tested extensively using criticality safety benchmark calculations. The three libraries are the new release of the US library ENDF/B-VII.1 (2011), the new release of the Japanese library JENDL-4.0 (2011), and the OECD/NEA library JEFF-3.1 (2006). All calculations were performed with the continuous-energy Monte Carlo code MCNP (version 4C3, as well as version 6-beta1). Around 2000 benchmark cases from the International Handbook of Criticality Safety Benchmark Experiments (ICSBEP) were used. The results were analyzed per ICSBEP category, and per element. Overall, the three libraries show similar performance on most criticality safety benchmarks. The largest differencesmore » are probably caused by elements such as Be, C, Fe, Zr, W. (authors)« less
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also be used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results show from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Benchmarks of programming languages for special purposes in the space station
NASA Technical Reports Server (NTRS)
Knoebel, Arthur
1986-01-01
Although Ada is likely to be chosen as the principal programming language for the Space Station, certain needs, such as expert systems and robotics, may be better developed in special languages. The languages, LISP and Prolog, are studied and some benchmarks derived. The mathematical foundations for these languages are reviewed. Likely areas of the space station are sought out where automation and robotics might be applicable. Benchmarks are designed which are functional, mathematical, relational, and expert in nature. The coding will depend on the particular versions of the languages which become available for testing.
RELAP5-3D Results for Phase I (Exercise 2) of the OECD/NEA MHTGR-350 MW Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard Strydom
2012-06-01
The coupling of the PHISICS code suite to the thermal hydraulics system code RELAP5-3D has recently been initiated at the Idaho National Laboratory (INL) to provide a fully coupled prismatic Very High Temperature Reactor (VHTR) system modeling capability as part of the NGNP methods development program. The PHISICS code consists of three modules: INSTANT (performing 3D nodal transport core calculations), MRTAU (depletion and decay heat generation) and a perturbation/mixer module. As part of the verification and validation activities, steady state results have been obtained for Exercise 2 of Phase I of the newly-defined OECD/NEA MHTGR-350 MW Benchmark. This exercise requiresmore » participants to calculate a steady-state solution for an End of Equilibrium Cycle 350 MW Modular High Temperature Reactor (MHTGR), using the provided geometry, material, and coolant bypass flow description. The paper provides an overview of the MHTGR Benchmark and presents typical steady state results (e.g. solid and gas temperatures, thermal conductivities) for Phase I Exercise 2. Preliminary results are also provided for the early test phase of Exercise 3 using a two-group cross-section library and the Relap5-3D model developed for Exercise 2.« less
RELAP5-3D results for phase I (Exercise 2) of the OECD/NEA MHTGR-350 MW benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strydom, G.; Epiney, A. S.
2012-07-01
The coupling of the PHISICS code suite to the thermal hydraulics system code RELAP5-3D has recently been initiated at the Idaho National Laboratory (INL) to provide a fully coupled prismatic Very High Temperature Reactor (VHTR) system modeling capability as part of the NGNP methods development program. The PHISICS code consists of three modules: INSTANT (performing 3D nodal transport core calculations), MRTAU (depletion and decay heat generation) and a perturbation/mixer module. As part of the verification and validation activities, steady state results have been obtained for Exercise 2 of Phase I of the newly-defined OECD/NEA MHTGR-350 MW Benchmark. This exercise requiresmore » participants to calculate a steady-state solution for an End of Equilibrium Cycle 350 MW Modular High Temperature Reactor (MHTGR), using the provided geometry, material, and coolant bypass flow description. The paper provides an overview of the MHTGR Benchmark and presents typical steady state results (e.g. solid and gas temperatures, thermal conductivities) for Phase I Exercise 2. Preliminary results are also provided for the early test phase of Exercise 3 using a two-group cross-section library and the Relap5-3D model developed for Exercise 2. (authors)« less
New Bandwidth Efficient Parallel Concatenated Coding Schemes
NASA Technical Reports Server (NTRS)
Denedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.
1996-01-01
We propose a new solution to parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for throughputs 2 bits/sec/Hz with 8PSK and 16QAM signal constellations.
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
National Combustion Code: Parallel Performance
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2001-01-01
This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
PFLOTRAN: Reactive Flow & Transport Code for Use on Laptops to Leadership-Class Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammond, Glenn E.; Lichtner, Peter C.; Lu, Chuan
PFLOTRAN, a next-generation reactive flow and transport code for modeling subsurface processes, has been designed from the ground up to run efficiently on machines ranging from leadership-class supercomputers to laptops. Based on an object-oriented design, the code is easily extensible to incorporate additional processes. It can interface seamlessly with Fortran 9X, C and C++ codes. Domain decomposition parallelism is employed, with the PETSc parallel framework used to manage parallel solvers, data structures and communication. Features of the code include a modular input file, implementation of high-performance I/O using parallel HDF5, ability to perform multiple realization simulations with multiple processors permore » realization in a seamless manner, and multiple modes for multiphase flow and multicomponent geochemical transport. Chemical reactions currently implemented in the code include homogeneous aqueous complexing reactions and heterogeneous mineral precipitation/dissolution, ion exchange, surface complexation and a multirate kinetic sorption model. PFLOTRAN has demonstrated petascale performance using 2{sup 17} processor cores with over 2 billion degrees of freedom. Accomplishments achieved to date include applications to the Hanford 300 Area and modeling CO{sub 2} sequestration in deep geologic formations.« less
Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshii, K.; Iskra, K.; Naik, H.
2011-05-01
We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solelymore » for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.« less
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
Implementation of ADI: Schemes on MIMD parallel computers
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1993-01-01
In order to simulate the effects of the impingement of hot exhaust jets of High Performance Aircraft on landing surfaces a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPU's) by having many off-the-shelf CPU's work independently on their own data, and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI-algorithm for the three dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPU's in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 105-106) in reasonable CPU times. The performances of our implementation of the pa! rallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
FY2012 summary of tasks completed on PROTEUS-thermal work.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, C.H.; Smith, M.A.
2012-06-06
PROTEUS is a suite of the neutronics codes, both old and new, that can be used within the SHARP codes being developed under the NEAMS program. Discussion here is focused on updates and verification and validation activities of the SHARP neutronics code, DeCART, for application to thermal reactor analysis. As part of the development of SHARP tools, the different versions of the DeCART code created for PWR, BWR, and VHTR analysis were integrated. Verification and validation tests for the integrated version were started, and the generation of cross section libraries based on the subgroup method was revisited for the targetedmore » reactor types. The DeCART code has been reorganized in preparation for an efficient integration of the different versions for PWR, BWR, and VHTR analysis. In DeCART, the old-fashioned common blocks and header files have been replaced by advanced memory structures. However, the changing of variable names was minimized in order to limit problems with the code integration. Since the remaining stability problems of DeCART were mostly caused by the CMFD methodology and modules, significant work was performed to determine whether they could be replaced by more stable methods and routines. The cross section library is a key element to obtain accurate solutions. Thus, the procedure for generating cross section libraries was revisited to provide libraries tailored for the targeted reactor types. To improve accuracy in the cross section library, an attempt was made to replace the CENTRM code by the MCNP Monte Carlo code as a tool obtaining reference resonance integrals. The use of the Monte Carlo code allows us to minimize problems or approximations that CENTRM introduces since the accuracy of the subgroup data is limited by that of the reference solutions. The use of MCNP requires an additional set of libraries without resonance cross sections so that reference calculations can be performed for a unit cell in which only one isotope of interest includes resonance cross sections, among the isotopes in the composition. The OECD MHTGR-350 benchmark core was simulated using DeCART as initial focus of the verification/validation efforts. Among the benchmark problems, Exercise 1 of Phase 1 is a steady-state benchmark case for the neutronics calculation for which block-wise cross sections were provided in 26 energy groups. This type of problem was designed for a homogenized geometry solver like DIF3D rather than the high-fidelity code DeCART. Instead of the homogenized block cross sections given in the benchmark, the VHTR-specific 238-group ENDF/B-VII.0 library of DeCART was directly used for preliminary calculations. Initial results showed that the multiplication factors of a fuel pin and a fuel block with or without a control rod hole were off by 6, -362, and -183 pcm Dk from comparable MCNP solutions, respectively. The 2-D and 3-D one-third core calculations were also conducted for the all-rods-out (ARO) and all-rods-in (ARI) configurations, producing reasonable results. Figure 1 illustrates the intermediate (1.5 eV - 17 keV) and thermal (below 1.5 eV) group flux distributions. As seen from VHTR cores with annular fuels, the intermediate group fluxes are relatively high in the fuel region, but the thermal group fluxes are higher in the inner and outer graphite reflector regions than in the fuel region. To support the current project, a new three-year I-NERI collaboration involving ANL and KAERI was started in November 2011, focused on performing in-depth verification and validation of high-fidelity multi-physics simulation codes for LWR and VHTR. The work scope includes generating improved cross section libraries for the targeted reactor types, developing benchmark models for verification and validation of the neutronics code with or without thermo-fluid feedback, and performing detailed comparisons of predicted reactor parameters against both Monte Carlo solutions and experimental measurements. The following list summarizes the work conducted so far for PROTEUS-Thermal Tasks: Unification of different versions of DeCART was initiated, and at the same time code modernization was conducted to make code unification efficient; (2) Regeneration of cross section libraries was attempted for the targeted reactor types, and the procedure for generating cross section libraries was updated by replacing CENTRM with MCNP for reference resonance integrals; (3) The MHTGR-350 benchmark core was simulated using DeCART with VHTR-specific 238-group ENDF/B-VII.0 library, and MCNP calculations were performed for comparison; and (4) Benchmark problems for PWR and BWR analysis were prepared for the DeCART verification/validation effort. In the coming months, the work listed above will be completed. Cross section libraries will be generated with optimized group structures for specific reactor types.« less
Benchmarking kinetic calculations of resistive wall mode stability
NASA Astrophysics Data System (ADS)
Berkery, J. W.; Liu, Y. Q.; Wang, Z. R.; Sabbagh, S. A.; Logan, N. C.; Park, J.-K.; Manickam, J.; Betti, R.
2014-05-01
Validating the calculations of kinetic resistive wall mode (RWM) stability is important for confidently predicting RWM stable operating regions in ITER and other high performance tokamaks for disruption avoidance. Benchmarking the calculations of the Magnetohydrodynamic Resistive Spectrum—Kinetic (MARS-K) [Y. Liu et al., Phys. Plasmas 15, 112503 (2008)], Modification to Ideal Stability by Kinetic effects (MISK) [B. Hu et al., Phys. Plasmas 12, 057301 (2005)], and Perturbed Equilibrium Nonambipolar Transport PENT) [N. Logan et al., Phys. Plasmas 20, 122507 (2013)] codes for two Solov'ev analytical equilibria and a projected ITER equilibrium has demonstrated good agreement between the codes. The important particle frequencies, the frequency resonance energy integral in which they are used, the marginally stable eigenfunctions, perturbed Lagrangians, and fluid growth rates are all generally consistent between the codes. The most important kinetic effect at low rotation is the resonance between the mode rotation and the trapped thermal particle's precession drift, and MARS-K, MISK, and PENT show good agreement in this term. The different ways the rational surface contribution was treated historically in the codes is identified as a source of disagreement in the bounce and transit resonance terms at higher plasma rotation. Calculations from all of the codes support the present understanding that RWM stability can be increased by kinetic effects at low rotation through precession drift resonance and at high rotation by bounce and transit resonances, while intermediate rotation can remain susceptible to instability. The applicability of benchmarked kinetic stability calculations to experimental results is demonstrated by the prediction of MISK calculations of near marginal growth rates for experimental marginal stability points from the National Spherical Torus Experiment (NSTX) [M. Ono et al., Nucl. Fusion 40, 557 (2000)].
NASA Technical Reports Server (NTRS)
Litt, Jonathan S.; Soditus, Sherry; Hendricks, Robert C.; Zaretsky, Erwin V.
2002-01-01
Over the past two decades there has been considerable effort by NASA Glenn and others to develop probabilistic codes to predict with reasonable engineering certainty the life and reliability of critical components in rotating machinery and, more specifically, in the rotating sections of airbreathing and rocket engines. These codes have, to a very limited extent, been verified with relatively small bench rig type specimens under uniaxial loading. Because of the small and very narrow database the acceptance of these codes within the aerospace community has been limited. An alternate approach to generating statistically significant data under complex loading and environments simulating aircraft and rocket engine conditions is to obtain, catalog and statistically analyze actual field data. End users of the engines, such as commercial airlines and the military, record and store operational and maintenance information. This presentation describes a cooperative program between the NASA GRC, United Airlines, USAF Wright Laboratory, U.S. Army Research Laboratory and Australian Aeronautical & Maritime Research Laboratory to obtain and analyze these airline data for selected components such as blades, disks and combustors. These airline data will be used to benchmark and compare existing life prediction codes.
Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F.; Harger, Matthew; Torabifard, Hedieh; Cisneros, G. Andrés; Schnieders, Michael J.; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y.; Ponder, Jay W.
2017-01-01
We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed. PMID:29732110
Zou, Ling; Zhao, Haihua; Zhang, Hongbin
2016-03-09
This work represents a first-of-its-kind successful application to employ advanced numerical methods in solving realistic two-phase flow problems with two-fluid six-equation two-phase flow model. These advanced numerical methods include high-resolution spatial discretization scheme with staggered grids (high-order) fully implicit time integration schemes, and Jacobian-free Newton–Krylov (JFNK) method as the nonlinear solver. The computer code developed in this work has been extensively validated with existing experimental flow boiling data in vertical pipes and rod bundles, which cover wide ranges of experimental conditions, such as pressure, inlet mass flux, wall heat flux and exit void fraction. Additional code-to-code benchmark with the RELAP5-3Dmore » code further verifies the correct code implementation. The combined methods employed in this work exhibit strong robustness in solving two-phase flow problems even when phase appearance (boiling) and realistic discrete flow regimes are considered. Transitional flow regimes used in existing system analysis codes, normally introduced to overcome numerical difficulty, were completely removed in this work. As a result, this in turn provides the possibility to utilize more sophisticated flow regime maps in the future to further improve simulation accuracy.« less
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nanda, Kaushik D.; Krylov, Anna I.
The equation-of-motion coupled-cluster (EOM-CC) methods provide a robust description of electronically excited states and their properties. Here, we present a formalism for two-photon absorption (2PA) cross sections for the equation-of-motion for excitation energies CC with single and double substitutions (EOM-CC for electronically excited states with single and double substitutions) wave functions. Rather than the response theory formulation, we employ the expectation-value approach which is commonly used within EOM-CC, configuration interaction, and algebraic diagrammatic construction frameworks. In addition to canonical implementation, we also exploit resolution-of-the-identity (RI) and Cholesky decomposition (CD) for the electron-repulsion integrals to reduce memory requirements and to increasemore » parallel efficiency. The new methods are benchmarked against the CCSD and CC3 response theories for several small molecules. We found that the expectation-value 2PA cross sections are within 5% from the quadratic response CCSD values. The RI and CD approximations lead to small errors relative to the canonical implementation (less than 4%) while affording computational savings. RI/CD successfully address the well-known issue of large basis set requirements for 2PA cross sections calculations. The capabilities of the new code are illustrated by calculations of the 2PA cross sections for model chromophores of the photoactive yellow and green fluorescent proteins.« less
Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F
2002-02-01
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
Thermo-hydro-mechanical-chemical processes in fractured-porous media: Benchmarks and examples
NASA Astrophysics Data System (ADS)
Kolditz, O.; Shao, H.; Görke, U.; Kalbacher, T.; Bauer, S.; McDermott, C. I.; Wang, W.
2012-12-01
The book comprises an assembly of benchmarks and examples for porous media mechanics collected over the last twenty years. Analysis of thermo-hydro-mechanical-chemical (THMC) processes is essential to many applications in environmental engineering, such as geological waste deposition, geothermal energy utilisation, carbon capture and storage, water resources management, hydrology, even climate change. In order to assess the feasibility as well as the safety of geotechnical applications, process-based modelling is the only tool to put numbers, i.e. to quantify future scenarios. This charges a huge responsibility concerning the reliability of computational tools. Benchmarking is an appropriate methodology to verify the quality of modelling tools based on best practices. Moreover, benchmarking and code comparison foster community efforts. The benchmark book is part of the OpenGeoSys initiative - an open source project to share knowledge and experience in environmental analysis and scientific computation.
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
NASA Technical Reports Server (NTRS)
Mashnik, S. G.; Gudima, K. K.; Sierk, A. J.; Moskalenko, I. V.
2002-01-01
Space radiation shield applications and studies of cosmic ray propagation in the Galaxy require reliable cross sections to calculate spectra of secondary particles and yields of the isotopes produced in nuclear reactions induced both by particles and nuclei at energies from threshold to hundreds of GeV per nucleon. Since the data often exist in a very limited energy range or sometimes not at all, the only way to obtain an estimate of the production cross sections is to use theoretical models and codes. Recently, we have developed improved versions of the Cascade-Exciton Model (CEM) of nuclear reactions: the codes CEM97 and CEM2k for description of particle-nucleus reactions at energies up to about 5 GeV. In addition, we have developed a LANL version of the Quark-Gluon String Model (LAQGSM) to describe reactions induced both by particles and nuclei at energies up to hundreds of GeVhucleon. We have tested and benchmarked the CEM and LAQGSM codes against a large variety of experimental data and have compared their results with predictions by other currently available models and codes. Our benchmarks show that CEM and LAQGSM codes have predictive powers no worse than other currently used codes and describe many reactions better than other codes; therefore both our codes can be used as reliable event-generators for space radiation shield and cosmic ray propagation applications. The CEM2k code is being incorporated into the transport code MCNPX (and several other transport codes), and we plan to incorporate LAQGSM into MCNPX in the near future. Here, we present the current status of the CEM2k and LAQGSM codes, and show results and applications to studies of cosmic ray propagation in the Galaxy.
Research on computer systems benchmarking
NASA Technical Reports Server (NTRS)
Smith, Alan Jay (Principal Investigator)
1996-01-01
This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance. The performance impact of optimization in the context of our methodology for CPU performance characterization was based on the abstract machine model. Benchmark programs are analyzed in another paper. A machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the afore-mentioned accomplishments are more specifically summarized in this report, as well as those smaller in magnitude supported by this grant.
Nuclide Depletion Capabilities in the Shift Monte Carlo Code
Davidson, Gregory G.; Pandya, Tara M.; Johnson, Seth R.; ...
2017-12-21
A new depletion capability has been developed in the Exnihilo radiation transport code suite. This capability enables massively parallel domain-decomposed coupling between the Shift continuous-energy Monte Carlo solver and the nuclide depletion solvers in ORIGEN to perform high-performance Monte Carlo depletion calculations. This paper describes this new depletion capability and discusses its various features, including a multi-level parallel decomposition, high-order transport-depletion coupling, and energy-integrated power renormalization. Several test problems are presented to validate the new capability against other Monte Carlo depletion codes, and the parallel performance of the new capability is analyzed.
User's Guide for ENSAERO_FE Parallel Finite Element Solver
NASA Technical Reports Server (NTRS)
Eldred, Lloyd B.; Guruswamy, Guru P.
1999-01-01
A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
NASA Astrophysics Data System (ADS)
Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
NASA Astrophysics Data System (ADS)
Velioglu Sogut, Deniz; Yalciner, Ahmet Cevdet
2018-06-01
Field observations provide valuable data regarding nearshore tsunami impact, yet only in inundation areas where tsunami waves have already flooded. Therefore, tsunami modeling is essential to understand tsunami behavior and prepare for tsunami inundation. It is necessary that all numerical models used in tsunami emergency planning be subject to benchmark tests for validation and verification. This study focuses on two numerical codes, NAMI DANCE and FLOW-3D®, for validation and performance comparison. NAMI DANCE is an in-house tsunami numerical model developed by the Ocean Engineering Research Center of Middle East Technical University, Turkey and Laboratory of Special Research Bureau for Automation of Marine Research, Russia. FLOW-3D® is a general purpose computational fluid dynamics software, which was developed by scientists who pioneered in the design of the Volume-of-Fluid technique. The codes are validated and their performances are compared via analytical, experimental and field benchmark problems, which are documented in the ``Proceedings and Results of the 2011 National Tsunami Hazard Mitigation Program (NTHMP) Model Benchmarking Workshop'' and the ``Proceedings and Results of the NTHMP 2015 Tsunami Current Modeling Workshop". The variations between the numerical solutions of these two models are evaluated through statistical error analysis.
Practice Benchmarking in the Age of Targeted Auditing
Langdale, Ryan P.; Holland, Ben F.
2012-01-01
The frequency and sophistication of health care reimbursement auditing has progressed rapidly in recent years, leaving many oncologists wondering whether their private practices would survive a full-scale Office of the Inspector General (OIG) investigation. The Medicare Part B claims database provides a rich source of information for physicians seeking to understand how their billing practices measure up to their peers, both locally and nationally. This database was dissected by a team of cancer specialists to uncover important benchmarks related to targeted auditing. All critical Medicare charges, payments, denials, and service ratios in this article were derived from the full 2010 Medicare Part B claims database. Relevant claims were limited by using Medicare provider specialty codes 83 (hematology/oncology) and 90 (medical oncology), with an emphasis on claims filed from the physician office place of service (11). All charges, denials, and payments were summarized at the Current Procedural Terminology code level to drive practice benchmarking standards. A careful analysis of this data set, combined with the published audit priorities of the OIG, produced germane benchmarks from which medical oncologists can monitor, measure and improve on common areas of billing fraud, waste or abuse in their practices. Part II of this series and analysis will focus on information pertinent to radiation oncologists. PMID:23598847
Practice benchmarking in the age of targeted auditing.
Langdale, Ryan P; Holland, Ben F
2012-11-01
The frequency and sophistication of health care reimbursement auditing has progressed rapidly in recent years, leaving many oncologists wondering whether their private practices would survive a full-scale Office of the Inspector General (OIG) investigation. The Medicare Part B claims database provides a rich source of information for physicians seeking to understand how their billing practices measure up to their peers, both locally and nationally. This database was dissected by a team of cancer specialists to uncover important benchmarks related to targeted auditing. All critical Medicare charges, payments, denials, and service ratios in this article were derived from the full 2010 Medicare Part B claims database. Relevant claims were limited by using Medicare provider specialty codes 83 (hematology/oncology) and 90 (medical oncology), with an emphasis on claims filed from the physician office place of service (11). All charges, denials, and payments were summarized at the Current Procedural Terminology code level to drive practice benchmarking standards. A careful analysis of this data set, combined with the published audit priorities of the OIG, produced germane benchmarks from which medical oncologists can monitor, measure and improve on common areas of billing fraud, waste or abuse in their practices. Part II of this series and analysis will focus on information pertinent to radiation oncologists.
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms thatmore » iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.« less
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
2017-10-30
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms thatmore » iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.« less
CUBE: Information-optimized parallel cosmological N-body simulation code
NASA Astrophysics Data System (ADS)
Yu, Hao-Ran; Pen, Ue-Li; Wang, Xin
2018-05-01
CUBE, written in Coarray Fortran, is a particle-mesh based parallel cosmological N-body simulation code. The memory usage of CUBE can approach as low as 6 bytes per particle. Particle pairwise (PP) force, cosmological neutrinos, spherical overdensity (SO) halofinder are included.
The Particle Accelerator Simulation Code PyORBIT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M
2015-01-01
The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower level code implemented in the C++ language. The parallel capabilities are based on MPI communications. The PyORBIT ismore » an open source code accessible to the public through the Google Open Source Projects Hosting service.« less
Parallel Event Analysis Under Unix
NASA Astrophysics Data System (ADS)
Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.
The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective
NASA Technical Reports Server (NTRS)
Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)
2002-01-01
Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
NASA Astrophysics Data System (ADS)
Grenier, Christophe; Roux, Nicolas; Anbergen, Hauke; Collier, Nathaniel; Costard, Francois; Ferrry, Michel; Frampton, Andrew; Frederick, Jennifer; Holmen, Johan; Jost, Anne; Kokh, Samuel; Kurylyk, Barret; McKenzie, Jeffrey; Molson, John; Orgogozo, Laurent; Rivière, Agnès; Rühaak, Wolfram; Selroos, Jan-Olof; Therrien, René; Vidstrand, Patrik
2015-04-01
The impacts of climate change in boreal regions has received considerable attention recently due to the warming trends that have been experienced in recent decades and are expected to intensify in the future. Large portions of these regions, corresponding to permafrost areas, are covered by water bodies (lakes, rivers) that interact with the surrounding permafrost. For example, the thermal state of the surrounding soil influences the energy and water budget of the surface water bodies. Also, these water bodies generate taliks (unfrozen zones below) that disturb the thermal regimes of permafrost and may play a key role in the context of climate change. Recent field studies and modeling exercises indicate that a fully coupled 2D or 3D Thermo-Hydraulic (TH) approach is required to understand and model the past and future evolution of landscapes, rivers, lakes and associated groundwater systems in a changing climate. However, there is presently a paucity of 3D numerical studies of permafrost thaw and associated hydrological changes, and the lack of study can be partly attributed to the difficulty in verifying multi-dimensional results produced by numerical models. Numerical approaches can only be validated against analytical solutions for a purely thermic 1D equation with phase change (e.g. Neumann, Lunardini). When it comes to the coupled TH system (coupling two highly non-linear equations), the only possible approach is to compare the results from different codes to provided test cases and/or to have controlled experiments for validation. Such inter-code comparisons can propel discussions to try to improve code performances. A benchmark exercise was initialized in 2014 with a kick-off meeting in Paris in November. Participants from USA, Canada, Germany, Sweden and France convened, representing altogether 13 simulation codes. The benchmark exercises consist of several test cases inspired by existing literature (e.g. McKenzie et al., 2007) as well as new ones. They range from simpler, purely thermal cases (benchmark T1) to more complex, coupled 2D TH cases (benchmarks TH1, TH2, and TH3). Some experimental cases conducted in cold room complement the validation approach. A web site hosted by LSCE (Laboratoire des Sciences du Climat et de l'Environnement) is an interaction platform for the participants and hosts the test cases database at the following address: https://wiki.lsce.ipsl.fr/interfrost. The results of the first stage of the benchmark exercise will be presented. We will mainly focus on the inter-comparison of participant results for the coupled cases (TH1, TH2 & TH3). Further perspectives of the exercise will also be presented. Extensions to more complex physical conditions (e.g. unsaturated conditions and geometrical deformations) are contemplated. In addition, 1D vertical cases of interest to the Climate Modeling community will be proposed. Keywords: Permafrost; Numerical modeling; River-soil interaction; Arctic systems; soil freeze-thaw
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
NASA Technical Reports Server (NTRS)
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
Astronomy education and the Astrophysics Source Code Library
NASA Astrophysics Data System (ADS)
Allen, Alice; Nemiroff, Robert J.
2016-01-01
The Astrophysics Source Code Library (ASCL) is an online registry of source codes used in refereed astrophysics research. It currently lists nearly 1,200 codes and covers all aspects of computational astrophysics. How can this resource be of use to educators and to the graduate students they mentor? The ASCL serves as a discovery tool for codes that can be used for one's own research. Graduate students can also investigate existing codes to see how common astronomical problems are approached numerically in practice, and use these codes as benchmarks for their own solutions to these problems. Further, they can deepen their knowledge of software practices and techniques through examination of others' codes.
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less
Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland
2003-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automaticaly parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
NASA Astrophysics Data System (ADS)
Lin, Yi-Chun; Huang, Tseng-Te; Liu, Yuan-Hao; Chen, Wei-Lin; Chen, Yen-Fu; Wu, Shu-Wei; Nievaart, Sander; Jiang, Shiang-Huei
2015-06-01
The paired ionization chambers (ICs) technique is commonly employed to determine neutron and photon doses in radiology or radiotherapy neutron beams, where neutron dose shows very strong dependence on the accuracy of accompanying high energy photon dose. During the dose derivation, it is an important issue to evaluate the photon and electron response functions of two commercially available ionization chambers, denoted as TE(TE) and Mg(Ar), used in our reactor based epithermal neutron beam. Nowadays, most perturbation corrections for accurate dose determination and many treatment planning systems are based on the Monte Carlo technique. We used general purposed Monte Carlo codes, MCNP5, EGSnrc, FLUKA or GEANT4 for benchmark verifications among them and carefully measured values for a precise estimation of chamber current from absorbed dose rate of cavity gas. Also, energy dependent response functions of two chambers were calculated in a parallel beam with mono-energies from 20 keV to 20 MeV photons and electrons by using the optimal simple spherical and detailed IC models. The measurements were performed in the well-defined (a) four primary M-80, M-100, M120 and M150 X-ray calibration fields, (b) primary 60Co calibration beam, (c) 6 MV and 10 MV photon, (d) 6 MeV and 18 MeV electron LINACs in hospital and (e) BNCT clinical trials neutron beam. For the TE(TE) chamber, all codes were almost identical over the whole photon energy range. In the Mg(Ar) chamber, MCNP5 showed lower response than other codes for photon energy region below 0.1 MeV and presented similar response above 0.2 MeV (agreed within 5% in the simple spherical model). With the increase of electron energy, the response difference between MCNP5 and other codes became larger in both chambers. Compared with the measured currents, MCNP5 had the difference from the measurement data within 5% for the 60Co, 6 MV, 10 MV, 6 MeV and 18 MeV LINACs beams. But for the Mg(Ar) chamber, the derivations reached 7.8-16.5% below 120 kVp X-ray beams. In this study, we were especially interested in BNCT doses where low energy photon contribution is less to ignore, MCNP model is recognized as the most suitable to simulate wide photon-electron and neutron energy distributed responses of the paired ICs. Also, MCNP provides the best prediction of BNCT source adjustment by the detector's neutron and photon responses.
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss top five current RISC microprocessors: The IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer, the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and also in the context of implementing a real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NP13 will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr
2014-12-15
In this paper we present a time-parallel algorithm for the 3D neutrons calculation of a transient model in a nuclear reactor core. The neutrons calculation consists in numerically solving the time dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with finite elements method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity ofmore » the control rods. The parallelism across the time is achieved by an adequate use of the parareal in time algorithm to the handled problem. This parallel method is a predictor corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with large time step and fixed position control rods model, while the fine propagator is assumed to be a high order numerical approximation of the full model. The parallel implementation of our method provides a good scalability of the algorithm. Numerical results show the efficiency of the parareal method on large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.« less
Applications Performance on NAS Intel Paragon XP/S - 15#
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)
1994-01-01
The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S- 15 in February, 1993. The i860 XP microprocessor with an integrated floating point unit and operating in dual -instruction mode gives peak performance of 75 million floating point operations (NIFLOPS) per second for 64 bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we will report on early experience using the Paragon XP/S- 15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3 both assembly-coded and Fortran coded on NAS Paragon XP/S- 15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT Finally, we measured the performance of NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as CM-5, CRAY T3D, IBM SP I, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default an operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 NIB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help applications performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS both assembly coded and Fortran coded. We found that the measured performance of assembly-coded BLAS is much less than what memory bandwidth limitation would predict. The influence of data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processors layout.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramos-Mendez, J; Faddegon, B; Perl, J
2015-06-15
Purpose: To develop and verify an extension to TOPAS for calculation of dose response models (TCP/NTCP). TOPAS wraps and extends Geant4. Methods: The TOPAS DICOM interface was extended to include structure contours, for subsequent calculation of DVH’s and TCP/NTCP. The following dose response models were implemented: Lyman-Kutcher-Burman (LKB), critical element (CE), population based critical volume (CV), parallel-serials, a sigmoid-based model of Niemierko for NTCP and TCP, and a Poisson-based model for TCP. For verification, results for the parallel-serial and Poisson models, with 6 MV x-ray dose distributions calculated with TOPAS and Pinnacle v9.2, were compared to data from the benchmarkmore » configuration of the AAPM Task Group 166 (TG166). We provide a benchmark configuration suitable for proton therapy along with results for the implementation of the Niemierko, CV and CE models. Results: The maximum difference in DVH calculated with Pinnacle and TOPAS was 2%. Differences between TG166 data and Monte Carlo calculations of up to 4.2%±6.1% were found for the parallel-serial model and up to 1.0%±0.7% for the Poisson model (including the uncertainty due to lack of knowledge of the point spacing in TG166). For CE, CV and Niemierko models, the discrepancies between the Pinnacle and TOPAS results are 74.5%, 34.8% and 52.1% when using 29.7 cGy point spacing, the differences being highly sensitive to dose spacing. On the other hand, with our proposed benchmark configuration, the largest differences were 12.05%±0.38%, 3.74%±1.6%, 1.57%±4.9% and 1.97%±4.6% for the CE, CV, Niemierko and LKB models, respectively. Conclusion: Several dose response models were successfully implemented with the extension module. Reference data was calculated for future benchmarking. Dose response calculated for the different models varied much more widely for the TG166 benchmark than for the proposed benchmark, which had much lower sensitivity to the choice of DVH dose points. This work was supported by National Cancer Institute Grant R01CA140735.« less
Highly fault-tolerant parallel computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spielman, D.A.
We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations inmore » which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model using w log{sup O(l)} w processors and time t log{sup O(l)} w. The failure probability of the computation will be at most t {center_dot} exp(-w{sup 1/4}). The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in O(n log{sup O(1)} n) sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.« less
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu
2006-04-17
TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
Neural representation of objects in space: a dual coding account.
Humphreys, G W
1998-01-01
I present evidence on the nature of object coding in the brain and discuss the implications of this coding for models of visual selective attention. Neuropsychological studies of task-based constraints on: (i) visual neglect; and (ii) reading and counting, reveal the existence of parallel forms of spatial representation for objects: within-object representations, where elements are coded as parts of objects, and between-object representations, where elements are coded as independent objects. Aside from these spatial codes for objects, however, the coding of visual space is limited. We are extremely poor at remembering small spatial displacements across eye movements, indicating (at best) impoverished coding of spatial position per se. Also, effects of element separation on spatial extinction can be eliminated by filling the space with an occluding object, indicating that spatial effects on visual selection are moderated by object coding. Overall, there are separate limits on visual processing reflecting: (i) the competition to code parts within objects; (ii) the small number of independent objects that can be coded in parallel; and (iii) task-based selection of whether within- or between-object codes determine behaviour. Between-object coding may be linked to the dorsal visual system while parallel coding of parts within objects takes place in the ventral system, although there may additionally be some dorsal involvement either when attention must be shifted within objects or when explicit spatial coding of parts is necessary for object identification. PMID:9770227
featsel: A framework for benchmarking of feature selection algorithms and cost functions
NASA Astrophysics Data System (ADS)
Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior
In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.
BACT Simulation User Guide (Version 7.0)
NASA Technical Reports Server (NTRS)
Waszak, Martin R.
1997-01-01
This report documents the structure and operation of a simulation model of the Benchmark Active Control Technology (BACT) Wind-Tunnel Model. The BACT system was designed, built, and tested at NASA Langley Research Center as part of the Benchmark Models Program and was developed to perform wind-tunnel experiments to obtain benchmark quality data to validate computational fluid dynamics and computational aeroelasticity codes, to verify the accuracy of current aeroservoelasticity design and analysis tools, and to provide an active controls testbed for evaluating new and innovative control algorithms for flutter suppression and gust load alleviation. The BACT system has been especially valuable as a control system testbed.
Rapid Prediction of Unsteady Three-Dimensional Viscous Flows in Turbopump Geometries
NASA Technical Reports Server (NTRS)
Dorney, Daniel J.
1998-01-01
A program is underway to improve the efficiency of a three-dimensional Navier-Stokes code and generalize it for nozzle and turbopump geometries. Code modifications have included the implementation of parallel processing software, incorporation of new physical models and generalization of the multiblock capability. The final report contains details of code modifications, numerical results for several nozzle and turbopump geometries, and the implementation of the parallelization software.
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code
NASA Technical Reports Server (NTRS)
Yamakov, Vesselin I.
2016-01-01
This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dr. Dale M. Snider
2011-02-28
This report gives the result from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendlymore » environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the ground work for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 gives immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully laying the frame work for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate) Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics for the Phase-1 the technical partner. See Section Technical Objective and Approach) Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections) Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program) Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections) Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2) With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.« less
FX-87 performance measurements: data-flow implementation. Technical report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammel, R.T.; Gifford, D.K.
1988-11-01
This report documents a series of experiments performed to explore the thesis that the FX-87 effect system permits a compiler to schedule imperative programs (i.e., programs that may contain side-effects) for execution on a parallel computer. The authors analyze how much the FX-87 static effect system can improve the execution times of five benchmark programs on a parallel graph interpreter. Three of their benchmark programs do not use side-effects (factorial, fibonacci, and polynomial division) and thus did not have any effect-induced constraints. Their FX-87 performance was comparable to their performance in a purely functional language. Two of the benchmark programsmore » use side effects (DNA sequence matching and Scheme interpretation) and the compiler was able to use effect information to reduce their execution times by factors of 1.7 to 5.4 when compared with sequential execution times. These results support the thesis that a static effect system is a powerful tool for compilation to multiprocessor computers. However, the graph interpreter we used was based on unrealistic assumptions, and thus our results may not accurately reflect the performance of a practical FX-87 implementation. The results also suggest that conventional loop analysis would complement the FX-87 effect system« less
I/O-Efficient Scientific Computation Using TPIE
NASA Technical Reports Server (NTRS)
Vengroff, Darren Erik; Vitter, Jeffrey Scott
1996-01-01
In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.
Vendrell, Oriol; Brill, Michael; Gatti, Fabien; Lauvergnat, David; Meyer, Hans-Dieter
2009-06-21
Quantum dynamical calculations are reported for the zero point energy, several low-lying vibrational states, and the infrared spectrum of the H(5)O(2)(+) cation. The calculations are performed by the multiconfiguration time-dependent Hartree (MCTDH) method. A new vector parametrization based on a mixed Jacobi-valence description of the system is presented. With this parametrization the potential energy surface coupling is reduced with respect to a full Jacobi description, providing a better convergence of the n-mode representation of the potential. However, new coupling terms appear in the kinetic energy operator. These terms are derived and discussed. A mode-combination scheme based on six combined coordinates is used, and the representation of the 15-dimensional potential in terms of a six-combined mode cluster expansion including up to some 7-dimensional grids is discussed. A statistical analysis of the accuracy of the n-mode representation of the potential at all orders is performed. Benchmark, fully converged results are reported for the zero point energy, which lie within the statistical uncertainty of the reference diffusion Monte Carlo result for this system. Some low-lying vibrationally excited eigenstates are computed by block improved relaxation, illustrating the applicability of the approach to large systems. Benchmark calculations of the linear infrared spectrum are provided, and convergence with increasing size of the time-dependent basis and as a function of the order of the n-mode representation is studied. The calculations presented here make use of recent developments in the parallel version of the MCTDH code, which are briefly discussed. We also show that the infrared spectrum can be computed, to a very good approximation, within D(2d) symmetry, instead of the G(16) symmetry used before, in which the complete rotation of one water molecule with respect to the other is allowed, thus simplifying the dynamical problem.
Dongarra, Jack; Heroux, Michael A.; Luszczek, Piotr
2015-08-17
Here, we describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a better correlation to existing codes from the computational science domain and to be representative of their performance. Furthermore, HPCG is meant to help drive the computer system design and implementation in directions that will better impact future performance improvement.
NASA Astrophysics Data System (ADS)
Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu
2015-07-01
The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
Nouraei, S A R; Hudovsky, A; Frampton, A E; Mufti, U; White, N B; Wathen, C G; Sandhu, G S; Darzi, A
2015-06-01
Clinical coding is the translation of clinical activity into a coded language. Coded data drive hospital reimbursement and are used for audit and research, and benchmarking and outcomes management purposes. We undertook a 2-center audit of coding accuracy across surgery. Clinician-auditor multidisciplinary teams reviewed the coding of 30,127 patients and assessed accuracy at primary and secondary diagnosis and procedure levels, morbidity level, complications assignment, and financial variance. Postaudit data of a randomly selected sample of 400 cases were reaudited by an independent team. At least 1 coding change occurred in 15,402 patients (51%). There were 3911 (13%) and 3620 (12%) changes to primary diagnoses and procedures, respectively. In 5183 (17%) patients, the Health Resource Grouping changed, resulting in income variance of £3,974,544 (+6.2%). The morbidity level changed in 2116 (7%) patients (P < 0.001). The number of assigned complications rose from 2597 (8.6%) to 2979 (9.9%) (P < 0.001). Reaudit resulted in further primary diagnosis and procedure changes in 8.7% and 4.8% of patients, respectively. The coded data are a key engine for knowledge-driven health care provision. They are used, increasingly at individual surgeon level, to benchmark performance. Surgical clinical coding is prone to subjectivity, variability, and error (SVE). Having a specialty-by-specialty understanding of the nature and clinical significance of informatics variability and adopting strategies to reduce it, are necessary to allow accurate assumptions and informed decisions to be made concerning the scope and clinical applicability of administrative data in surgical outcomes improvement.
Final report for the Tera Computer TTI CRADA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davidson, G.S.; Pavlakos, C.; Silva, C.
1997-01-01
Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTAmore » parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouxelin, Pascal Nicolas; Strydom, Gerhard
Best-estimate plus uncertainty analysis of reactors is replacing the traditional conservative (stacked uncertainty) method for safety and licensing analysis. To facilitate uncertainty analysis applications, a comprehensive approach and methodology must be developed and applied. High temperature gas cooled reactors (HTGRs) have several features that require techniques not used in light-water reactor analysis (e.g., coated-particle design and large graphite quantities at high temperatures). The International Atomic Energy Agency has therefore launched the Coordinated Research Project on HTGR Uncertainty Analysis in Modeling to study uncertainty propagation in the HTGR analysis chain. The benchmark problem defined for the prismatic design is represented bymore » the General Atomics Modular HTGR 350. The main focus of this report is the compilation and discussion of the results obtained for various permutations of Exercise I 2c and the use of the cross section data in Exercise II 1a of the prismatic benchmark, which is defined as the last and first steps of the lattice and core simulation phases, respectively. The report summarizes the Idaho National Laboratory (INL) best estimate results obtained for Exercise I 2a (fresh single-fuel block), Exercise I 2b (depleted single-fuel block), and Exercise I 2c (super cell) in addition to the first results of an investigation into the cross section generation effects for the super-cell problem. The two dimensional deterministic code known as the New ESC based Weighting Transport (NEWT) included in the Standardized Computer Analyses for Licensing Evaluation (SCALE) 6.1.2 package was used for the cross section evaluation, and the results obtained were compared to the three dimensional stochastic SCALE module KENO VI. The NEWT cross section libraries were generated for several permutations of the current benchmark super-cell geometry and were then provided as input to the Phase II core calculation of the stand alone neutronics Exercise II 1a. The steady state core calculations were simulated with the INL coupled-code system known as the Parallel and Highly Innovative Simulation for INL Code System (PHISICS) and the system thermal-hydraulics code known as the Reactor Excursion and Leak Analysis Program (RELAP) 5 3D using the nuclear data libraries previously generated with NEWT. It was observed that significant differences in terms of multiplication factor and neutron flux exist between the various permutations of the Phase I super-cell lattice calculations. The use of these cross section libraries only leads to minor changes in the Phase II core simulation results for fresh fuel but shows significantly larger discrepancies for spent fuel cores. Furthermore, large incongruities were found between the SCALE NEWT and KENO VI results for the super cells, and while some trends could be identified, a final conclusion on this issue could not yet be reached. This report will be revised in mid 2016 with more detailed analyses of the super-cell problems and their effects on the core models, using the latest version of SCALE (6.2). The super-cell models seem to show substantial improvements in terms of neutron flux as compared to single-block models, particularly at thermal energies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, T; Lin, H; Xu, X
Purpose: (1) To perform phase space (PS) based source modeling for Tomotherapy and Varian TrueBeam 6 MV Linacs, (2) to examine the accuracy and performance of the ARCHER Monte Carlo code on a heterogeneous computing platform with Many Integrated Core coprocessors (MIC, aka Xeon Phi) and GPUs, and (3) to explore the software micro-optimization methods. Methods: The patient-specific source of Tomotherapy and Varian TrueBeam Linacs was modeled using the PS approach. For the helical Tomotherapy case, the PS data were calculated in our previous study (Su et al. 2014 41(7) Medical Physics). For the single-view Varian TrueBeam case, we analyticallymore » derived them from the raw patient-independent PS data in IAEA’s database, partial geometry information of the jaw and MLC as well as the fluence map. The phantom was generated from DICOM images. The Monte Carlo simulation was performed by ARCHER-MIC and GPU codes, which were benchmarked against a modified parallel DPM code. Software micro-optimization was systematically conducted, and was focused on SIMD vectorization of tight for-loops and data prefetch, with the ultimate goal of increasing 512-bit register utilization and reducing memory access latency. Results: Dose calculation was performed for two clinical cases, a Tomotherapy-based prostate cancer treatment and a TrueBeam-based left breast treatment. ARCHER was verified against the DPM code. The statistical uncertainty of the dose to the PTV was less than 1%. Using double-precision, the total wall time of the multithreaded CPU code on a X5650 CPU was 339 seconds for the Tomotherapy case and 131 seconds for the TrueBeam, while on 3 5110P MICs it was reduced to 79 and 59 seconds, respectively. The single-precision GPU code on a K40 GPU took 45 seconds for the Tomotherapy dose calculation. Conclusion: We have extended ARCHER, the MIC and GPU-based Monte Carlo dose engine to Tomotherapy and Truebeam dose calculations.« less
Benchmarking MARS (accident management software) with the Browns Ferry fire
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dawson, S.M.; Liu, L.Y.; Raines, J.C.
1992-01-01
The MAAP Accident Response System (MARS) is a userfriendly computer software developed to provide management and engineering staff with the most needed insights, during actual or simulated accidents, of the current and future conditions of the plant based on current plant data and its trends. To demonstrate the reliability of the MARS code in simulatng a plant transient, MARS is being benchmarked with the available reactor pressure vessel (RPV) pressure and level data from the Browns Ferry fire. The MRS software uses the Modular Accident Analysis Program (MAAP) code as its basis to calculate plant response under accident conditions. MARSmore » uses a limited set of plant data to initialize and track the accidnt progression. To perform this benchmark, a simulated set of plant data was constructed based on actual report data containing the information necessary to initialize MARS and keep track of plant system status throughout the accident progression. The initial Browns Ferry fire data were produced by performing a MAAP run to simulate the accident. The remaining accident simulation used actual plant data.« less
Trinary signed-digit arithmetic using an efficient encoding scheme
NASA Astrophysics Data System (ADS)
Salim, W. Y.; Alam, M. S.; Fyath, R. S.; Ali, S. A.
2000-09-01
The trinary signed-digit (TSD) number system is of interest for ultrafast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.
One-step trinary signed-digit arithmetic using an efficient encoding scheme
NASA Astrophysics Data System (ADS)
Salim, W. Y.; Fyath, R. S.; Ali, S. A.; Alam, Mohammad S.
2000-11-01
The trinary signed-digit (TSD) number system is of interest for ultra fast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.
Xyce parallel electronic simulator users guide, version 6.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users' guide, Version 6.0.1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users guide, version 6.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Adaptive Grid Refinement for Atmospheric Boundary Layer Simulations
NASA Astrophysics Data System (ADS)
van Hooft, Antoon; van Heerwaarden, Chiel; Popinet, Stephane; van der linden, Steven; de Roode, Stephan; van de Wiel, Bas
2017-04-01
We validate and benchmark an adaptive mesh refinement (AMR) algorithm for numerical simulations of the atmospheric boundary layer (ABL). The AMR technique aims to distribute the computational resources efficiently over a domain by refining and coarsening the numerical grid locally and in time. This can be beneficial for studying cases in which length scales vary significantly in time and space. We present the results for a case describing the growth and decay of a convective boundary layer. The AMR results are benchmarked against two runs using a fixed, fine meshed grid. First, with the same numerical formulation as the AMR-code and second, with a code dedicated to ABL studies. Compared to the fixed and isotropic grid runs, the AMR algorithm can coarsen and refine the grid such that accurate results are obtained whilst using only a fraction of the grid cells. Performance wise, the AMR run was cheaper than the fixed and isotropic grid run with similar numerical formulations. However, for this specific case, the dedicated code outperformed both aforementioned runs.
Cove benchmark calculations using SAGUARO and FEMTRAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eaton, R.R.; Martinez, M.J.
1986-10-01
Three small-scale, time-dependent, benchmarking calculations have been made using the finite element codes SAGUARO, to determine hydraulic head and water velocity profiles, and FEMTRAN, to predict the solute transport. Sand and hard rock porous materials were used. Time scales for the problems, which ranged from tens of hours to thousands of years, have posed no particular diffculty for the two codes. Studies have been performed to determine the effects of computational mesh, boundary conditions, velocity formulation and SAGUARO/FEMTRAN code-coupling on water and solute transport. Results showed that mesh refinement improved mass conservation. Varying the drain-tile size in COVE 1N hadmore » a weak effect on the rate at which the tile field drained. Excellent agreement with published COVE 1N data was obtained for the hydrological field and reasonable agreement for the solute-concentration predictions. The question remains whether these types of calculations can be carried out on repository-scale problems using material characteristic curves representing tuff with fractures.« less
The MCUCN simulation code for ultracold neutron physics
NASA Astrophysics Data System (ADS)
Zsigmond, G.
2018-02-01
Ultracold neutrons (UCN) have very low kinetic energies 0-300 neV, thereby can be stored in specific material or magnetic confinements for many hundreds of seconds. This makes them a very useful tool in probing fundamental symmetries of nature (for instance charge-parity violation by neutron electric dipole moment experiments) and contributing important parameters for the Big Bang nucleosynthesis (neutron lifetime measurements). Improved precision experiments are in construction at new and planned UCN sources around the world. MC simulations play an important role in the optimization of such systems with a large number of parameters, but also in the estimation of systematic effects, in benchmarking of analysis codes, or as part of the analysis. The MCUCN code written at PSI has been extensively used for the optimization of the UCN source optics and in the optimization and analysis of (test) experiments within the nEDM project based at PSI. In this paper we present the main features of MCUCN and interesting benchmark and application examples.
Transferring ecosystem simulation codes to supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1995-01-01
Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectoring and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
PETSc Users Manual Revision 3.3
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Brown, J.; Buschelman, K.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Brown, J.; Buschelman, K.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself. ;For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
Alvioli, M.; Baum, R.L.
2016-01-01
We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
DYNA3D/ParaDyn Regression Test Suite Inventory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jerry I.
2016-09-01
The following table constitutes an initial assessment of feature coverage across the regression test suite used for DYNA3D and ParaDyn. It documents the regression test suite at the time of preliminary release 16.1 in September 2016. The columns of the table represent groupings of functionalities, e.g., material models. Each problem in the test suite is represented by a row in the table. All features exercised by the problem are denoted by a check mark (√) in the corresponding column. The definition of “feature” has not been subdivided to its smallest unit of user input, e.g., algorithmic parameters specific to amore » particular type of contact surface. This represents a judgment to provide code developers and users a reasonable impression of feature coverage without expanding the width of the table by several multiples. All regression testing is run in parallel, typically with eight processors, except problems involving features only available in serial mode. Many are strictly regression tests acting as a check that the codes continue to produce adequately repeatable results as development unfolds; compilers change and platforms are replaced. A subset of the tests represents true verification problems that have been checked against analytical or other benchmark solutions. Users are welcomed to submit documented problems for inclusion in the test suite, especially if they are heavily exercising, and dependent upon, features that are currently underrepresented.« less
NASA Technical Reports Server (NTRS)
Krueger, Ronald
2012-01-01
The development of benchmark examples for quasi-static delamination propagation prediction is presented. The example is based on a finite element model of the Mixed-Mode Bending (MMB) specimen for 50% mode II. The benchmarking is demonstrated for Abaqus/Standard, however, the example is independent of the analysis software used and allows the assessment of the automated delamination propagation prediction capability in commercial finite element codes based on the virtual crack closure technique (VCCT). First, a quasi-static benchmark example was created for the specimen. Second, starting from an initially straight front, the delamination was allowed to propagate under quasi-static loading. Third, the load-displacement as well as delamination length versus applied load/displacement relationships from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Overall, the results are encouraging, but further assessment for mixed-mode delamination fatigue onset and growth is required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Sterbentz, James W.; Snoj, Luka
PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
Lightweight Specifications for Parallel Correctness
2012-12-05
Galenson, Benjamin Hindman, Thibaud Hottelier, Pallavi Joshi, Ben- jamin Lipshitz, Leo Meyerovich, Mayur Naik, Chang-Seo Park, and Philip Reames — many...violating executions. We discuss some of these errors in detail in the CHAPTER 5. SPECIFYING AND CHECKING SEMANTIC ATOMICITY 84 Benchmark Approx. LoC
Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian
The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functionalmore » characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.« less
JASMIN: Japanese-American study of muon interactions and neutron detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakashima, Hiroshi; /JAEA, Ibaraki; Mokhov, N.V.
Experimental studies of shielding and radiation effects at Fermi National Accelerator Laboratory (FNAL) have been carried out under collaboration between FNAL and Japan, aiming at benchmarking of simulation codes and study of irradiation effects for upgrade and design of new high-energy accelerator facilities. The purposes of this collaboration are (1) acquisition of shielding data in a proton beam energy domain above 100GeV; (2) further evaluation of predictive accuracy of the PHITS and MARS codes; (3) modification of physics models and data in these codes if needed; (4) establishment of irradiation field for radiation effect tests; and (5) development of amore » code module for improved description of radiation effects. A series of experiments has been performed at the Pbar target station and NuMI facility, using irradiation of targets with 120 GeV protons for antiproton and neutrino production, as well as the M-test beam line (M-test) for measuring nuclear data and detector responses. Various nuclear and shielding data have been measured by activation methods with chemical separation techniques as well as by other detectors such as a Bonner ball counter. Analyses with the experimental data are in progress for benchmarking the PHITS and MARS15 codes. In this presentation recent activities and results are reviewed.« less
An Assessment of Current Fan Noise Prediction Capability
NASA Technical Reports Server (NTRS)
Envia, Edmane; Woodward, Richard P.; Elliott, David M.; Fite, E. Brian; Hughes, Christopher E.; Podboy, Gary G.; Sutliff, Daniel L.
2008-01-01
In this paper, the results of an extensive assessment exercise carried out to establish the current state of the art for predicting fan noise at NASA are presented. Representative codes in the empirical, analytical, and computational categories were exercised and assessed against a set of benchmark acoustic data obtained from wind tunnel tests of three model scale fans. The chosen codes were ANOPP, representing an empirical capability, RSI, representing an analytical capability, and LINFLUX, representing a computational aeroacoustics capability. The selected benchmark fans cover a wide range of fan pressure ratios and fan tip speeds, and are representative of modern turbofan engine designs. The assessment results indicate that the ANOPP code can predict fan noise spectrum to within 4 dB of the measurement uncertainty band on a third-octave basis for the low and moderate tip speed fans except at extreme aft emission angles. The RSI code can predict fan broadband noise spectrum to within 1.5 dB of experimental uncertainty band provided the rotor-only contribution is taken into account. The LINFLUX code can predict interaction tone power levels to within experimental uncertainties at low and moderate fan tip speeds, but could deviate by as much as 6.5 dB outside the experimental uncertainty band at the highest tip speeds in some case.
An Expert System for the Development of Efficient Parallel Code
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit
2004-01-01
We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP (message passing) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Interactive visual optimization and analysis for RFID benchmarking.
Wu, Yingcai; Chung, Ka-Kei; Qu, Huamin; Yuan, Xiaoru; Cheung, S C
2009-01-01
Radio frequency identification (RFID) is a powerful automatic remote identification technique that has wide applications. To facilitate RFID deployment, an RFID benchmarking instrument called aGate has been invented to identify the strengths and weaknesses of different RFID technologies in various environments. However, the data acquired by aGate are usually complex time varying multidimensional 3D volumetric data, which are extremely challenging for engineers to analyze. In this paper, we introduce a set of visualization techniques, namely, parallel coordinate plots, orientation plots, a visual history mechanism, and a 3D spatial viewer, to help RFID engineers analyze benchmark data visually and intuitively. With the techniques, we further introduce two workflow procedures (a visual optimization procedure for finding the optimum reader antenna configuration and a visual analysis procedure for comparing the performance and identifying the flaws of RFID devices) for the RFID benchmarking, with focus on the performance analysis of the aGate system. The usefulness and usability of the system are demonstrated in the user evaluation.
Navier-Stokes analysis of a liquid rocket engine disk cavity
NASA Technical Reports Server (NTRS)
Benjamin, Theodore G.; Mcconnaughey, Paul K.
1991-01-01
This paper presents a Navier-Stokes analysis of hydrodynamic phenomena occurring in the aft disk cavity of a liquid rocket engine turbine. The cavity analyzed in the Space Shuttle Main Engine Alternate Turbopump currently being developed by NASA and Pratt and Whitney. Comparison of results obtained from the Navier-Stokes code for two rotating disk datasets available in the literature are presented as benchmark validations. The benchmark results obtained using the code show good agreement relative to experimental data, and the turbine disk cavity was analyzed with comparable grid resolution, dissipation levels, and turbulence models. Predicted temperatures in the cavity show that little mixing of hot and cold fluid occurs in the cavity and the flow is dominated by swirl and pumping up the rotating disk.
Scalable randomized benchmarking of non-Clifford gates
NASA Astrophysics Data System (ADS)
Cross, Andrew; Magesan, Easwar; Bishop, Lev; Smolin, John; Gambetta, Jay
Randomized benchmarking is a widely used experimental technique to characterize the average error of quantum operations. Benchmarking procedures that scale to enable characterization of n-qubit circuits rely on efficient procedures for manipulating those circuits and, as such, have been limited to subgroups of the Clifford group. However, universal quantum computers require additional, non-Clifford gates to approximate arbitrary unitary transformations. We define a scalable randomized benchmarking procedure over n-qubit unitary matrices that correspond to protected non-Clifford gates for a class of stabilizer codes. We present efficient methods for representing and composing group elements, sampling them uniformly, and synthesizing corresponding poly (n) -sized circuits. The procedure provides experimental access to two independent parameters that together characterize the average gate fidelity of a group element. We acknowledge support from ARO under Contract W911NF-14-1-0124.
Gadolinia depletion analysis by CASMO-4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kobayashi, Y.; Saji, E.; Toba, A.
1993-01-01
CASMO-4 is the most recent version of the lattice physics code CASMO introduced by Studsvik. The principal aspects of the CASMO-4 model that differ from the models in previous CASMO versions are as follows: (1) heterogeneous model for two-dimensional transport theory calculations; and (2) microregion depletion model for burnable absorbers, such as gadolinia. Of these aspects, the first has previously been benchmarked against measured data of critical experiments and Monte Carlo calculations, verifying the high degree of accuracy. To proceed with CASMO-4 benchmarking, it is desirable to benchmark the microregion depletion model, which enables CASMO-4 to calculate gadolinium depletion directlymore » without the need for precalculated MICBURN cross-section data. This paper presents the benchmarking results for the microregion depletion model in CASMO-4 using the measured data of depleted gadolinium rods.« less
Benchmark Problems of the Geothermal Technologies Office Code Comparison Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Mark D.; Podgorney, Robert; Kelkar, Sharad M.
A diverse suite of numerical simulators is currently being applied to predict or understand the performance of enhanced geothermal systems (EGS). To build confidence and identify critical development needs for these analytical tools, the United States Department of Energy, Geothermal Technologies Office has sponsored a Code Comparison Study (GTO-CCS), with participants from universities, industry, and national laboratories. A principal objective for the study was to create a community forum for improvement and verification of numerical simulators for EGS modeling. Teams participating in the study were those representing U.S. national laboratories, universities, and industries, and each team brought unique numerical simulationmore » capabilities to bear on the problems. Two classes of problems were developed during the study, benchmark problems and challenge problems. The benchmark problems were structured to test the ability of the collection of numerical simulators to solve various combinations of coupled thermal, hydrologic, geomechanical, and geochemical processes. This class of problems was strictly defined in terms of properties, driving forces, initial conditions, and boundary conditions. Study participants submitted solutions to problems for which their simulation tools were deemed capable or nearly capable. Some participating codes were originally developed for EGS applications whereas some others were designed for different applications but can simulate processes similar to those in EGS. Solution submissions from both were encouraged. In some cases, participants made small incremental changes to their numerical simulation codes to address specific elements of the problem, and in other cases participants submitted solutions with existing simulation tools, acknowledging the limitations of the code. The challenge problems were based on the enhanced geothermal systems research conducted at Fenton Hill, near Los Alamos, New Mexico, between 1974 and 1995. The problems involved two phases of research, stimulation, development, and circulation in two separate reservoirs. The challenge problems had specific questions to be answered via numerical simulation in three topical areas: 1) reservoir creation/stimulation, 2) reactive and passive transport, and 3) thermal recovery. Whereas the benchmark class of problems were designed to test capabilities for modeling coupled processes under strictly specified conditions, the stated objective for the challenge class of problems was to demonstrate what new understanding of the Fenton Hill experiments could be realized via the application of modern numerical simulation tools by recognized expert practitioners.« less
Deterministic Modeling of the High Temperature Test Reactor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ortensi, J.; Cogliati, J. J.; Pope, M. A.
2010-06-01
Idaho National Laboratory (INL) is tasked with the development of reactor physics analysis capability of the Next Generation Nuclear Power (NGNP) project. In order to examine INL’s current prismatic reactor deterministic analysis tools, the project is conducting a benchmark exercise based on modeling the High Temperature Test Reactor (HTTR). This exercise entails the development of a model for the initial criticality, a 19 column thin annular core, and the fully loaded core critical condition with 30 columns. Special emphasis is devoted to the annular core modeling, which shares more characteristics with the NGNP base design. The DRAGON code is usedmore » in this study because it offers significant ease and versatility in modeling prismatic designs. Despite some geometric limitations, the code performs quite well compared to other lattice physics codes. DRAGON can generate transport solutions via collision probability (CP), method of characteristics (MOC), and discrete ordinates (Sn). A fine group cross section library based on the SHEM 281 energy structure is used in the DRAGON calculations. HEXPEDITE is the hexagonal z full core solver used in this study and is based on the Green’s Function solution of the transverse integrated equations. In addition, two Monte Carlo (MC) based codes, MCNP5 and PSG2/SERPENT, provide benchmarking capability for the DRAGON and the nodal diffusion solver codes. The results from this study show a consistent bias of 2–3% for the core multiplication factor. This systematic error has also been observed in other HTTR benchmark efforts and is well documented in the literature. The ENDF/B VII graphite and U235 cross sections appear to be the main source of the error. The isothermal temperature coefficients calculated with the fully loaded core configuration agree well with other benchmark participants but are 40% higher than the experimental values. This discrepancy with the measurement stems from the fact that during the experiments the control rods were adjusted to maintain criticality, whereas in the model, the rod positions were fixed. In addition, this work includes a brief study of a cross section generation approach that seeks to decouple the domain in order to account for neighbor effects. This spectral interpenetration is a dominant effect in annular HTR physics. This analysis methodology should be further explored in order to reduce the error that is systematically propagated in the traditional generation of cross sections.« less
Experimental benchmark of kinetic simulations of capacitively coupled plasmas in molecular gases
NASA Astrophysics Data System (ADS)
Donkó, Z.; Derzsi, A.; Korolov, I.; Hartmann, P.; Brandt, S.; Schulze, J.; Berger, B.; Koepke, M.; Bruneau, B.; Johnson, E.; Lafleur, T.; Booth, J.-P.; Gibson, A. R.; O'Connell, D.; Gans, T.
2018-01-01
We discuss the origin of uncertainties in the results of numerical simulations of low-temperature plasma sources, focusing on capacitively coupled plasmas. These sources can be operated in various gases/gas mixtures, over a wide domain of excitation frequency, voltage, and gas pressure. At low pressures, the non-equilibrium character of the charged particle transport prevails and particle-based simulations become the primary tools for their numerical description. The particle-in-cell method, complemented with Monte Carlo type description of collision processes, is a well-established approach for this purpose. Codes based on this technique have been developed by several authors/groups, and have been benchmarked with each other in some cases. Such benchmarking demonstrates the correctness of the codes, but the underlying physical model remains unvalidated. This is a key point, as this model should ideally account for all important plasma chemical reactions as well as for the plasma-surface interaction via including specific surface reaction coefficients (electron yields, sticking coefficients, etc). In order to test the models rigorously, comparison with experimental ‘benchmark data’ is necessary. Examples will be given regarding the studies of electron power absorption modes in O2, and CF4-Ar discharges, as well as on the effect of modifications of the parameters of certain elementary processes on the computed discharge characteristics in O2 capacitively coupled plasmas.
Fingerprinting sea-level variations in response to continental ice loss: a benchmark exercise
NASA Astrophysics Data System (ADS)
Barletta, Valentina R.; Spada, Giorgio; Riva, Riccardo E. M.; James, Thomas S.; Simon, Karen M.; van der Wal, Wouter; Martinec, Zdenek; Klemann, Volker; Olsson, Per-Anders; Hagedoorn, Jan; Stocchi, Paolo; Vermeersen, Bert
2013-04-01
Understanding the response of the Earth to the waxing and waning ice sheets is crucial in various contexts, ranging from the interpretation of modern satellite geodetic measurements to the projections of future sea level trends in response to climate change. All the processes accompanying Glacial Isostatic Adjustment (GIA) can be described solving the so-called Sea Level Equation (SLE), an integral equation that accounts for the interactions between the ice sheets, the solid Earth, and the oceans. Modern approaches to the SLE are based on various techniques that range from purely analytical formulations to fully numerical methods. Here we present the results of a benchmark exercise of independently developed codes designed to solve the SLE. The study involves predictions of current sea level changes due to present-day ice mass loss. In spite of the differences in the methods employed, the comparison shows that a significant number of GIA modellers can reproduce their sea-level computations within 2% for well defined, large-scale present-day ice mass changes. Smaller and more detailed loads need further and dedicated benchmarking and high resolution computation. This study shows how the details of the implementation and the inputs specifications are an important, and often underappreciated, aspect. Hence this represents a step toward the assessment of reliability of sea level projections obtained with benchmarked SLE codes.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Rizk, Yehia M.
1999-01-01
The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group are approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads are adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, J; Dossa, D; Gokhale, M
Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe:more » (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB Hard Drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater that 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed order of magnitude speedup over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit to boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolative memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.« less
ERIC Educational Resources Information Center
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
NASA Astrophysics Data System (ADS)
Chan, Chia-Hsin; Tu, Chun-Chuan; Tsai, Wen-Jiin
2017-01-01
High efficiency video coding (HEVC) not only improves the coding efficiency drastically compared to the well-known H.264/AVC but also introduces coding tools for parallel processing, one of which is tiles. Tile partitioning is allowed to be arbitrary in HEVC, but how to decide tile boundaries remains an open issue. An adaptive tile boundary (ATB) method is proposed to select a better tile partitioning to improve load balancing (ATB-LoadB) and coding efficiency (ATB-Gain) with a unified scheme. Experimental results show that, compared to ordinary uniform-space partitioning, the proposed ATB can save up to 17.65% of encoding times in parallel encoding scenarios and can reduce up to 0.8% of total bit rates for coding efficiency.