core computational benchmark: Topics by Science.gov

Sample records for core computational benchmark

Sequoia Messaging Rate Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Friedley, Andrew

2008-01-22

The purpose of this benchmark is to measure the maximal message rate of a single compute node. The first num_cores ranks are expected to reside on the 'core' compute node for which message rate is being tested. After that, the next num_nbors ranks are neighbors for the first core rank, the next set of num_nbors ranks are neighbors for the second core rank, and so on. For example, testing an 8-core node (num_cores = 8) with 4 neighbors (num_nbors = 4) requires 8 + 8 * 4 - 40 ranks. The first 8 of those 40 ranks are expected tomore » be on the 'core' node being benchmarked, while the rest of the ranks are on separate nodes.« less
Multi-Core Processor Memory Contention Benchmark Analysis Case Study

NASA Technical Reports Server (NTRS)

Simon, Tyler; McGalliard, James

2009-01-01

Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Present Status and Extensions of the Monte Carlo Performance Benchmark

NASA Astrophysics Data System (ADS)

Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.

2014-06-01

The NEA Monte Carlo Performance benchmark started in 2011 aiming to monitor over the years the abilities to perform a full-size Monte Carlo reactor core calculation with a detailed power production for each fuel pin with axial distribution. This paper gives an overview of the contributed results thus far. It shows that reaching a statistical accuracy of 1 % for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common type computer nodes. However, using true supercomputers the speedup of parallel calculations is increasing up to large numbers of processor cores. More experience is needed from calculations on true supercomputers using large numbers of processors in order to predict if the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark test are well suited for further investigations of full-core Monte Carlo calculations and a need is felt for testing other issues than its computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems for evaluating fission source convergence for a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities and to study the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
A benchmarking tool to evaluate computer tomography perfusion infarct core predictions against a DWI standard.

PubMed

Cereda, Carlo W; Christensen, Søren; Campbell, Bruce Cv; Mishra, Nishant K; Mlynash, Michael; Levi, Christopher; Straka, Matus; Wintermark, Max; Bammer, Roland; Albers, Gregory W; Parsons, Mark W; Lansberg, Maarten G

2016-10-01

Differences in research methodology have hampered the optimization of Computer Tomography Perfusion (CTP) for identification of the ischemic core. We aim to optimize CTP core identification using a novel benchmarking tool. The benchmarking tool consists of an imaging library and a statistical analysis algorithm to evaluate the performance of CTP. The tool was used to optimize and evaluate an in-house developed CTP-software algorithm. Imaging data of 103 acute stroke patients were included in the benchmarking tool. Median time from stroke onset to CT was 185 min (IQR 180-238), and the median time between completion of CT and start of MRI was 36 min (IQR 25-79). Volumetric accuracy of the CTP-ROIs was optimal at an rCBF threshold of <38%; at this threshold, the mean difference was 0.3 ml (SD 19.8 ml), the mean absolute difference was 14.3 (SD 13.7) ml, and CTP was 67% sensitive and 87% specific for identification of DWI positive tissue voxels. The benchmarking tool can play an important role in optimizing CTP software as it provides investigators with a novel method to directly compare the performance of alternative CTP software packages. © The Author(s) 2015.
Benchmarking high performance computing architectures with CMS’ skeleton framework

NASA Astrophysics Data System (ADS)

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

2017-10-01

In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Benchmarking high performance computing architectures with CMS’ skeleton framework

DOE PAGES

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

2017-11-23

Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Benchmarking high performance computing architectures with CMS’ skeleton framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Benchmarking NWP Kernels on Multi- and Many-core Processors

NASA Astrophysics Data System (ADS)

Michalakes, J.; Vachharajani, M.

2008-12-01

Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Comprehensive Benchmark Suite for Simulation of Particle Laden Flows Using the Discrete Element Method with Performance Profiles from the Multiphase Flow with Interface eXchanges (MFiX) Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Peiyuan; Brown, Timothy; Fullmer, William D.

Five benchmark problems are developed and simulated with the computational fluid dynamics and discrete element model code MFiX. The benchmark problems span dilute and dense regimes, consider statistically homogeneous and inhomogeneous (both clusters and bubbles) particle concentrations and a range of particle and fluid dynamic computational loads. Several variations of the benchmark problems are also discussed to extend the computational phase space to cover granular (particles only), bidisperse and heat transfer cases. A weak scaling analysis is performed for each benchmark problem and, in most cases, the scalability of the code appears reasonable up to approx. 103 cores. Profiling ofmore » the benchmark problems indicate that the most substantial computational time is being spent on particle-particle force calculations, drag force calculations and interpolating between discrete particle and continuum fields. Hardware performance analysis was also carried out showing significant Level 2 cache miss ratios and a rather low degree of vectorization. These results are intended to serve as a baseline for future developments to the code as well as a preliminary indicator of where to best focus performance optimizations.« less
Performance implications from sizing a VM on multi-core systems: A Data analytic application s view

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lim, Seung-Hwan; Horey, James L; Begoli, Edmon

In this paper, we present a quantitative performance analysis of data analytics applications running on multi-core virtual machines. Such environments form the core of cloud computing. In addition, data analytics applications, such as Cassandra and Hadoop, are becoming increasingly popular on cloud computing platforms. This convergence necessitates a better understanding of the performance and cost implications of such hybrid systems. For example, the very rst step in hosting applications in virtualized environments, requires the user to con gure the number of virtual processors and the size of memory. To understand performance implications of this step, we benchmarked three Yahoo Cloudmore » Serving Benchmark (YCSB) workloads in a virtualized multi-core environment. Our measurements indicate that the performance of Cassandra for YCSB workloads does not heavily depend on the processing capacity of a system, while the size of the data set is critical to performance relative to allocated memory. We also identi ed a strong relationship between the running time of workloads and various hardware events (last level cache loads, misses, and CPU migrations). From this analysis, we provide several suggestions to improve the performance of data analytics applications running on cloud computing environments.« less
Benchmark Evaluation of Start-Up and Zero-Power Measurements at the High-Temperature Engineering Test Reactor

DOE PAGES

Bess, John D.; Fujimoto, Nozomu

2014-10-09

Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully-loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely-detailed models of the HTTR as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in themore » experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9 % and 2.7 % greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulation of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Summary of comparison and analysis of results from exercises 1 and 2 of the OECD PBMR coupled neutronics/thermal hydraulics transient benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mkhabela, P.; Han, J.; Tyobeka, B.

2006-07-01

The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has accepted, through the Nuclear Science Committee (NSC), the inclusion of the Pebble-Bed Modular Reactor 400 MW design (PBMR-400) coupled neutronics/thermal hydraulics transient benchmark problem as part of their official activities. The scope of the benchmark is to establish a well-defined problem, based on a common given library of cross sections, to compare methods and tools in core simulation and thermal hydraulics analysis with a specific focus on transient events through a set of multi-dimensional computational test problems. The benchmark includes three steady state exercises andmore » six transient exercises. This paper describes the first two steady state exercises, their objectives and the international participation in terms of organization, country and computer code utilized. This description is followed by a comparison and analysis of the participants' results submitted for these two exercises. The comparison of results from different codes allows for an assessment of the sensitivity of a result to the method employed and can thus help to focus the development efforts on the most critical areas. The two first exercises also allow for removing of user-related modeling errors and prepare core neutronics and thermal-hydraulics models of the different codes for the rest of the exercises in the benchmark. (authors)« less
ICT Proficiency and Gender: A Validation on Training and Development

ERIC Educational Resources Information Center

Lin, Shinyi; Shih, Tse-Hua; Lu, Ruiling

2013-01-01

Use of innovative learning/instruction mode, embedded in the Certification Pathway System (CPS) developed by Certiport TM, is geared toward Internet and Computing Benchmark & Mentor specifically for IC[superscript 3] certification. The Internet and Computing Core Certification (IC[superscript 3]), as an industry-based credentialing program,…
User and Performance Impacts from Franklin Upgrades

DOE Office of Scientific and Technical Information (OSTI.GOV)

He, Yun

2009-05-10

The NERSC flagship computer Cray XT4 system"Franklin" has gone through three major upgrades: quad core upgrade, CLE 2.1 upgrade, and IO upgrade, during the past year. In this paper, we will discuss the various aspects of the user impacts such as user access, user environment, and user issues etc from these upgrades. The performance impacts on the kernel benchmarks and selected application benchmarks will also be presented.
The change of radial power factor distribution due to RCCA insertion at the first cycle core of AP1000

NASA Astrophysics Data System (ADS)

Susilo, J.; Suparlina, L.; Deswandri; Sunaryo, G. R.

2018-02-01

The using of a computer program for the PWR type core neutronic design parameters analysis has been carried out in some previous studies. These studies included a computer code validation on the neutronic parameters data values resulted from measurements and benchmarking calculation. In this study, the AP1000 first cycle core radial power peaking factor validation and analysis were performed using CITATION module of the SRAC2006 computer code. The computer code has been also validated with a good result to the criticality values of VERA benchmark core. The AP1000 core power distribution calculation has been done in two-dimensional X-Y geometry through ¼ section modeling. The purpose of this research is to determine the accuracy of the SRAC2006 code, and also the safety performance of the AP1000 core first cycle operating. The core calculations were carried out with the several conditions, those are without Rod Cluster Control Assembly (RCCA), by insertion of a single RCCA (AO, M1, M2, MA, MB, MC, MD) and multiple insertion RCCA (MA + MB, MA + MB + MC, MA + MB + MC + MD, and MA + MB + MC + MD + M1). The maximum power factor of the fuel rods value in the fuel assembly assumedapproximately 1.406. The calculation results analysis showed that the 2-dimensional CITATION module of SRAC2006 code is accurate in AP1000 power distribution calculation without RCCA and with MA+MB RCCA insertion.The power peaking factor on the first operating cycle of the AP1000 core without RCCA, as well as with single and multiple RCCA are still below in the safety limit values (less then about 1.798). So in terms of thermal power generated by the fuel assembly, then it can be considered that the AP100 core at the first operating cycle is safe.
A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

PubMed

Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

2017-10-01

The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results however, with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.
Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems

DTIC Science & Technology

2017-04-13

modelling code, a parallel benchmark , and a communication avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were...movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark , and a
Graph 500 on OpenSHMEM: Using a Practical Survey of Past Work to Motivate Novel Algorithmic Developments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grossman, Max; Pritchard Jr., Howard Porter; Budimlic, Zoran

2016-12-22

Graph500 [14] is an effort to offer a standardized benchmark across large-scale distributed platforms which captures the behavior of common communicationbound graph algorithms. Graph500 differs from other large-scale benchmarking efforts (such as HPL [6] or HPGMG [7]) primarily in the irregularity of its computation and data access patterns. The core computational kernel of Graph500 is a breadth-first search (BFS) implemented on an undirected graph. The output of Graph500 is a spanning tree of the input graph, usually represented by a predecessor mapping for every node in the graph. The Graph500 benchmark defines several pre-defined input sizes for implementers to testmore » against. This report summarizes investigation into implementing the Graph500 benchmark on OpenSHMEM, and focuses on first building a strong and practical understanding of the strengths and limitations of past work before proposing and developing novel extensions.« less
Benchmarking infrastructure for mutation text mining

PubMed Central

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.

PubMed

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

A computationally simple model for determining the time dependent spectral neutron flux in a nuclear reactor core

NASA Astrophysics Data System (ADS)

Schneider, E. A.; Deinert, M. R.; Cady, K. B.

2006-10-01

The balance of isotopes in a nuclear reactor core is key to understanding the overall performance of a given fuel cycle. This balance is in turn most strongly affected by the time and energy-dependent neutron flux. While many large and involved computer packages exist for determining this spectrum, a simplified approach amenable to rapid computation is missing from the literature. We present such a model, which accepts as inputs the fuel element/moderator geometry and composition, reactor geometry, fuel residence time and target burnup and we compare it to OECD/NEA benchmarks for homogeneous MOX and UOX LWR cores. Collision probability approximations to the neutron transport equation are used to decouple the spatial and energy variables. The lethargy dependent neutron flux, governed by coupled integral equations for the fuel and moderator/coolant regions is treated by multigroup thermalization methods, and the transport of neutrons through space is modeled by fuel to moderator transport and escape probabilities. Reactivity control is achieved through use of a burnable poison or adjustable control medium. The model calculates the buildup of 24 actinides, as well as fission products, along with the lethargy dependent neutron flux and the results of several simulations are compared with benchmarked standards.
featsel: A framework for benchmarking of feature selection algorithms and cost functions

NASA Astrophysics Data System (ADS)

Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior

In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.
HTR-PROTEUS pebble bed experimental program cores 9 & 10: columnar hexagonal point-on-point packing with a 1:1 moderator-to-fuel pebble ratio

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.

2014-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
HTR-PROTEUS PEBBLE BED EXPERIMENTAL PROGRAM CORES 5, 6, 7, & 8: COLUMNAR HEXAGONAL POINT-ON-POINT PACKING WITH A 1:2 MODERATOR-TO-FUEL PEBBLE RATIO

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess

2013-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
HTR-PROTEUS PEBBLE BED EXPERIMENTAL PROGRAM CORES 9 & 10: COLUMNAR HEXAGONAL POINT-ON-POINT PACKING WITH A 1:1 MODERATOR-TO-FUEL PEBBLE RATIO

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess

2013-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
MC21 analysis of the MIT PWR benchmark: Hot zero power results

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kelly Iii, D. J.; Aviles, B. N.; Herman, B. R.

2013-07-01

MC21 Monte Carlo results have been compared with hot zero power measurements from an operating pressurized water reactor (PWR), as specified in a new full core PWR performance benchmark from the MIT Computational Reactor Physics Group. Included in the comparisons are axially integrated full core detector measurements, axial detector profiles, control rod bank worths, and temperature coefficients. Power depressions from grid spacers are seen clearly in the MC21 results. Application of Coarse Mesh Finite Difference (CMFD) acceleration within MC21 has been accomplished, resulting in a significant reduction of inactive batches necessary to converge the fission source. CMFD acceleration has alsomore » been shown to work seamlessly with the Uniform Fission Site (UFS) variance reduction method. (authors)« less
Experimental Criticality Benchmarks for SNAP 10A/2 Reactor Cores

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krass, A.W.

2005-12-19

This report describes computational benchmark models for nuclear criticality derived from descriptions of the Systems for Nuclear Auxiliary Power (SNAP) Critical Assembly (SCA)-4B experimental criticality program conducted by Atomics International during the early 1960's. The selected experimental configurations consist of fueled SNAP 10A/2-type reactor cores subject to varied conditions of water immersion and reflection under experimental control to measure neutron multiplication. SNAP 10A/2-type reactor cores are compact volumes fueled and moderated with the hydride of highly enriched uranium-zirconium alloy. Specifications for the materials and geometry needed to describe a given experimental configuration for a model using MCNP5 are provided. Themore » material and geometry specifications are adequate to permit user development of input for alternative nuclear safety codes, such as KENO. A total of 73 distinct experimental configurations are described.« less
New core-reflector boundary conditions for transient nodal reactor calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, E.K.; Kim, C.H.; Joo, H.K.

1995-09-01

New core-reflector boundary conditions designed for the exclusion of the reflector region in transient nodal reactor calculations are formulated. Spatially flat frequency approximations for the temporal neutron behavior and two types of transverse leakage approximations in the reflector region are introduced to solve the transverse-integrated time-dependent one-dimensional diffusion equation and then to obtain relationships between net current and flux at the core-reflector interfaces. To examine the effectiveness of new core-reflector boundary conditions in transient nodal reactor computations, nodal expansion method (NEM) computations with and without explicit representation of the reflector are performed for Laboratorium fuer Reaktorregelung und Anlagen (LRA) boilingmore » water reactor (BWR) and Nuclear Energy Agency Committee on Reactor Physics (NEACRP) pressurized water reactor (PWR) rod ejection kinetics benchmark problems. Good agreement between two NEM computations is demonstrated in all the important transient parameters of two benchmark problems. A significant amount of CPU time saving is also demonstrated with the boundary condition model with transverse leakage (BCMTL) approximations in the reflector region. In the three-dimensional LRA BWR, the BCMTL and the explicit reflector model computations differ by {approximately}4% in transient peak power density while the BCMTL results in >40% of CPU time saving by excluding both the axial and the radial reflector regions from explicit computational nodes. In the NEACRP PWR problem, which includes six different transient cases, the largest difference is 24.4% in the transient maximum power in the one-node-per-assembly B1 transient results. This difference in the transient maximum power of the B1 case is shown to reduce to 11.7% in the four-node-per-assembly computations. As for the computing time, BCMTL is shown to reduce the CPU time >20% in all six transient cases of the NEACRP PWR.« less
Initial Performance Results on IBM POWER6

NASA Technical Reports Server (NTRS)

Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh

2008-01-01

The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High- Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
A Programming Model Performance Study Using the NAS Parallel Benchmarks

DOE PAGES

Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

2010-01-01

Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the threemore » programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.« less
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

NASA Astrophysics Data System (ADS)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Modal analysis and acoustic transmission through offset-core honeycomb sandwich panels

NASA Astrophysics Data System (ADS)

Mathias, Adam Dustin

The work presented in this thesis is motivated by an earlier research that showed that double, offset-core honeycomb sandwich panels increased thermal resistance and, hence, decreased heat transfer through the panels. This result lead to the hypothesis that these panels could be used for acoustic insulation. Using commercial finite element modeling software, COMSOL Multiphysics, the acoustical properties, specifically the transmission loss across a variety of offset-core honeycomb sandwich panels, is studied for the case of a plane acoustic wave impacting the panel at normal incidence. The transmission loss results are compared with those of single-core honeycomb panels with the same cell sizes. The fundamental frequencies of the panels are also computed in an attempt to better understand the vibrational modes of these particular sandwich-structured panels. To ensure that the finite element analysis software is adequate for the task at hand, two relevant benchmark problems are solved and compared with theory. Results from these benchmark results compared well to those obtained from theory. Transmission loss results from the offset-core honeycomb sandwich panels show increased transmission loss, especially for large cell honeycombs when compared to single-core honeycomb panels.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Sterbentz, James W.; Snoj, Luka

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
An MPI-based MoSST core dynamics model

NASA Astrophysics Data System (ADS)

Jiang, Weiyuan; Kuang, Weijia

2008-09-01

Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena

2010-09-30

Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains to be challenging due to the high computational cost involved. It is hoped that next generation high performance computing (HPC) resources lead to significant reductions in execution times to leverage a new class of in-silico applications. However, performance gains with these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range requiredmore » to achieve the desired performance boost.In this study, strong scalability of currently used techniques to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes even when systems are carefully load-balanced and advanced IO strategies are employed.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Cohen, J; Dossa, D; Gokhale, M

Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe:more » (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB Hard Drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater that 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed order of magnitude speedup over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit to boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolative memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.« less
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-02-15

We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-05-29

We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Visualization assisted by parallel processing

NASA Astrophysics Data System (ADS)

Lange, B.; Rey, H.; Vasques, X.; Puech, W.; Rodriguez, N.

2011-01-01

This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective of this paper is to find a computationally efficient method to produce a real time rendering visualization for a large amount of data. We develop visualization method to monitor temperature variance of a data center. Sensors are placed on three layers and do not cover all the room. We use particle paradigm to interpolate data sensors. Particles model the "space" of the room. In this work we use a partition of the particle set, using two mathematical methods: Delaunay triangulation and Voronoý cells. Avis and Bhattacharya present these two algorithms in. Particles provide information on the room temperature at different coordinates over time. To locate and update particles data we define a computational cost function. To solve this function in an efficient way, we use a client server paradigm. Server computes data and client display this data on different kind of hardware. This paper is organized as follows. The first part presents related algorithm used to visualize large flow of data. The second part presents different platforms and methods used, which was evaluated in order to determine the better solution for the task proposed. The benchmark use the computational cost of our algorithm that formed based on located particles compared to sensors and on update of particles value. The benchmark was done on a personal computer using CPU, multi core programming, GPU programming and hybrid GPU/CPU. GPU programming method is growing in the research field; this method allows getting a real time rendering instates of a precompute rendering. For improving our results, we compute our algorithm on a High Performance Computing (HPC), this benchmark was used to improve multi-core method. HPC is commonly used in data visualization (astronomy, physic, etc) for improving the rendering and getting real-time.
A solid reactor core thermal model for nuclear thermal rockets

NASA Astrophysics Data System (ADS)

Rider, William J.; Cappiello, Michael W.; Liles, Dennis R.

1991-01-01

A Helium/Hydrogen Cooled Reactor Analysis (HERA) computer code has been developed. HERA has the ability to model arbitrary geometries in three dimensions, which allows the user to easily analyze reactor cores constructed of prismatic graphite elements. The code accounts for heat generation in the fuel, control rods, and other structures; conduction and radiation across gaps; convection to the coolant; and a variety of boundary conditions. The numerical solution scheme has been optimized for vector computers, making long transient analyses economical. Time integration is either explicit or implicit, which allows the use of the model to accurately calculate both short- or long-term transients with an efficient use of computer time. Both the basic spatial and temporal integration schemes have been benchmarked against analytical solutions.

Benchmark Evaluation of HTR-PROTEUS Pebble Bed Experimental Program

DOE PAGES

Bess, John D.; Montierth, Leland; Köberl, Oliver; ...

2014-10-09

Benchmark models were developed to evaluate 11 critical core configurations of the HTR-PROTEUS pebble bed experimental program. Various additional reactor physics measurements were performed as part of this program; currently only a total of 37 absorber rod worth measurements have been evaluated as acceptable benchmark experiments for Cores 4, 9, and 10. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the ²³⁵U enrichment of the fuel, impurities in the moderator pebbles, and the density and impurity content of the radial reflector. Calculations of k eff with MCNP5 and ENDF/B-VII.0 neutron nuclear data aremore » greater than the benchmark values but within 1% and also within the 3σ uncertainty, except for Core 4, which is the only randomly packed pebble configuration. Repeated calculations of k eff with MCNP6.1 and ENDF/B-VII.1 are lower than the benchmark values and within 1% (~3σ) except for Cores 5 and 9, which calculate lower than the benchmark eigenvalues within 4σ. The primary difference between the two nuclear data libraries is the adjustment of the absorption cross section of graphite. Simulations of the absorber rod worth measurements are within 3σ of the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Kohn-Sham Band Structure Benchmark Including Spin-Orbit Coupling for 2D and 3D Solids

NASA Astrophysics Data System (ADS)

Huhn, William; Blum, Volker

2015-03-01

Accurate electronic band structures serve as a primary indicator of the suitability of a material for a given application, e.g., as electronic or catalytic materials. Computed band structures, however, are subject to a host of approximations, some of which are more obvious (e.g., the treatment of the exchange-correlation of self-energy) and others less obvious (e.g., the treatment of core, semicore, or valence electrons, handling of relativistic effects, or the accuracy of the underlying basis set used). We here provide a set of accurate Kohn-Sham band structure benchmarks, using the numeric atom-centered all-electron electronic structure code FHI-aims combined with the ``traditional'' PBE functional and the hybrid HSE functional, to calculate core, valence, and low-lying conduction bands of a set of 2D and 3D materials. Benchmarks are provided with and without effects of spin-orbit coupling, using quasi-degenerate perturbation theory to predict spin-orbit splittings. This work is funded by Fritz-Haber-Institut der Max-Planck-Gesellschaft.
Benchmark Comparison of Dual- and Quad-Core Processor Linux Clusters with Two Global Climate Modeling Workloads

NASA Technical Reports Server (NTRS)

McGalliard, James

2008-01-01

This viewgraph presentation details the science and systems environments that NASA High End computing program serves. Included is a discussion of the workload that is involved in the processing for the Global Climate Modeling. The Goddard Earth Observing System Model, Version 5 (GEOS-5) is a system of models integrated using the Earth System Modeling Framework (ESMF). The GEOS-5 system was used for the Benchmark tests, and the results of the tests are shown and discussed. Tests were also run for the Cubed Sphere system, results for these test are also shown.
CERN Computing in Commercial Clouds

NASA Astrophysics Data System (ADS)

Cordeiro, C.; Field, L.; Garrido Bear, B.; Giordano, D.; Jones, B.; Keeble, O.; Manzi, A.; Martelli, E.; McCance, G.; Moreno-García, D.; Traylen, S.

2017-10-01

By the end of 2016 more than 10 Million core-hours of computing resources have been delivered by several commercial cloud providers to the four LHC experiments to run their production workloads, from simulation to full chain processing. In this paper we describe the experience gained at CERN in procuring and exploiting commercial cloud resources for the computing needs of the LHC experiments. The mechanisms used for provisioning, monitoring, accounting, alarming and benchmarking will be discussed, as well as the involvement of the LHC collaborations in terms of managing the workflows of the experiments within a multicloud environment.
Mean velocity and turbulence measurements in a 90 deg curved duct with thin inlet boundary layer

NASA Technical Reports Server (NTRS)

Crawford, R. A.; Peters, C. E.; Steinhoff, J.; Hornkohl, J. O.; Nourinejad, J.; Ramachandran, K.

1985-01-01

The experimental database established by this investigation of the flow in a large rectangular turning duct is of benchmark quality. The experimental Reynolds numbers, Deans numbers and boundary layer characteristics are significantly different from previous benchmark curved-duct experimental parameters. This investigation extends the experimental database to higher Reynolds number and thinner entrance boundary layers. The 5% to 10% thick boundary layers, based on duct half-width, results in a large region of near-potential flow in the duct core surrounded by developing boundary layers with large crossflows. The turbulent entrance boundary layer case at R sub ed = 328,000 provides an incompressible flowfield which approaches real turbine blade cascade characteristics. The results of this investigation provide a challenging benchmark database for computational fluid dynamics code development.
Core-core and core-valence correlation energy atomic and molecular benchmarks for Li through Ar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ranasinghe, Duminda S.; Frisch, Michael J.; Petersson, George A., E-mail: gpetersson@wesleyan.edu

2015-12-07

We have established benchmark core-core, core-valence, and valence-valence absolute coupled-cluster single double (triple) correlation energies (±0.1%) for 210 species covering the first- and second-rows of the periodic table. These species provide 194 energy differences (±0.03 mE{sub h}) including ionization potentials, electron affinities, and total atomization energies. These results can be used for calibration of less expensive methodologies for practical routine determination of core-core and core-valence correlation energies.
Core Collapse: The Race Between Stellar Evolution and Binary Heating

NASA Astrophysics Data System (ADS)

Converse, Joseph M.; Chandar, R.

2012-01-01

The dynamical formation of binary stars can dramatically affect the evolution of their host star clusters. In relatively small clusters (M < 6000 Msun) the most massive stars rapidly form binaries, heating the cluster and preventing any significant contraction of the core. The situation in much larger globular clusters (M 105 Msun) is quite different, with many showing collapsed cores, implying that binary formation did not affect them as severely as lower mass clusters. More massive clusters, however, should take longer to form their binaries, allowing stellar evolution more time to prevent the heating by causing the larger stars to die off. Here, we simulate the evolution of clusters between those of open and globular clusters in order to find at what size a star cluster is able to experience true core collapse. Our simulations make use of a new GPU-based computing cluster recently purchased at the University of Toledo. We also present some benchmarks of this new computational resource.
Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines

DTIC Science & Technology

2014-11-01

architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core ( MIC ) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market . Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs , and CPUs, adds complexity and technical challenges
Hardware accelerated high performance neutron transport computation based on AGENT methodology

NASA Astrophysics Data System (ADS)

Xiao, Shanjie

The spatial heterogeneity of the next generation Gen-IV nuclear reactor core designs brings challenges to the neutron transport analysis. The Arbitrary Geometry Neutron Transport (AGENT) AGENT code is a three-dimensional neutron transport analysis code being developed at the Laboratory for Neutronics and Geometry Computation (NEGE) at Purdue University. It can accurately describe the spatial heterogeneity in a hierarchical structure through the R-function solid modeler. The previous version of AGENT coupled the 2D transport MOC solver and the 1D diffusion NEM solver to solve the three dimensional Boltzmann transport equation. In this research, the 2D/1D coupling methodology was expanded to couple two transport solvers, the radial 2D MOC solver and the axial 1D MOC solver, for better accuracy. The expansion was benchmarked with the widely applied C5G7 benchmark models and two fast breeder reactor models, and showed good agreement with the reference Monte Carlo results. In practice, the accurate neutron transport analysis for a full reactor core is still time-consuming and thus limits its application. Therefore, another content of my research is focused on designing a specific hardware based on the reconfigurable computing technique in order to accelerate AGENT computations. It is the first time that the application of this type is used to the reactor physics and neutron transport for reactor design. The most time consuming part of the AGENT algorithm was identified. Moreover, the architecture of the AGENT acceleration system was designed based on the analysis. Through the parallel computation on the specially designed, highly efficient architecture, the acceleration design on FPGA acquires high performance at the much lower working frequency than CPUs. The whole design simulations show that the acceleration design would be able to speedup large scale AGENT computations about 20 times. The high performance AGENT acceleration system will drastically shortening the computation time for 3D full-core neutron transport analysis, making the AGENT methodology unique and advantageous, and thus supplies the possibility to extend the application range of neutron transport analysis in either industry engineering or academic research.
CLEAR: Cross-Layer Exploration for Architecting Resilience

DTIC Science & Technology

2017-03-01

benchmark analysis, also provides cost-effective solutions (~1% additional energy cost for the same 50× improvement). This paper addresses the...core (OoO-core) [Wang 04], across 18 benchmarks . Such extensive exploration enables us to conclusively answer the above cross-layer resilience...analysis of the effects of soft errors on application benchmarks , provides a highly effective soft error resilience approach. 3. The above
Integral Full Core Multi-Physics PWR Benchmark with Measured Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Forget, Benoit; Smith, Kord; Kumar, Shikhar

In recent years, the importance of modeling and simulation has been highlighted extensively in the DOE research portfolio with concrete examples in nuclear engineering with the CASL and NEAMS programs. These research efforts and similar efforts worldwide aim at the development of high-fidelity multi-physics analysis tools for the simulation of current and next-generation nuclear power reactors. Like all analysis tools, verification and validation is essential to guarantee proper functioning of the software and methods employed. The current approach relies mainly on the validation of single physic phenomena (e.g. critical experiment, flow loops, etc.) and there is a lack of relevantmore » multiphysics benchmark measurements that are necessary to validate high-fidelity methods being developed today. This work introduces a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, core loading and re-loading patterns. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validation of coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from 58 instrumented assemblies. The benchmark description is now available online and has been used by many groups. However, much work remains to be done on the quantification of uncertainties and modeling sensitivities. This work aims to address these deficiencies and make this benchmark a true non-proprietary international benchmark for the validation of high-fidelity tools. This report details the BEAVRS uncertainty quantification for the first two cycle of operations and serves as the final report of the project.« less
Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schulz, Roland; Lindner, Benjamin; Petridis, Loukas

2009-01-01

A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors,more » other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three million- and five million atom biological systems scale well up to 30k cores, producing 30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.« less
Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer.

PubMed

Schulz, Roland; Lindner, Benjamin; Petridis, Loukas; Smith, Jeremy C

2009-10-13

A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three million- and five million-atom biological systems scale well up to ∼30k cores, producing ∼30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

PubMed Central

Manolakos, Elias S.

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

PubMed

Sharma, Anuj; Manolakos, Elias S

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
Simulation of X-ray absorption spectra with orthogonality constrained density functional theory.

PubMed

Derricotte, Wallace D; Evangelista, Francesco A

2015-06-14

Orthogonality constrained density functional theory (OCDFT) [F. A. Evangelista, P. Shushkov and J. C. Tully, J. Phys. Chem. A, 2013, 117, 7378] is a variational time-independent approach for the computation of electronic excited states. In this work we extend OCDFT to compute core-excited states and generalize the original formalism to determine multiple excited states. Benchmark computations on a set of 13 small molecules and 40 excited states show that unshifted OCDFT/B3LYP excitation energies have a mean absolute error of 1.0 eV. Contrary to time-dependent DFT, OCDFT excitation energies for first- and second-row elements are computed with near-uniform accuracy. OCDFT core excitation energies are insensitive to the choice of the functional and the amount of Hartree-Fock exchange. We show that OCDFT is a powerful tool for the assignment of X-ray absorption spectra of large molecules by simulating the gas-phase near-edge spectrum of adenine and thymine.
ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms.

PubMed

Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T

2015-04-30

New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, i.e., tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool as it can be generalized to produce benchmarking data or arbitrary recording-electrode geometries and with various levels of complexity. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Shift Verification and Validation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pandya, Tara M.; Evans, Thomas M.; Davidson, Gregory G

2016-09-07

This documentation outlines the verification and validation of Shift for the Consortium for Advanced Simulation of Light Water Reactors (CASL). Five main types of problems were used for validation: small criticality benchmark problems; full-core reactor benchmarks for light water reactors; fixed-source coupled neutron-photon dosimetry benchmarks; depletion/burnup benchmarks; and full-core reactor performance benchmarks. We compared Shift results to measured data and other simulated Monte Carlo radiation transport code results, and found very good agreement in a variety of comparison measures. These include prediction of critical eigenvalue, radial and axial pin power distributions, rod worth, leakage spectra, and nuclide inventories over amore » burn cycle. Based on this validation of Shift, we are confident in Shift to provide reference results for CASL benchmarking.« less
Optical interconnection network for parallel access to multi-rank memory in future computing systems.

PubMed

Wang, Kang; Gu, Huaxi; Yang, Yintang; Wang, Kun

2015-08-10

With the number of cores increasing, there is an emerging need for a high-bandwidth low-latency interconnection network, serving core-to-memory communication. In this paper, aiming at the goal of simultaneous access to multi-rank memory, we propose an optical interconnection network for core-to-memory communication. In the proposed network, the wavelength usage is delicately arranged so that cores can communicate with different ranks at the same time and broadcast for flow control can be achieved. A distributed memory controller architecture that works in a pipeline mode is also designed for efficient optical communication and transaction address processes. The scaling method and wavelength assignment for the proposed network are investigated. Compared with traditional electronic bus-based core-to-memory communication, the simulation results based on the PARSEC benchmark show that the bandwidth enhancement and latency reduction are apparent.
Accelerating cardiac bidomain simulations using graphics processing units.

PubMed

Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

2012-08-01

Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.

Accelerating Cardiac Bidomain Simulations Using Graphics Processing Units

PubMed Central

Neic, Aurel; Liebmann, Manfred; Hoetzl, Elena; Mitchell, Lawrence; Vigmond, Edward J.; Haase, Gundolf

2013-01-01

Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6–20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20GPUs, 476 CPU cores were required on a national supercomputing facility. PMID:22692867
A Two-Step Approach to Uncertainty Quantification of Core Simulators

DOE PAGES

Yankov, Artem; Collins, Benjamin; Klein, Markus; ...

2012-01-01

For the multiple sources of error introduced into the standard computational regime for simulating reactor cores, rigorous uncertainty analysis methods are available primarily to quantify the effects of cross section uncertainties. Two methods for propagating cross section uncertainties through core simulators are the XSUSA statistical approach and the “two-step” method. The XSUSA approach, which is based on the SUSA code package, is fundamentally a stochastic sampling method. Alternatively, the two-step method utilizes generalized perturbation theory in the first step and stochastic sampling in the second step. The consistency of these two methods in quantifying uncertainties in the multiplication factor andmore » in the core power distribution was examined in the framework of phase I-3 of the OECD Uncertainty Analysis in Modeling benchmark. With the Three Mile Island Unit 1 core as a base model for analysis, the XSUSA and two-step methods were applied with certain limitations, and the results were compared to those produced by other stochastic sampling-based codes. Based on the uncertainty analysis results, conclusions were drawn as to the method that is currently more viable for computing uncertainties in burnup and transient calculations.« less
Investigating the impact of the cielo cray XE6 architecture on scientific application codes.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rajan, Mahesh; Barrett, Richard; Pedretti, Kevin Thomas Tauke

2010-12-01

Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, andmore » supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.« less
Virtualizing Super-Computation On-Board Uas

NASA Astrophysics Data System (ADS)

Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.

2015-04-01

Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real-time. However, depending on the volume of data and the cost of the communications, this later option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed-up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes the benchmarking for hardware selection, the software architecture and the communications aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real-time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and consumed watts.
Numerical modeling of fluid and electrical currents through geometries based on synchrotron X-ray tomographic images of reservoir rocks using Avizo and COMSOL

NASA Astrophysics Data System (ADS)

Bird, M. B.; Butler, S. L.; Hawkes, C. D.; Kotzer, T.

2014-12-01

The use of numerical simulations to model physical processes occurring within subvolumes of rock samples that have been characterized using advanced 3D imaging techniques is becoming increasingly common. Not only do these simulations allow for the determination of macroscopic properties like hydraulic permeability and electrical formation factor, but they also allow the user to visualize processes taking place at the pore scale and they allow for multiple different processes to be simulated on the same geometry. Most efforts to date have used specialized research software for the purpose of simulations. In this contribution, we outline the steps taken to use commercial software Avizo to transform a 3D synchrotron X-ray-derived tomographic image of a rock core sample to an STL (STereoLithography) file which can be imported into the commercial multiphysics modeling package COMSOL. We demonstrate that the use of COMSOL to perform fluid and electrical current flow simulations through the pore spaces. The permeability and electrical formation factor of the sample are calculated and compared with laboratory-derived values and benchmark calculations. Although the simulation domains that we were able to model on a desk top computer were significantly smaller than representative elementary volumes, and we were able to establish Kozeny-Carman and Archie's Law trends on which laboratory measurements and previous benchmark solutions fall. The rock core samples include a Fountainebleau sandstone used for benchmarking and a marly dolostone sampled from a well in the Weyburn oil field of southeastern Saskatchewan, Canada. Such carbonates are known to have complicated pore structures compared with sandstones, yet we are able to calculate reasonable macroscopic properties. We discuss the computing resources required.
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements

DOE PAGES

Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.; ...

2014-11-04

Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental k eff come from uncertainties in the manganese content and impurities in the stainless steel fuel claddingmore » as well as the 236U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.

Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental k eff come from uncertainties in the manganese content and impurities in the stainless steel fuel claddingmore » as well as the 236U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Computational Chemistry Comparison and Benchmark Database

National Institute of Standards and Technology Data Gateway

SRD 101 NIST Computational Chemistry Comparison and Benchmark Database (Web, free access) The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of molecules. The goals are to provide a benchmark set of molecules for the evaluation of ab initio computational methods and allow the comparison between different ab initio computational methods for the prediction of thermochemical properties.
Spectral Element Method for the Simulation of Unsteady Compressible Flows

NASA Technical Reports Server (NTRS)

Diosady, Laslo Tibor; Murman, Scott M.

2013-01-01

This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1{3]. The inviscid ux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6Ghz and 2GB per core memory. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6s.
A theoretical and experimental benchmark study of core-excited states in nitrogen

NASA Astrophysics Data System (ADS)

Myhre, Rolf H.; Wolf, Thomas J. A.; Cheng, Lan; Nandi, Saikat; Coriani, Sonia; Gühr, Markus; Koch, Henrik

2018-02-01

The high resolution near edge X-ray absorption fine structure spectrum of nitrogen displays the vibrational structure of the core-excited states. This makes nitrogen well suited for assessing the accuracy of different electronic structure methods for core excitations. We report high resolution experimental measurements performed at the SOLEIL synchrotron facility. These are compared with theoretical spectra calculated using coupled cluster theory and algebraic diagrammatic construction theory. The coupled cluster singles and doubles with perturbative triples model known as CC3 is shown to accurately reproduce the experimental excitation energies as well as the spacing of the vibrational transitions. The computational results are also shown to be systematically improved within the coupled cluster hierarchy, with the coupled cluster singles, doubles, triples, and quadruples method faithfully reproducing the experimental vibrational structure.
GW100: Benchmarking G0W0 for Molecular Systems.

PubMed

van Setten, Michiel J; Caruso, Fabio; Sharifzadeh, Sahar; Ren, Xinguo; Scheffler, Matthias; Liu, Fang; Lischner, Johannes; Lin, Lin; Deslippe, Jack R; Louie, Steven G; Yang, Chao; Weigend, Florian; Neaton, Jeffrey B; Evers, Ferdinand; Rinke, Patrick

2015-12-08

We present the GW100 set. GW100 is a benchmark set of the ionization potentials and electron affinities of 100 molecules computed with the GW method using three independent GW codes and different GW methodologies. The quasi-particle energies of the highest-occupied molecular orbitals (HOMO) and lowest-unoccupied molecular orbitals (LUMO) are calculated for the GW100 set at the G0W0@PBE level using the software packages TURBOMOLE, FHI-aims, and BerkeleyGW. The use of these three codes allows for a quantitative comparison of the type of basis set (plane wave or local orbital) and handling of unoccupied states, the treatment of core and valence electrons (all electron or pseudopotentials), the treatment of the frequency dependence of the self-energy (full frequency or more approximate plasmon-pole models), and the algorithm for solving the quasi-particle equation. Primary results include reference values for future benchmarks, best practices for convergence within a particular approach, and average error bars for the most common approximations.
Preliminary Results for the OECD/NEA Time Dependent Benchmark using Rattlesnake, Rattlesnake-IQS and TDKENO

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeHart, Mark D.; Mausolff, Zander; Weems, Zach

2016-08-01

One goal of the MAMMOTH M&S project is to validate the analysis capabilities within MAMMOTH. Historical data has shown limited value for validation of full three-dimensional (3D) multi-physics methods. Initial analysis considered the TREAT startup minimum critical core and one of the startup transient tests. At present, validation is focusing on measurements taken during the M8CAL test calibration series. These exercises will valuable in preliminary assessment of the ability of MAMMOTH to perform coupled multi-physics calculations; calculations performed to date are being used to validate the neutron transport solver Rattlesnake\\cite{Rattlesnake} and the fuels performance code BISON. Other validation projects outsidemore » of TREAT are available for single-physics benchmarking. Because the transient solution capability of Rattlesnake is one of the key attributes that makes it unique for TREAT transient simulations, validation of the transient solution of Rattlesnake using other time dependent kinetics benchmarks has considerable value. The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has recently developed a computational benchmark for transient simulations. This benchmark considered both two-dimensional (2D) and 3D configurations for a total number of 26 different transients. All are negative reactivity insertions, typically returning to the critical state after some time.« less
ZPR-6 assembly 7 high {sup 240} PU core : a cylindrical assemby with mixed (PU, U)-oxide fuel and a central high {sup 240} PU zone.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lell, R. M.; Schaefer, R. W.; McKnight, R. D.

Over a period of 30 years more than a hundred Zero Power Reactor (ZPR) critical assemblies were constructed at Argonne National Laboratory. The ZPR facilities, ZPR-3, ZPR-6, ZPR-9 and ZPPR, were all fast critical assembly facilities. The ZPR critical assemblies were constructed to support fast reactor development, but data from some of these assemblies are also well suited to form the basis for criticality safety benchmarks. Of the three classes of ZPR assemblies, engineering mockups, engineering benchmarks and physics benchmarks, the last group tends to be most useful for criticality safety. Because physics benchmarks were designed to test fast reactormore » physics data and methods, they were as simple as possible in geometry and composition. The principal fissile species was {sup 235}U or {sup 239}Pu. Fuel enrichments ranged from 9% to 95%. Often there were only one or two main core diluent materials, such as aluminum, graphite, iron, sodium or stainless steel. The cores were reflected (and insulated from room return effects) by one or two layers of materials such as depleted uranium, lead or stainless steel. Despite their more complex nature, a small number of assemblies from the other two classes would make useful criticality safety benchmarks because they have features related to criticality safety issues, such as reflection by soil-like material. The term 'benchmark' in a ZPR program connotes a particularly simple loading aimed at gaining basic reactor physics insight, as opposed to studying a reactor design. In fact, the ZPR-6/7 Benchmark Assembly (Reference 1) had a very simple core unit cell assembled from plates of depleted uranium, sodium, iron oxide, U3O8, and plutonium. The ZPR-6/7 core cell-average composition is typical of the interior region of liquid-metal fast breeder reactors (LMFBRs) of the era. It was one part of the Demonstration Reactor Benchmark Program,a which provided integral experiments characterizing the important features of demonstration-size LMFBRs. As a benchmark, ZPR-6/7 was devoid of many 'real' reactor features, such as simulated control rods and multiple enrichment zones, in its reference form. Those kinds of features were investigated experimentally in variants of the reference ZPR-6/7 or in other critical assemblies in the Demonstration Reactor Benchmark Program.« less
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

NASA Astrophysics Data System (ADS)

Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
Benchmark Evaluation of the HTR-PROTEUS Absorber Rod Worths (Core 4)

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; Leland M. Montierth

2014-06-01

PROTEUS was a zero-power research reactor at the Paul Scherrer Institute (PSI) in Switzerland. The critical assembly was constructed from a large graphite annulus surrounding a central cylindrical cavity. Various experimental programs were investigated in PROTEUS; during the years 1992 through 1996, it was configured as a pebble-bed reactor and designated HTR-PROTEUS. Various critical configurations were assembled with each accompanied by an assortment of reactor physics experiments including differential and integral absorber rod measurements, kinetics, reaction rate distributions, water ingress effects, and small sample reactivity effects [1]. Four benchmark reports were previously prepared and included in the March 2013 editionmore » of the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook) [2] evaluating eleven critical configurations. A summary of that effort was previously provided [3] and an analysis of absorber rod worth measurements for Cores 9 and 10 have been performed prior to this analysis and included in PROTEUS-GCR-EXP-004 [4]. In the current benchmark effort, absorber rod worths measured for Core Configuration 4, which was the only core with a randomly-packed pebble loading, have been evaluated for inclusion as a revision to the HTR-PROTEUS benchmark report PROTEUS-GCR-EXP-002.« less
Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

NASA Astrophysics Data System (ADS)

Romano, Paul Kollath

Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
Beyond core count: a look at new mainstream computing platforms for HEP workloads

NASA Astrophysics Data System (ADS)

Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

2014-06-01

As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
Defining core elements and outstanding practice in Nutritional Science through collaborative benchmarking.

PubMed

Samman, Samir; McCarthur, Jennifer O; Peat, Mary

2006-01-01

Benchmarking has been adopted by educational institutions as a potentially sensitive tool for improving learning and teaching. To date there has been limited application of benchmarking methodology in the Discipline of Nutritional Science. The aim of this survey was to define core elements and outstanding practice in Nutritional Science through collaborative benchmarking. Questionnaires that aimed to establish proposed core elements for Nutritional Science, and inquired about definitions of " good" and " outstanding" practice were posted to named representatives at eight Australian universities. Seven respondents identified core elements that included knowledge of nutrient metabolism and requirement, food production and processing, modern biomedical techniques that could be applied to understanding nutrition, and social and environmental issues as related to Nutritional Science. Four of the eight institutions who agreed to participate in the present survey identified the integration of teaching with research as an indicator of outstanding practice. Nutritional Science is a rapidly evolving discipline. Further and more comprehensive surveys are required to consolidate and update the definition of the discipline, and to identify the optimal way of teaching it. Global ideas and specific regional requirements also need to be considered.
Porting a Hall MHD Code to a Graphic Processing Unit

NASA Technical Reports Server (NTRS)

Dorelli, John C.

2011-01-01

We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner?s hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.
Future computing platforms for science in a power constrained era

DOE PAGES

Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...

2015-12-23

Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potentialmore » for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).« less

Accelerating 3D Hall MHD Magnetosphere Simulations with Graphics Processing Units

NASA Astrophysics Data System (ADS)

Bard, C.; Dorelli, J.

2017-12-01

The resolution required to simulate planetary magnetospheres with Hall magnetohydrodynamics result in program sizes approaching several hundred million grid cells. These would take years to run on a single computational core and require hundreds or thousands of computational cores to complete in a reasonable time. However, this requires access to the largest supercomputers. Graphics processing units (GPUs) provide a viable alternative: one GPU can do the work of roughly 100 cores, bringing Hall MHD simulations of Ganymede within reach of modest GPU clusters ( 8 GPUs). We report our progress in developing a GPU-accelerated, three-dimensional Hall magnetohydrodynamic code and present Hall MHD simulation results for both Ganymede (run on 8 GPUs) and Mercury (56 GPUs). We benchmark our Ganymede simulation with previous results for the Galileo G8 flyby, namely that adding the Hall term to ideal MHD simulations changes the global convection pattern within the magnetosphere. Additionally, we present new results for the G1 flyby as well as initial results from Hall MHD simulations of Mercury and compare them with the corresponding ideal MHD runs.
Coupled Neutronics Thermal-Hydraulic Solution of a Full-Core PWR Using VERA-CS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clarno, Kevin T; Palmtag, Scott; Davidson, Gregory G

2014-01-01

The Consortium for Advanced Simulation of Light Water Reactors (CASL) is developing a core simulator called VERA-CS to model operating PWR reactors with high resolution. This paper describes how the development of VERA-CS is being driven by a set of progression benchmark problems that specify the delivery of useful capability in discrete steps. As part of this development, this paper will describe the current capability of VERA-CS to perform a multiphysics simulation of an operating PWR at Hot Full Power (HFP) conditions using a set of existing computer codes coupled together in a novel method. Results for several single-assembly casesmore » are shown that demonstrate coupling for different boron concentrations and power levels. Finally, high-resolution results are shown for a full-core PWR reactor modeled in quarter-symmetry.« less
Method and system for benchmarking computers

DOEpatents

Gustafson, John L.

1993-09-14

A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
Energy Efficiency Evaluation and Benchmarking of AFRL’s Condor High Performance Computer

DTIC Science & Technology

2011-08-01

AUG 2011 2. REPORT TYPE CONFERENCE PAPER (Post Print) 3. DATES COVERED (From - To) JAN 2011 – JUN 2011 4 . TITLE AND SUBTITLE ENERGY EFFICIENCY...1716 Sony PlayStation 3s (PS3s), adding up to a total of 69,940 cores and a theoretical peak performance of 500 TFLOPS. There are 84 subcluster head...Thus, a critical component to achieving maximum performance is to find the optimum division of processing load between the CPU and GPU. 4 The
HS06 Benchmark for an ARM Server

NASA Astrophysics Data System (ADS)

Kluth, Stefan

2014-06-01

We benchmarked an ARM cortex-A9 based server system with a four-core CPU running at 1.1 GHz. The system used Ubuntu 12.04 as operating system and the HEPSPEC 2006 (HS06) benchmarking suite was compiled natively with gcc-4.4 on the system. The benchmark was run for various settings of the relevant gcc compiler options. We did not find significant influence from the compiler options on the benchmark result. The final HS06 benchmark result is 10.4.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

1993-01-01

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
The Learning Organisation: Results of a Benchmarking Study.

ERIC Educational Resources Information Center

Zairi, Mohamed

1999-01-01

Learning in corporations was assessed using these benchmarks: core qualities of creative organizations, characteristic of organizational creativity, attributes of flexible organizations, use of diversity and conflict, creative human resource management systems, and effective and successful teams. These benchmarks are key elements of the learning…
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

2010-01-01

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Messagemore » Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.« less
Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.

PubMed

Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito

2016-11-15

A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
PDS: A Performance Database Server

DOE PAGES

Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...

1994-01-01

The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallelmore » benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.« less
Benchmarking the ATLAS software through the Kit Validation engine

NASA Astrophysics Data System (ADS)

De Salvo, Alessandro; Brasolin, Franco

2010-04-01

The measurement of the experiment software performance is a very important metric in order to choose the most effective resources to be used and to discover the bottlenecks of the code implementation. In this work we present the benchmark techniques used to measure the ATLAS software performance through the ATLAS offline testing engine Kit Validation and the online portal Global Kit Validation. The performance measurements, the data collection, the online analysis and display of the results will be presented. The results of the measurement on different platforms and architectures will be shown, giving a full report on the CPU power and memory consumption of the Monte Carlo generation, simulation, digitization and reconstruction of the most CPU-intensive channels. The impact of the multi-core computing on the ATLAS software performance will also be presented, comparing the behavior of different architectures when increasing the number of concurrent processes. The benchmark techniques described in this paper have been used in the HEPiX group since the beginning of 2008 to help defining the performance metrics for the High Energy Physics applications, based on the real experiment software.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Optimization of image processing algorithms on mobile platforms

NASA Astrophysics Data System (ADS)

Poudel, Pramod; Shirvaikar, Mukul

2011-03-01

This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, net-books and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND RAM and 256 MB SDRAM memory. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application for various template matching tasks such as face-recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the speedup obtained due to dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
EBR-II Reactor Physics Benchmark Evaluation Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pope, Chad L.; Lum, Edward S; Stewart, Ryan

This report provides a reactor physics benchmark evaluation with associated uncertainty quantification for the critical configuration of the April 1986 Experimental Breeder Reactor II Run 138B core configuration.
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
Game playing.

PubMed

Rosin, Christopher D

2014-03-01

Game playing has been a core domain of artificial intelligence research since the beginnings of the field. Game playing provides clearly defined arenas within which computational approaches can be readily compared to human expertise through head-to-head competition and other benchmarks. Game playing research has identified several simple core algorithms that provide successful foundations, with development focused on the challenges of defeating human experts in specific games. Key developments include minimax search in chess, machine learning from self-play in backgammon, and Monte Carlo tree search in Go. These approaches have generalized successfully to additional games. While computers have surpassed human expertise in a wide variety of games, open challenges remain and research focuses on identifying and developing new successful algorithmic foundations. WIREs Cogn Sci 2014, 5:193-205. doi: 10.1002/wcs.1278 CONFLICT OF INTEREST: The author has declared no conflicts of interest for this article. For further resources related to this article, please visit the WIREs website. © 2014 John Wiley & Sons, Ltd.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Benchmarking gate-based quantum computers

NASA Astrophysics Data System (ADS)

Michielsen, Kristel; Nocon, Madita; Willsch, Dennis; Jin, Fengping; Lippert, Thomas; De Raedt, Hans

2017-11-01

With the advent of public access to small gate-based quantum processors, it becomes necessary to develop a benchmarking methodology such that independent researchers can validate the operation of these processors. We explore the usefulness of a number of simple quantum circuits as benchmarks for gate-based quantum computing devices and show that circuits performing identity operations are very simple, scalable and sensitive to gate errors and are therefore very well suited for this task. We illustrate the procedure by presenting benchmark results for the IBM Quantum Experience, a cloud-based platform for gate-based quantum computing.
SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

NASA Technical Reports Server (NTRS)

Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

2017-01-01

The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 more than those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our projects Year 1 efforts and demonstrate the capabilities across four simulation testbed models, a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.

Development of a Computing Cluster At the University of Richmond

NASA Astrophysics Data System (ADS)

Carbonneau, J.; Gilfoyle, G. P.; Bunn, E. F.

2010-11-01

The University of Richmond has developed a computing cluster to support the massive simulation and data analysis requirements for programs in intermediate-energy nuclear physics, and cosmology. It is a 20-node, 240-core system running Red Hat Enterprise Linux 5. We have built and installed the physics software packages (Geant4, gemc, MADmap...) and developed shell and Perl scripts for running those programs on the remote nodes. The system has a theoretical processing peak of about 2500 GFLOPS. Testing with the High Performance Linpack (HPL) benchmarking program (one of the standard benchmarks used by the TOP500 list of fastest supercomputers) resulted in speeds of over 900 GFLOPS. The difference between the maximum and measured speeds is due to limitations in the communication speed among the nodes; creating a bottleneck for large memory problems. As HPL sends data between nodes, the gigabit Ethernet connection cannot keep up with the processing power. We will show how both the theoretical and actual performance of the cluster compares with other current and past clusters, as well as the cost per GFLOP. We will also examine the scaling of the performance when distributed to increasing numbers of nodes.
Benchmark CCSD(T) and DFT study of binding energies in Be7 - 12: in search of reliable DFT functional for beryllium clusters

NASA Astrophysics Data System (ADS)

Labanc, Daniel; Šulka, Martin; Pitoňák, Michal; Černušák, Ivan; Urban, Miroslav; Neogrády, Pavel

2018-05-01

We present a computational study of the stability of small homonuclear beryllium clusters Be7 - 12 in singlet electronic states. Our predictions are based on highly correlated CCSD(T) coupled cluster calculations. Basis set convergence towards the complete basis set limit as well as the role of the 1s core electron correlation are carefully examined. Our CCSD(T) data for binding energies of Be7 - 12 clusters serve as a benchmark for performance assessment of several density functional theory (DFT) methods frequently used in beryllium cluster chemistry. We observe that, from Be10 clusters on, the deviation from CCSD(T) benchmarks is stable with respect to size, and fluctuating within 0.02 eV error bar for most examined functionals. This opens up the possibility of scaling the DFT binding energies for large Be clusters using CCSD(T) benchmark values for smaller clusters. We also tried to find analogies between the performance of DFT functionals for Be clusters and for the valence-isoelectronic Mg clusters investigated recently in Truhlar's group. We conclude that it is difficult to find DFT functionals that perform reasonably well for both beryllium and magnesium clusters. Out of 12 functionals examined, only the M06-2X functional gives reasonably accurate and balanced binding energies for both Be and Mg clusters.
47 CFR 69.108 - Transport rate benchmark.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 47 Telecommunication 3 2010-10-01 2010-10-01 false Transport rate benchmark. 69.108 Section 69.108... Computation of Charges § 69.108 Transport rate benchmark. (a) For transport charges computed in accordance... interoffice transmission using the telephone company's DS1 special access rates. (b) Initial transport rates...
47 CFR 69.108 - Transport rate benchmark.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 47 Telecommunication 3 2011-10-01 2011-10-01 false Transport rate benchmark. 69.108 Section 69.108... Computation of Charges § 69.108 Transport rate benchmark. (a) For transport charges computed in accordance... interoffice transmission using the telephone company's DS1 special access rates. (b) Initial transport rates...
Experimental and Theoretical Investigations on Viscosity of Fe-Ni-C Liquids at High Pressures

NASA Astrophysics Data System (ADS)

Chen, B.; Lai, X.; Wang, J.; Zhu, F.; Liu, J.; Kono, Y.

2016-12-01

Understanding and modeling of Earth's core processes such as geodynamo and heat flow via convection in liquid outer cores hinges on the viscosity of candidate liquid iron alloys under core conditions. Viscosity estimates from various methods of the metallic liquid of the outer core, however, span up to 12 orders of magnitude. Due to experimental challenges, viscosity measurements of iron liquids alloyed with lighter elements are scarce and conducted at conditions far below those expected for the outer core. In this study, we adopt a synergistic approach by integrating experiments at experimentally-achievable conditions with computations up to core conditions. We performed viscosity measurements based on the modified Stokes' floating sphere viscometry method for the Fe-Ni-C liquids at high pressures in a Paris-Edinburgh press at Sector 16 of the Advanced Photon Source, Argonne National Laboratory. Our results show that the addition of 3-5 wt.% carbon to iron-nickel liquids has negligible effect on its viscosity at pressures lower than 5 GPa. The viscosity of the Fe-Ni-C liquids, however, becomes notably higher and increases by a factor of 3 at 5-8 GPa. Similarly, our first-principles molecular dynamics calculations up to Earth's core pressures show a viscosity change in Fe-Ni-C liquids at 5 GPa. The significant change in the viscosity is likely due to a liquid structural transition of the Fe-Ni-C liquids as revealed by our X-ray diffraction measurements and first-principles molecular dynamics calculations. The observed correlation between structure and physical properties of liquids permit stringent benchmark test of the computational liquid models and contribute to a more comprehensive understanding of liquid properties under high pressures. The interplay between experiments and first-principles based modeling is shown to be a practical and effective methodology for studying liquid properties under outer core conditions that are difficult to reach with the current static high-pressure capabilities. The new viscosity data from experiments and computations would provide new insights into the internal dynamics of the outer core.
Benchmark Lisp And Ada Programs

NASA Technical Reports Server (NTRS)

Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.

1992-01-01

Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing efficiency of computer processing via Lisp vs. Ada; comparing efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests efficiency which computer executes routines in each language. Available for computer equipped with validated Ada compiler and/or Common Lisp system.
Global Futures: a multithreaded execution model for Global Arrays-based applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chavarría-Miranda, Daniel; Krishnamoorthy, Sriram; Vishnu, Abhinav

2012-05-31

We present Global Futures (GF), an execution model extension to Global Arrays, which is based on a PGAS-compatible Active Message-based paradigm. We describe the design and implementation of Global Futures and illustrate its use in a computational chemistry application benchmark (Hartree-Fock matrix construction using the Self-Consistent Field method). Our results show how we used GF to increase the scalability of the Hartree-Fock matrix build to up to 6,144 cores of an Infiniband cluster. We also show how GF's multithreaded execution has comparable performance to the traditional process-based SPMD model.
Development of a Computer-based Benchmarking and Analytical Tool. Benchmarking and Energy & Water Savings Tool in Dairy Plants (BEST-Dairy)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Tengfang; Flapper, Joris; Ke, Jing

The overall goal of the project is to develop a computer-based benchmarking and energy and water savings tool (BEST-Dairy) for use in the California dairy industry – including four dairy processes – cheese, fluid milk, butter, and milk powder.
Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing

NASA Technical Reports Server (NTRS)

Ragharan, Bharathi; Galant, David

1992-01-01

The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.
369 TFlop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Swaminarayan, Sriram; Germann, Timothy C; Kadau, Kai

2008-01-01

The authors present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so that there are four MPI ranks per node, each with one Opteron and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons are used to direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementationmore » of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 MFlop/Watt/s at a price of approximately 3.69 MFlops/dollar. They demonstrate an initial target application, the jetting and ejection of material from a shocked surface.« less
Implementation and validation of a conceptual benchmarking framework for patient blood management.

PubMed

Kastner, Peter; Breznik, Nada; Gombotz, Hans; Hofmann, Axel; Schreier, Günter

2015-01-01

Public health authorities and healthcare professionals are obliged to ensure high quality health service. Because of the high variability of the utilisation of blood and blood components, benchmarking is indicated in transfusion medicine. Implementation and validation of a benchmarking framework for Patient Blood Management (PBM) based on the report from the second Austrian Benchmark trial. Core modules for automatic report generation have been implemented with KNIME (Konstanz Information Miner) and validated by comparing the output with the results of the second Austrian benchmark trial. Delta analysis shows a deviation <0.1% for 95% (max. 1.4%). The framework provides a reliable tool for PBM benchmarking. The next step is technical integration with hospital information systems.
GPUs benchmarking in subpixel image registration algorithm

NASA Astrophysics Data System (ADS)

Sanz-Sabater, Martin; Picazo-Bueno, Jose Angel; Micó, Vicente; Ferrerira, Carlos; Granero, Luis; Garcia, Javier

2015-05-01

Image registration techniques are used among different scientific fields, like medical imaging or optical metrology. The straightest way to calculate shifting between two images is using the cross correlation, taking the highest value of this correlation image. Shifting resolution is given in whole pixels which cannot be enough for certain applications. Better results can be achieved interpolating both images, as much as the desired resolution we want to get, and applying the same technique described before, but the memory needed by the system is significantly higher. To avoid memory consuming we are implementing a subpixel shifting method based on FFT. With the original images, subpixel shifting can be achieved multiplying its discrete Fourier transform by a linear phase with different slopes. This method is high time consuming method because checking a concrete shifting means new calculations. The algorithm, highly parallelizable, is very suitable for high performance computing systems. GPU (Graphics Processing Unit) accelerated computing became very popular more than ten years ago because they have hundreds of computational cores in a reasonable cheap card. In our case, we are going to register the shifting between two images, doing the first approach by FFT based correlation, and later doing the subpixel approach using the technique described before. We consider it as `brute force' method. So we will present a benchmark of the algorithm consisting on a first approach (pixel resolution) and then do subpixel resolution approaching, decreasing the shifting step in every loop achieving a high resolution in few steps. This program will be executed in three different computers. At the end, we will present the results of the computation, with different kind of CPUs and GPUs, checking the accuracy of the method, and the time consumed in each computer, discussing the advantages, disadvantages of the use of GPUs.
Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

NASA Astrophysics Data System (ADS)

Hadade, Ioan; di Mare, Luca

2016-08-01

Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
Benchmarking neuromorphic vision: lessons learnt from computer vision

PubMed Central

Tan, Cheston; Lallee, Stephane; Orchard, Garrick

2015-01-01

Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision. PMID:26528120
CORAL: aligning conserved core regions across domain families.

PubMed

Fong, Jessica H; Marchler-Bauer, Aron

2009-08-01

Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Supplementary data are available at Bioinformatics online.
Highly Efficient Parallel Multigrid Solver For Large-Scale Simulation of Grain Growth Using the Structural Phase Field Crystal Model

NASA Astrophysics Data System (ADS)

Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John

The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic scale resolution. However, the governing equation of the XPFC model is an integral-partial-differential-equation (IPDE), which poses challenges in implementation onto high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. The performance benchmarking on the Stampede supercomputer indicates near linear strong and weak scaling for both multigrid and transfer time between multigrid and FFT modules up to 1024 cores. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for coming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.
Heterogeneous Distributed Computing for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Sunderam, Vaidy S.

1998-01-01

The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to and investigate issues in, and develop solutions to, efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devising novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
ZPR-3 Assembly 11 : A cylindrical sssembly of highly enriched uranium and depleted uranium with an average {sup 235}U enrichment of 12 atom % and a depleted uranium reflector.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lell, R. M.; McKnight, R. D.; Tsiboulia, A.

2010-09-30

Over a period of 30 years, more than a hundred Zero Power Reactor (ZPR) critical assemblies were constructed at Argonne National Laboratory. The ZPR facilities, ZPR-3, ZPR-6, ZPR-9 and ZPPR, were all fast critical assembly facilities. The ZPR critical assemblies were constructed to support fast reactor development, but data from some of these assemblies are also well suited for nuclear data validation and to form the basis for criticality safety benchmarks. A number of the Argonne ZPR/ZPPR critical assemblies have been evaluated as ICSBEP and IRPhEP benchmarks. Of the three classes of ZPR assemblies, engineering mockups, engineering benchmarks and physicsmore » benchmarks, the last group tends to be most useful for criticality safety. Because physics benchmarks were designed to test fast reactor physics data and methods, they were as simple as possible in geometry and composition. The principal fissile species was {sup 235}U or {sup 239}Pu. Fuel enrichments ranged from 9% to 95%. Often there were only one or two main core diluent materials, such as aluminum, graphite, iron, sodium or stainless steel. The cores were reflected (and insulated from room return effects) by one or two layers of materials such as depleted uranium, lead or stainless steel. Despite their more complex nature, a small number of assemblies from the other two classes would make useful criticality safety benchmarks because they have features related to criticality safety issues, such as reflection by soil-like material. ZPR-3 Assembly 11 (ZPR-3/11) was designed as a fast reactor physics benchmark experiment with an average core {sup 235}U enrichment of approximately 12 at.% and a depleted uranium reflector. Approximately 79.7% of the total fissions in this assembly occur above 100 keV, approximately 20.3% occur below 100 keV, and essentially none below 0.625 eV - thus the classification as a 'fast' assembly. This assembly is Fast Reactor Benchmark No. 8 in the Cross Section Evaluation Working Group (CSEWG) Benchmark Specificationsa and has historically been used as a data validation benchmark assembly. Loading of ZPR-3 Assembly 11 began in early January 1958, and the Assembly 11 program ended in late January 1958. The core consisted of highly enriched uranium (HEU) plates and depleted uranium plates loaded into stainless steel drawers, which were inserted into the central square stainless steel tubes of a 31 x 31 matrix on a split table machine. The core unit cell consisted of two columns of 0.125 in.-wide (3.175 mm) HEU plates, six columns of 0.125 in.-wide (3.175 mm) depleted uranium plates and one column of 1.0 in.-wide (25.4 mm) depleted uranium plates. The length of each column was 10 in. (254.0 mm) in each half of the core. The axial blanket consisted of 12 in. (304.8 mm) of depleted uranium behind the core. The thickness of the depleted uranium radial blanket was approximately 14 in. (355.6 mm), and the length of the radial blanket in each half of the matrix was 22 in. (558.8 mm). The assembly geometry approximated a right circular cylinder as closely as the square matrix tubes allowed. According to the logbook and loading records for ZPR-3/11, the reference critical configuration was loading 10 which was critical on January 21, 1958. Subsequent loadings were very similar but less clean for criticality because there were modifications made to accommodate reactor physics measurements other than criticality. Accordingly, ZPR-3/11 loading 10 was selected as the only configuration for this benchmark. As documented below, it was determined to be acceptable as a criticality safety benchmark experiment. A very accurate transformation to a simplified model is needed to make any ZPR assembly a practical criticality-safety benchmark. There is simply too much geometric detail in an exact (as-built) model of a ZPR assembly, even a clean core such as ZPR-3/11 loading 10. The transformation must reduce the detail to a practical level without masking any of the important features of the critical experiment. And it must do this without increasing the total uncertainty far beyond that of the original experiment. Such a transformation is described in Section 3. It was obtained using a pair of continuous-energy Monte Carlo calculations. First, the critical configuration was modeled in full detail - every plate, drawer, matrix tube, and air gap was modeled explicitly. Then the regionwise compositions and volumes from the detailed as-built model were used to construct a homogeneous, two-dimensional (RZ) model of ZPR-3/11 that conserved the mass of each nuclide and volume of each region. The simple cylindrical model is the criticality-safety benchmark model. The difference in the calculated k{sub eff} values between the as-built three-dimensional model and the homogeneous two-dimensional benchmark model was used to adjust the measured excess reactivity of ZPR-3/11 loading 10 to obtain the k{sub eff} for the benchmark model.« less
Computation of the free energy due to electron density fluctuation of a solute in solution: A QM/MM method with perturbation approach combined with a theory of solutions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Suzuoka, Daiki; Takahashi, Hideaki, E-mail: hideaki@m.tohoku.ac.jp; Morita, Akihiro

2014-04-07

We developed a perturbation approach to compute solvation free energy Δμ within the framework of QM (quantum mechanical)/MM (molecular mechanical) method combined with a theory of energy representation (QM/MM-ER). The energy shift η of the whole system due to the electronic polarization of the solute is evaluated using the second-order perturbation theory (PT2), where the electric field formed by surrounding solvent molecules is treated as the perturbation to the electronic Hamiltonian of the isolated solute. The point of our approach is that the energy shift η, thus obtained, is to be adopted for a novel energy coordinate of the distributionmore » functions which serve as fundamental variables in the free energy functional developed in our previous work. The most time-consuming part in the QM/MM-ER simulation can be, thus, avoided without serious loss of accuracy. For our benchmark set of molecules, it is demonstrated that the PT2 approach coupled with QM/MM-ER gives hydration free energies in excellent agreements with those given by the conventional method utilizing the Kohn-Sham SCF procedure except for a few molecules in the benchmark set. A variant of the approach is also proposed to deal with such difficulties associated with the problematic systems. The present approach is also advantageous to parallel implementations. We examined the parallel efficiency of our PT2 code on multi-core processors and found that the speedup increases almost linearly with respect to the number of cores. Thus, it was demonstrated that QM/MM-ER coupled with PT2 deserves practical applications to systems of interest.« less
Development and Applications of Orthogonality Constrained Density Functional Theory for the Accurate Simulation of X-Ray Absorption Spectroscopy

NASA Astrophysics Data System (ADS)

Derricotte, Wallace D.

The aim of this dissertation is to address the theoretical challenges of calculating core-excited states within the framework of orthogonality constrained density functional theory (OCDFT). OCDFT is a well-established variational, time independent formulation of DFT for the computation of electronic excited states. In this work, the theory is first extended to compute core-excited states and generalized to calculate multiple excited state solutions. An initial benchmark is performed on a set of 40 unique core-excitations, highlighting that OCDFT excitation energies have a mean absolute error of 1.0 eV. Next, a novel implementation of the spin-free exact-two-component (X2C) one-electron treatment of scalar relativistic effects is presented and combined with OCDFT in an effort to calculate core excited states of transition metal complexes. The X2C-OCDFT spectra of three organotitanium complexes (TiCl4, TiCpCl3, and TiCp2Cl2) are shown to be in good agreement with experimental results and show a maximum absolute error of 5-6 eV. Next the issue of assigning core excited states is addressed by introducing an automated approach to analyzing the excited state MO by quantifying its local contributions using a unique orbital basis known as localized intrinsic valence virtual orbitals (LIVVOs). The utility of this approach is highlighted by studying sulfur core-excitations in ethanethiol and benzenethiol, as well as the hydrogen bonding in the water dimer. Finally, an approach to selectively target specic core-excited states in OCDFT based on atomic orbital subspace projection is presented in an effort to target core excited states of chemisorbed organic molecules. The core excitation spectrum of pyrazine chemisorbed on Si(100) is calculated using OCDFT and further characterized using the LIVVO approach.

Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, Haoqiang; VanderWijngaart, Rob F.

2003-01-01

We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
TREAT Transient Analysis Benchmarking for the HEU Core

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kontogeorgakos, D. C.; Connaway, H. M.; Wright, A. E.

2014-05-01

This work was performed to support the feasibility study on the potential conversion of the Transient Reactor Test Facility (TREAT) at Idaho National Laboratory from the use of high enriched uranium (HEU) fuel to the use of low enriched uranium (LEU) fuel. The analyses were performed by the GTRI Reactor Conversion staff at the Argonne National Laboratory (ANL). The objective of this study was to benchmark the transient calculations against temperature-limited transients performed in the final operating HEU TREAT core configuration. The MCNP code was used to evaluate steady-state neutronics behavior, and the point kinetics code TREKIN was used tomore » determine core power and energy during transients. The first part of the benchmarking process was to calculate with MCNP all the neutronic parameters required by TREKIN to simulate the transients: the transient rod-bank worth, the prompt neutron generation lifetime, the temperature reactivity feedback as a function of total core energy, and the core-average temperature and peak temperature as a functions of total core energy. The results of these calculations were compared against measurements or against reported values as documented in the available TREAT reports. The heating of the fuel was simulated as an adiabatic process. The reported values were extracted from ANL reports, intra-laboratory memos and experiment logsheets and in some cases it was not clear if the values were based on measurements, on calculations or a combination of both. Therefore, it was decided to use the term “reported” values when referring to such data. The methods and results from the HEU core transient analyses will be used for the potential LEU core configurations to predict the converted (LEU) core’s performance.« less
Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr

2014-12-15

In this paper we present a time-parallel algorithm for the 3D neutrons calculation of a transient model in a nuclear reactor core. The neutrons calculation consists in numerically solving the time dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with finite elements method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity ofmore » the control rods. The parallelism across the time is achieved by an adequate use of the parareal in time algorithm to the handled problem. This parallel method is a predictor corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with large time step and fixed position control rods model, while the fine propagator is assumed to be a high order numerical approximation of the full model. The parallel implementation of our method provides a good scalability of the algorithm. Numerical results show the efficiency of the parareal method on large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.« less
Anisn-Dort Neutron-Gamma Flux Intercomparison Exercise for a Simple Testing Model

NASA Astrophysics Data System (ADS)

Boehmer, B.; Konheiser, J.; Borodkin, G.; Brodkin, E.; Egorov, A.; Kozhevnikov, A.; Zaritsky, S.; Manturov, G.; Voloschenko, A.

2003-06-01

The ability of transport codes ANISN, DORT, ROZ-6, MCNP and TRAMO, as well as nuclear data libraries BUGLE-96, ABBN-93, VITAMIN-B6 and ENDF/B-6 to deliver consistent gamma and neutron flux results was tested in the calculation of a one-dimensional cylindrical model consisting of a homogeneous core and an outer zone with a single material. Model variants with H2O, Fe, Cr and Ni in the outer zones were investigated. The results are compared with MCNP-ENDF/B-6 results. Discrepancies are discussed. The specified test model is proposed as a computational benchmark for testing calculation codes and data libraries.
Benchmarks for target tracking

NASA Astrophysics Data System (ADS)

Dunham, Darin T.; West, Philip D.

2011-09-01

The term benchmark originates from the chiseled horizontal marks that surveyors made, into which an angle-iron could be placed to bracket ("bench") a leveling rod, thus ensuring that the leveling rod can be repositioned in exactly the same place in the future. A benchmark in computer terms is the result of running a computer program, or a set of programs, in order to assess the relative performance of an object by running a number of standard tests and trials against it. This paper will discuss the history of simulation benchmarks that are being used by multiple branches of the military and agencies of the US government. These benchmarks range from missile defense applications to chemical biological situations. Typically, a benchmark is used with Monte Carlo runs in order to tease out how algorithms deal with variability and the range of possible inputs. We will also describe problems that can be solved by a benchmark.
A finite-volume module for all-scale Earth-system modelling at ECMWF

NASA Astrophysics Data System (ADS)

Kühnlein, Christian; Malardel, Sylvie; Smolarkiewicz, Piotr

2017-04-01

We highlight recent advancements in the development of the finite-volume module (FVM) (Smolarkiewicz et al., 2016) for the IFS at ECMWF. FVM represents an alternative dynamical core that complements the operational spectral dynamical core of the IFS with new capabilities. Most notably, these include a compact-stencil finite-volume discretisation, flexible meshes, conservative non-oscillatory transport and all-scale governing equations. As a default, FVM solves the compressible Euler equations in a geospherical framework (Szmelter and Smolarkiewicz, 2010). The formulation incorporates a generalised terrain-following vertical coordinate. A hybrid computational mesh, fully unstructured in the horizontal and structured in the vertical, enables efficient global atmospheric modelling. Moreover, a centred two-time-level semi-implicit integration scheme is employed with 3D implicit treatment of acoustic, buoyant, and rotational modes. The associated 3D elliptic Helmholtz problem is solved using a preconditioned Generalised Conjugate Residual approach. The solution procedure employs the non-oscillatory finite-volume MPDATA advection scheme that is bespoke for the compressible dynamics on the hybrid mesh (Kühnlein and Smolarkiewicz, 2017). The recent progress of FVM is illustrated with results of benchmark simulations of intermediate complexity, and comparison to the operational spectral dynamical core of the IFS. C. Kühnlein, P.K. Smolarkiewicz: An unstructured-mesh finite-volume MPDATA for compressible atmospheric dynamics, J. Comput. Phys. (2017), in press. P.K. Smolarkiewicz, W. Deconinck, M. Hamrud, C. Kühnlein, G. Mozdzynski, J. Szmelter, N.P. Wedi: A finite-volume module for simulating global all-scale atmospheric flows, J. Comput. Phys. 314 (2016) 287-304. J. Szmelter, P.K. Smolarkiewicz: An edge-based unstructured mesh discretisation in geospherical framework, J. Comput. Phys. 229 (2010) 4980-4995.
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

PubMed

Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

2017-06-16

Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limit computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the utility of Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
Results of a Neutronic Simulation of HTR-Proteus Core 4.2 using PEBBED and other INL Reactor Physics Tools: FY-09 Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hans D. Gougar

The Idaho National Laboratory’s deterministic neutronics analysis codes and methods were applied to the computation of the core multiplication factor of the HTR-Proteus pebble bed reactor critical facility. A combination of unit cell calculations (COMBINE-PEBDAN), 1-D discrete ordinates transport (SCAMP), and nodal diffusion calculations (PEBBED) were employed to yield keff and flux profiles. Preliminary results indicate that these tools, as currently configured and used, do not yield satisfactory estimates of keff. If control rods are not modeled, these methods can deliver much better agreement with experimental core eigenvalues which suggests that development efforts should focus on modeling control rod andmore » other absorber regions. Under some assumptions and in 1D subcore analyses, diffusion theory agrees well with transport. This suggests that developments in specific areas can produce a viable core simulation approach. Some corrections have been identified and can be further developed, specifically: treatment of the upper void region, treatment of inter-pebble streaming, and explicit (multiscale) transport modeling of TRISO fuel particles as a first step in cross section generation. Until corrections are made that yield better agreement with experiment, conclusions from core design and burnup analyses should be regarded as qualitative and not benchmark quality.« less
Benchmarking Memory Performance with the Data Cube Operator

NASA Technical Reports Server (NTRS)

Frumkin, Michael A.; Shabanov, Leonid V.

2004-01-01

Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operatol: The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of 2d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance of a number of computer architectures and of a small computational grid are presented.
Implementation of the NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2002-01-01

Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Investigation on the Core Bypass Flow in a Very High Temperature Reactor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hassan, Yassin

2013-10-22

Uncertainties associated with the core bypass flow are some of the key issues that directly influence the coolant mass flow distribution and magnitude, and thus the operational core temperature profiles, in the very high-temperature reactor (VHTR). Designers will attempt to configure the core geometry so the core cooling flow rate magnitude and distribution conform to the design values. The objective of this project is to study the bypass flow both experimentally and computationally. Researchers will develop experimental data using state-of-the-art particle image velocimetry in a small test facility. The team will attempt to obtain full field temperature distribution using racksmore » of thermocouples. The experimental data are intended to benchmark computational fluid dynamics (CFD) codes by providing detailed information. These experimental data are urgently needed for validation of the CFD codes. The following are the project tasks: • Construct a small-scale bench-top experiment to resemble the bypass flow between the graphite blocks, varying parameters to address their impact on bypass flow. Wall roughness of the graphite block walls, spacing between the blocks, and temperature of the blocks are some of the parameters to be tested. • Perform CFD to evaluate pre- and post-test calculations and turbulence models, including sensitivity studies to achieve high accuracy. • Develop the state-of-the art large eddy simulation (LES) using appropriate subgrid modeling. • Develop models to be used in systems thermal hydraulics codes to account and estimate the bypass flows. These computer programs include, among others, RELAP3D, MELCOR, GAMMA, and GAS-NET. Actual core bypass flow rate may vary considerably from the design value. Although the uncertainty of the bypass flow rate is not known, some sources have stated that the bypass flow rates in the Fort St. Vrain reactor were between 8 and 25 percent of the total reactor mass flow rate. If bypass flow rates are on the high side, the quantity of cooling flow through the core may be considerably less than the nominal design value, causing some regions of the core to operate at temperatures in excess of the design values. These effects are postulated to lead to localized hot regions in the core that must be considered when evaluating the VHTR operational and accident scenarios.« less
Coupled-cluster based approach for core-level states in condensed phase: Theory and application to different protonated forms of aqueous glycine

DOE PAGES

Sadybekov, Arman; Krylov, Anna I.

2017-07-07

A theoretical approach for calculating core-level states in condensed phase is presented. The approach is based on equation-of-motion coupled-cluster theory (EOMCC) and effective fragment potential (EFP) method. By introducing an approximate treatment of double excitations in the EOM-CCSD (EOM-CC with single and double substitutions) ansatz, we address poor convergence issues that are encountered for the core-level states and significantly reduce computational costs. While the approximations introduce relatively large errors in the absolute values of transition energies, the errors are systematic. Consequently, chemical shifts, changes in ionization energies relative to reference systems, are reproduced reasonably well. By using different protonation formsmore » of solvated glycine as a benchmark system, we show that our protocol is capable of reproducing the experimental chemical shifts with a quantitative accuracy. The results demonstrate that chemical shifts are very sensitive to the solvent interactions and that explicit treatment of solvent, such as EFP, is essential for achieving quantitative accuracy.« less
Coupled-cluster based approach for core-level states in condensed phase: Theory and application to different protonated forms of aqueous glycine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sadybekov, Arman; Krylov, Anna I.

A theoretical approach for calculating core-level states in condensed phase is presented. The approach is based on equation-of-motion coupled-cluster theory (EOMCC) and effective fragment potential (EFP) method. By introducing an approximate treatment of double excitations in the EOM-CCSD (EOM-CC with single and double substitutions) ansatz, we address poor convergence issues that are encountered for the core-level states and significantly reduce computational costs. While the approximations introduce relatively large errors in the absolute values of transition energies, the errors are systematic. Consequently, chemical shifts, changes in ionization energies relative to reference systems, are reproduced reasonably well. By using different protonation formsmore » of solvated glycine as a benchmark system, we show that our protocol is capable of reproducing the experimental chemical shifts with a quantitative accuracy. The results demonstrate that chemical shifts are very sensitive to the solvent interactions and that explicit treatment of solvent, such as EFP, is essential for achieving quantitative accuracy.« less
Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries.

PubMed

Yu, Jen-Shiang K; Yu, Chin-Hui

2002-01-01

One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as the Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theories, and the MP2 calculations are done for benchmarking purposes. It is found that the combination of ifc with ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, including AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance on one single CPU is potentially as good as that on an Alpha 21264A workstation or an SGI supercomputer. The floating-point marks by SpecFP2000 have similar trends to the results of GAUSSIAN 98 package.
Benchmark problems for numerical implementations of phase field models

DOE PAGES

Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...

2016-10-01

Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verifymore » new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.« less
76 FR 54209 - Corrosion-Resistant Carbon Steel Flat Products From the Republic of Korea: Preliminary Results of...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-31

... description of the merchandise is dispositive. Subsidies Valuation Information A. Benchmarks for Short-Term Financing For those programs requiring the application of a won-denominated, short-term interest rate... Issues and Decision Memorandum (CORE from Korea 2006 Decision Memorandum) at ``Benchmarks for Short-Term...
Implementation of NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

2000-01-01

A number of features make Java an attractive but a debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to the Computational Fluid Dynamics (CFD) we have implemented NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.
Evaluation of RAPID for a UNF cask benchmark problem

NASA Astrophysics Data System (ADS)

Mascolino, Valerio; Haghighat, Alireza; Roskoff, Nathan J.

2017-09-01

This paper examines the accuracy and performance of the RAPID (Real-time Analysis for Particle transport and In-situ Detection) code system for the simulation of a used nuclear fuel (UNF) cask. RAPID is capable of determining eigenvalue, subcritical multiplication, and pin-wise, axially-dependent fission density throughout a UNF cask. We study the source convergence based on the analysis of the different parameters used in an eigenvalue calculation in the MCNP Monte Carlo code. For this study, we consider a single assembly surrounded by absorbing plates with reflective boundary conditions. Based on the best combination of eigenvalue parameters, a reference MCNP solution for the single assembly is obtained. RAPID results are in excellent agreement with the reference MCNP solutions, while requiring significantly less computation time (i.e., minutes vs. days). A similar set of eigenvalue parameters is used to obtain a reference MCNP solution for the whole UNF cask. Because of time limitation, the MCNP results near the cask boundaries have significant uncertainties. Except for these, the RAPID results are in excellent agreement with the MCNP predictions, and its computation time is significantly lower, 35 second on 1 core versus 9.5 days on 16 cores.
A One-group, One-dimensional Transport Benchmark in Cylindrical Geometry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barry Ganapol; Abderrafi M. Ougouag

A 1-D, 1-group computational benchmark in cylndrical geometry is described. This neutron transport benchmark is useful for evaluating reactor concepts that possess azimuthal symmetry such as a pebble-bed reactor.
Research on computer systems benchmarking

NASA Technical Reports Server (NTRS)

Smith, Alan Jay (Principal Investigator)

1996-01-01

This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance. The performance impact of optimization in the context of our methodology for CPU performance characterization was based on the abstract machine model. Benchmark programs are analyzed in another paper. A machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the afore-mentioned accomplishments are more specifically summarized in this report, as well as those smaller in magnitude supported by this grant.

Benchmark problems and solutions

NASA Technical Reports Server (NTRS)

Tam, Christopher K. W.

1995-01-01

The scientific committee, after careful consideration, adopted six categories of benchmark problems for the workshop. These problems do not cover all the important computational issues relevant to Computational Aeroacoustics (CAA). The deciding factor to limit the number of categories to six was the amount of effort needed to solve these problems. For reference purpose, the benchmark problems are provided here. They are followed by the exact or approximate analytical solutions. At present, an exact solution for the Category 6 problem is not available.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S.; Yoginath, Srikanth B.

Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) amore » memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.« less
HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adams, Mark; Brown, Jed; Shalf, John

2014-05-05

This document provides an overview of the benchmark ? HPGMG ? for ranking large scale general purpose computers for use on the Top500 list [8]. We provide a rationale for the need for a replacement for the current metric HPL, some background of the Top500 list and the challenges of developing such a metric; we discuss our design philosophy and methodology, and an overview of the specification of the benchmark. The primary documentation with maintained details on the specification can be found at hpgmg.org and the Wiki and benchmark code itself can be found in the repository https://bitbucket.org/hpgmg/hpgmg.
High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems

DOE PAGES

Dongarra, Jack; Heroux, Michael A.; Luszczek, Piotr

2015-08-17

Here, we describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a better correlation to existing codes from the computational science domain and to be representative of their performance. Furthermore, HPCG is meant to help drive the computer system design and implementation in directions that will better impact future performance improvement.
Calculation of the Phenix end-of-life test 'Control Rod Withdrawal' with the ERANOS code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tiberi, V.

2012-07-01

The Inst. of Radiological Protection and Nuclear Safety (IRSN) acts as technical support to French public authorities. As such, IRSN is in charge of safety assessment of operating and under construction reactors, as well as future projects. In this framework, one current objective of IRSN is to evaluate the ability and accuracy of numerical tools to foresee consequences of accidents. Neutronic studies step in the safety assessment from different points of view among which the core design and its protection system. They are necessary to evaluate the core behavior in case of accident in order to assess the integrity ofmore » the first barrier and the absence of a prompt criticality risk. To reach this objective one main physical quantity has to be evaluated accurately: the neutronic power distribution in core during whole reactor lifetime. Phenix end of life tests, carried out in 2009, aim at increasing the experience feedback on sodium cooled fast reactors. These experiments have been done in the framework of the development of the 4. generation of nuclear reactors. Ten tests have been carried out: 6 on neutronic and fuel aspects, 2 on thermal hydraulics and 2 for the emergency shutdown. Two of them have been chosen for an international exercise on thermal hydraulics and neutronics in the frame of an IAEA Coordinated Research Project. Concerning neutronics, the Control Rod Withdrawal test is relevant for safety because it allows evaluating the capability of calculation tools to compute the radial power distribution on fast reactors core configurations in which the flux field is very deformed. IRSN participated to this benchmark with the ERANOS code developed by CEA for fast reactors studies. This paper presents the results obtained in the framework of the benchmark activity. A relatively good agreement was found with available measures considering the approximations done in the modeling. The work underlines the importance of burn-up calculations in order to have a fine core concentrations mesh for the calculation of the power distribution. (authors)« less
Screened exchange hybrid density functional for accurate and efficient structures and interaction energies.

PubMed

Brandenburg, Jan Gerit; Caldeweyher, Eike; Grimme, Stefan

2016-06-21

We extend the recently introduced PBEh-3c global hybrid density functional [S. Grimme et al., J. Chem. Phys., 2015, 143, 054107] by a screened Fock exchange variant based on the Henderson-Janesko-Scuseria exchange hole model. While the excellent performance of the global hybrid is maintained for small covalently bound molecules, its performance for computed condensed phase mass densities is further improved. Most importantly, a speed up of 30 to 50% can be achieved and especially for small orbital energy gap cases, the method is numerically much more robust. The latter point is important for many applications, e.g., for metal-organic frameworks, organic semiconductors, or protein structures. This enables an accurate density functional based electronic structure calculation of a full DNA helix structure on a single core desktop computer which is presented as an example in addition to comprehensive benchmark results.
Validation Data and Model Development for Fuel Assembly Response to Seismic Loads

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bardet, Philippe; Ricciardi, Guillaume

2016-01-31

Vibrations are inherently present in nuclear reactors, especially in cores and steam generators of pressurized water reactors (PWR). They can have significant effects on local heat transfer and wear and tear in the reactor and often set safety margins. The simulation of these multiphysics phenomena from first principles requires the coupling of several codes, which is one the most challenging tasks in modern computer simulation. Here an ambitious multiphysics multidisciplinary validation campaign is conducted. It relied on an integrated team of experimentalists and code developers to acquire benchmark and validation data for fluid-structure interaction codes. Data are focused on PWRmore » fuel bundle behavior during seismic transients.« less
Supercomputer simulations of structure formation in the Universe

NASA Astrophysics Data System (ADS)

Ishiyama, Tomoaki

2017-06-01

We describe the implementation and performance results of our massively parallel MPI†/OpenMP‡ hybrid TreePM code for large-scale cosmological N-body simulations. For domain decomposition, a recursive multi-section algorithm is used and the size of domains are automatically set so that the total calculation time is the same for all processes. We developed a highly-tuned gravity kernel for short-range forces, and a novel communication algorithm for long-range forces. For two trillion particles benchmark simulation, the average performance on the fullsystem of K computer (82,944 nodes, the total number of core is 663,552) is 5.8 Pflops, which corresponds to 55% of the peak speed.
An efficient implementation of semi-numerical computation of the Hartree-Fock exchange on the Intel Phi processor

NASA Astrophysics Data System (ADS)

Liu, Fenglai; Kong, Jing

2018-07-01

Unique technical challenges and their solutions for implementing semi-numerical Hartree-Fock exchange on the Phil Processor are discussed, especially concerning the single- instruction-multiple-data type of processing and small cache size. Benchmark calculations on a series of buckyball molecules with various Gaussian basis sets on a Phi processor and a six-core CPU show that the Phi processor provides as much as 12 times of speedup with large basis sets compared with the conventional four-center electron repulsion integration approach performed on the CPU. The accuracy of the semi-numerical scheme is also evaluated and found to be comparable to that of the resolution-of-identity approach.
Second Computational Aeroacoustics (CAA) Workshop on Benchmark Problems

NASA Technical Reports Server (NTRS)

Tam, C. K. W. (Editor); Hardin, J. C. (Editor)

1997-01-01

The proceedings of the Second Computational Aeroacoustics (CAA) Workshop on Benchmark Problems held at Florida State University are the subject of this report. For this workshop, problems arising in typical industrial applications of CAA were chosen. Comparisons between numerical solutions and exact solutions are presented where possible.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0

NASA Technical Reports Server (NTRS)

Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine

2004-01-01

We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
Service profiling and outcomes benchmarking using the CORE-OM: toward practice-based evidence in the psychological therapies. Clinical Outcomes in Routine Evaluation-Outcome Measures.

PubMed

Barkham, M; Margison, F; Leach, C; Lucock, M; Mellor-Clark, J; Evans, C; Benson, L; Connell, J; Audin, K; McGrath, G

2001-04-01

To complement the evidence-based practice paradigm, the authors argued for a core outcome measure to provide practice-based evidence for the psychological therapies. Utility requires instruments that are acceptable scientifically, as well as to service users, and a coordinated implementation of the measure at a national level. The development of the Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM) is summarized. Data are presented across 39 secondary-care services (n = 2,710) and within an intensively evaluated single service (n = 1,455). Results suggest that the CORE-OM is a valid and reliable measure for multiple settings and is acceptable to users and clinicians as well as policy makers. Baseline data levels of patient presenting problem severity, including risk, are reported in addition to outcome benchmarks that use the concept of reliable and clinically significant change. Basic quality improvement in outcomes for a single service is considered.
Addressing capability computing challenges of high-resolution global climate modelling at the Oak Ridge Leadership Computing Facility

NASA Astrophysics Data System (ADS)

Anantharaj, Valentine; Norman, Matthew; Evans, Katherine; Taylor, Mark; Worley, Patrick; Hack, James; Mayer, Benjamin

2014-05-01

During 2013, high-resolution climate model simulations accounted for over 100 million "core hours" using Titan at the Oak Ridge Leadership Computing Facility (OLCF). The suite of climate modeling experiments, primarily using the Community Earth System Model (CESM) at nearly 0.25 degree horizontal resolution, generated over a petabyte of data and nearly 100,000 files, ranging in sizes from 20 MB to over 100 GB. Effective utilization of leadership class resources requires careful planning and preparation. The application software, such as CESM, need to be ported, optimized and benchmarked for the target platform in order to meet the computational readiness requirements. The model configuration needs to be "tuned and balanced" for the experiments. This can be a complicated and resource intensive process, especially for high-resolution configurations using complex physics. The volume of I/O also increases with resolution; and new strategies may be required to manage I/O especially for large checkpoint and restart files that may require more frequent output for resiliency. It is also essential to monitor the application performance during the course of the simulation exercises. Finally, the large volume of data needs to be analyzed to derive the scientific results; and appropriate data and information delivered to the stakeholders. Titan is currently the largest supercomputer available for open science. The computational resources, in terms of "titan core hours" are allocated primarily via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) and ASCR Leadership Computing Challenge (ALCC) programs, both sponsored by the U.S. Department of Energy (DOE) Office of Science. Titan is a Cray XK7 system, capable of a theoretical peak performance of over 27 PFlop/s, consists of 18,688 compute nodes, with a NVIDIA Kepler K20 GPU and a 16-core AMD Opteron CPU in every node, for a total of 299,008 Opteron cores and 18,688 GPUs offering a cumulative 560,640 equivalent cores. Scientific applications, such as CESM, are also required to demonstrate a "computational readiness capability" to efficiently scale across and utilize 20% of the entire system. The 0,25 deg configuration of the spectral element dynamical core of the Community Atmosphere Model (CAM-SE), the atmospheric component of CESM, has been demonstrated to scale efficiently across more than 5,000 nodes (80,000 CPU cores) on Titan. The tracer transport routines of CAM-SE have also been ported to take advantage of the hybrid many-core architecture of Titan using GPUs [see EGU2014-4233], yielding over 2X speedup when transporting over 100 tracers. The high throughput I/O in CESM, based on the Parallel IO Library (PIO), is being further augmented to support even higher resolutions and enhance resiliency. The application performance of the individual runs are archived in a database and routinely analyzed to identify and rectify performance degradation during the course of the experiments. The various resources available at the OLCF now support a scientific workflow to facilitate high-resolution climate modelling. A high-speed center-wide parallel file system, called ATLAS, capable of 1 TB/s, is available on Titan as well as on the clusters used for analysis (Rhea) and visualization (Lens/EVEREST). Long-term archive is facilitated by the HPSS storage system. The Earth System Grid (ESG), featuring search & discovery, is also used to deliver data. The end-to-end workflow allows OLCF users to efficiently share data and publish results in a timely manner.
A Computational Fluid Dynamic and Heat Transfer Model for Gaseous Core and Gas Cooled Space Power and Propulsion Reactors

NASA Technical Reports Server (NTRS)

Anghaie, S.; Chen, G.

1996-01-01

A computational model based on the axisymmetric, thin-layer Navier-Stokes equations is developed to predict the convective, radiation and conductive heat transfer in high temperature space nuclear reactors. An implicit-explicit, finite volume, MacCormack method in conjunction with the Gauss-Seidel line iteration procedure is utilized to solve the thermal and fluid governing equations. Simulation of coolant and propellant flows in these reactors involves the subsonic and supersonic flows of hydrogen, helium and uranium tetrafluoride under variable boundary conditions. An enthalpy-rebalancing scheme is developed and implemented to enhance and accelerate the rate of convergence when a wall heat flux boundary condition is used. The model also incorporated the Baldwin and Lomax two-layer algebraic turbulence scheme for the calculation of the turbulent kinetic energy and eddy diffusivity of energy. The Rosseland diffusion approximation is used to simulate the radiative energy transfer in the optically thick environment of gas core reactors. The computational model is benchmarked with experimental data on flow separation angle and drag force acting on a suspended sphere in a cylindrical tube. The heat transfer is validated by comparing the computed results with the standard heat transfer correlations predictions. The model is used to simulate flow and heat transfer under a variety of design conditions. The effect of internal heat generation on the heat transfer in the gas core reactors is examined for a variety of power densities, 100 W/cc, 500 W/cc and 1000 W/cc. The maximum temperature, corresponding with the heat generation rates, are 2150 K, 2750 K and 3550 K, respectively. This analysis shows that the maximum temperature is strongly dependent on the value of heat generation rate. It also indicates that a heat generation rate higher than 1000 W/cc is necessary to maintain the gas temperature at about 3500 K, which is typical design temperature required to achieve high efficiency in the gas core reactors. The model is also used to predict the convective and radiation heat fluxes for the gas core reactors. The maximum value of heat flux occurs at the exit of the reactor core. Radiation heat flux increases with higher wall temperature. This behavior is due to the fact that the radiative heat flux is strongly dependent on wall temperature. This study also found that at temperature close to 3500 K the radiative heat flux is comparable with the convective heat flux in a uranium fluoride failed gas core reactor.
How to Advance TPC Benchmarks with Dependability Aspects

NASA Astrophysics Data System (ADS)

Almeida, Raquel; Poess, Meikel; Nambiar, Raghunath; Patil, Indira; Vieira, Marco

Transactional systems are the core of the information systems of most organizations. Although there is general acknowledgement that failures in these systems often entail significant impact both on the proceeds and reputation of companies, the benchmarks developed and managed by the Transaction Processing Performance Council (TPC) still maintain their focus on reporting bare performance. Each TPC benchmark has to pass a list of dependability-related tests (to verify ACID properties), but not all benchmarks require measuring their performances. While TPC-E measures the recovery time of some system failures, TPC-H and TPC-C only require functional correctness of such recovery. Consequently, systems used in TPC benchmarks are tuned mostly for performance. In this paper we argue that nowadays systems should be tuned for a more comprehensive suite of dependability tests, and that a dependability metric should be part of TPC benchmark publications. The paper discusses WHY and HOW this can be achieved. Two approaches are introduced and discussed: augmenting each TPC benchmark in a customized way, by extending each specification individually; and pursuing a more unified approach, defining a generic specification that could be adjoined to any TPC benchmark.
The MCNP6 Analytic Criticality Benchmark Suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, Forrest B.

2016-06-16

Analytical benchmarks provide an invaluable tool for verifying computer codes used to simulate neutron transport. Several collections of analytical benchmark problems [1-4] are used routinely in the verification of production Monte Carlo codes such as MCNP® [5,6]. Verification of a computer code is a necessary prerequisite to the more complex validation process. The verification process confirms that a code performs its intended functions correctly. The validation process involves determining the absolute accuracy of code results vs. nature. In typical validations, results are computed for a set of benchmark experiments using a particular methodology (code, cross-section data with uncertainties, and modeling)more » and compared to the measured results from the set of benchmark experiments. The validation process determines bias, bias uncertainty, and possibly additional margins. Verification is generally performed by the code developers, while validation is generally performed by code users for a particular application space. The VERIFICATION_KEFF suite of criticality problems [1,2] was originally a set of 75 criticality problems found in the literature for which exact analytical solutions are available. Even though the spatial and energy detail is necessarily limited in analytical benchmarks, typically to a few regions or energy groups, the exact solutions obtained can be used to verify that the basic algorithms, mathematics, and methods used in complex production codes perform correctly. The present work has focused on revisiting this benchmark suite. A thorough review of the problems resulted in discarding some of them as not suitable for MCNP benchmarking. For the remaining problems, many of them were reformulated to permit execution in either multigroup mode or in the normal continuous-energy mode for MCNP. Execution of the benchmarks in continuous-energy mode provides a significant advance to MCNP verification methods.« less
Phase field benchmark problems for dendritic growth and linear elasticity

DOE PAGES

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.; ...

2018-03-26

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
Phase field benchmark problems for dendritic growth and linear elasticity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
High-performance computational fluid dynamics: a custom-code approach

NASA Astrophysics Data System (ADS)

Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

2016-07-01

We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing

NASA Astrophysics Data System (ADS)

Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C.; Gao, Wen

2018-05-01

The compact descriptors for visual search (CDVS) standard from ISO/IEC moving pictures experts group (MPEG) has succeeded in enabling the interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of CDVS encoder unfortunately hinders its widely deployment in industry for large-scale visual search. In this paper, we revisit the merits of low complexity design of CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of GPU. We elegantly shift the computation-intensive and parallel-friendly modules to the state-of-the-arts GPU platforms, in which the thread block allocation and the memory access are jointly optimized to eliminate performance loss. In addition, those operations with heavy data dependence are allocated to CPU to resolve the extra but non-necessary computation burden for GPU. Furthermore, we have demonstrated the proposed fast CDVS encoder can work well with those convolution neural network approaches which has harmoniously leveraged the advantages of GPU platforms, and yielded significant performance improvements. Comprehensive experimental results over benchmarks are evaluated, which has shown that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.

Evaluating the Efficacy of the Cloud for Cluster Computation

NASA Technical Reports Server (NTRS)

Knight, David; Shams, Khawaja; Chang, George; Soderstrom, Tom

2012-01-01

Computing requirements vary by industry, and it follows that NASA and other research organizations have computing demands that fall outside the mainstream. While cloud computing made rapid inroads for tasks such as powering web applications, performance issues on highly distributed tasks hindered early adoption for scientific computation. One venture to address this problem is Nebula, NASA's homegrown cloud project tasked with delivering science-quality cloud computing resources. However, another industry development is Amazon's high-performance computing (HPC) instances on Elastic Cloud Compute (EC2) that promises improved performance for cluster computation. This paper presents results from a series of benchmarks run on Amazon EC2 and discusses the efficacy of current commercial cloud technology for running scientific applications across a cluster. In particular, a 240-core cluster of cloud instances achieved 2 TFLOPS on High-Performance Linpack (HPL) at 70% of theoretical computational performance. The cluster's local network also demonstrated sub-100 ?s inter-process latency with sustained inter-node throughput in excess of 8 Gbps. Beyond HPL, a real-world Hadoop image processing task from NASA's Lunar Mapping and Modeling Project (LMMP) was run on a 29 instance cluster to process lunar and Martian surface images with sizes on the order of tens of gigapixels. These results demonstrate that while not a rival of dedicated supercomputing clusters, commercial cloud technology is now a feasible option for moderately demanding scientific workloads.
Initial Coupling of the RELAP-7 and PRONGHORN Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

J. Ortensi; D. Andrs; A.A. Bingham

2012-10-01

Modern nuclear reactor safety codes require the ability to solve detailed coupled neutronic- thermal fluids problems. For larger cores, this implies fully coupled higher dimensionality spatial dynamics with appropriate feedback models that can provide enough resolution to accurately compute core heat generation and removal during steady and unsteady conditions. The reactor analysis code PRONGHORN is being coupled to RELAP-7 as a first step to extend RELAP’s current capabilities. This report details the mathematical models, the type of coupling, and the testing results from the integrated system. RELAP-7 is a MOOSE-based application that solves the continuity, momentum, and energy equations inmore » 1-D for a compressible fluid. The pipe and joint capabilities enable it to model parts of the power conversion unit. The PRONGHORN application, also developed on the MOOSE infrastructure, solves the coupled equations that define the neutron diffusion, fluid flow, and heat transfer in a full core model. The two systems are loosely coupled to simplify the transition towards a more complex infrastructure. The integration is tested on a simplified version of the OECD/NEA MHTGR-350 Coupled Neutronics-Thermal Fluids benchmark model.« less
Thermo-hydro-mechanical-chemical processes in fractured-porous media: Benchmarks and examples

NASA Astrophysics Data System (ADS)

Kolditz, O.; Shao, H.; Görke, U.; Kalbacher, T.; Bauer, S.; McDermott, C. I.; Wang, W.

2012-12-01

The book comprises an assembly of benchmarks and examples for porous media mechanics collected over the last twenty years. Analysis of thermo-hydro-mechanical-chemical (THMC) processes is essential to many applications in environmental engineering, such as geological waste deposition, geothermal energy utilisation, carbon capture and storage, water resources management, hydrology, even climate change. In order to assess the feasibility as well as the safety of geotechnical applications, process-based modelling is the only tool to put numbers, i.e. to quantify future scenarios. This charges a huge responsibility concerning the reliability of computational tools. Benchmarking is an appropriate methodology to verify the quality of modelling tools based on best practices. Moreover, benchmarking and code comparison foster community efforts. The benchmark book is part of the OpenGeoSys initiative - an open source project to share knowledge and experience in environmental analysis and scientific computation.
Third Computational Aeroacoustics (CAA) Workshop on Benchmark Problems

NASA Technical Reports Server (NTRS)

Dahl, Milo D. (Editor)

2000-01-01

The proceedings of the Third Computational Aeroacoustics (CAA) Workshop on Benchmark Problems cosponsored by the Ohio Aerospace Institute and the NASA Glenn Research Center are the subject of this report. Fan noise was the chosen theme for this workshop with representative problems encompassing four of the six benchmark problem categories. The other two categories were related to jet noise and cavity noise. For the first time in this series of workshops, the computational results for the cavity noise problem were compared to experimental data. All the other problems had exact solutions, which are included in this report. The Workshop included a panel discussion by representatives of industry. The participants gave their views on the status of applying computational aeroacoustics to solve practical industry related problems and what issues need to be addressed to make CAA a robust design tool.
Comparing performance of many-core CPUs and GPUs for static and motion compensated reconstruction of C-arm CT data.

PubMed

Hofmann, Hannes G; Keck, Benjamin; Rohkohl, Christopher; Hornegger, Joachim

2011-01-01

Interventional reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. Hardware optimization is not an option but mandatory for interventional image processing and, in particular, for image reconstruction due to the high demands on performance. Several groups have published fast analytical 3-D reconstruction on highly parallel hardware such as GPUs to mitigate this issue. The authors show that the performance of modern CPU-based systems is in the same order as current GPUs for static 3-D reconstruction and outperforms them for a recent motion compensated (3-D+time) image reconstruction algorithm. This work investigates two algorithms: Static 3-D reconstruction as well as a recent motion compensated algorithm. The evaluation was performed using a standardized reconstruction benchmark, RABBITCT, to get comparable results and two additional clinical data sets. The authors demonstrate for a parametric B-spline motion estimation scheme that the derivative computation, which requires many write operations to memory, performs poorly on the GPU and can highly benefit from modern CPU architectures with large caches. Moreover, on a 32-core Intel Xeon server system, the authors achieve linear scaling with the number of cores used and reconstruction times almost in the same range as current GPUs. Algorithmic innovations in the field of motion compensated image reconstruction may lead to a shift back to CPUs in the future. For analytical 3-D reconstruction, the authors show that the gap between GPUs and CPUs became smaller. It can be performed in less than 20 s (on-the-fly) using a 32-core server.
Deterministic Modeling of the High Temperature Test Reactor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ortensi, J.; Cogliati, J. J.; Pope, M. A.

2010-06-01

Idaho National Laboratory (INL) is tasked with the development of reactor physics analysis capability of the Next Generation Nuclear Power (NGNP) project. In order to examine INL’s current prismatic reactor deterministic analysis tools, the project is conducting a benchmark exercise based on modeling the High Temperature Test Reactor (HTTR). This exercise entails the development of a model for the initial criticality, a 19 column thin annular core, and the fully loaded core critical condition with 30 columns. Special emphasis is devoted to the annular core modeling, which shares more characteristics with the NGNP base design. The DRAGON code is usedmore » in this study because it offers significant ease and versatility in modeling prismatic designs. Despite some geometric limitations, the code performs quite well compared to other lattice physics codes. DRAGON can generate transport solutions via collision probability (CP), method of characteristics (MOC), and discrete ordinates (Sn). A fine group cross section library based on the SHEM 281 energy structure is used in the DRAGON calculations. HEXPEDITE is the hexagonal z full core solver used in this study and is based on the Green’s Function solution of the transverse integrated equations. In addition, two Monte Carlo (MC) based codes, MCNP5 and PSG2/SERPENT, provide benchmarking capability for the DRAGON and the nodal diffusion solver codes. The results from this study show a consistent bias of 2–3% for the core multiplication factor. This systematic error has also been observed in other HTTR benchmark efforts and is well documented in the literature. The ENDF/B VII graphite and U235 cross sections appear to be the main source of the error. The isothermal temperature coefficients calculated with the fully loaded core configuration agree well with other benchmark participants but are 40% higher than the experimental values. This discrepancy with the measurement stems from the fact that during the experiments the control rods were adjusted to maintain criticality, whereas in the model, the rod positions were fixed. In addition, this work includes a brief study of a cross section generation approach that seeks to decouple the domain in order to account for neighbor effects. This spectral interpenetration is a dominant effect in annular HTR physics. This analysis methodology should be further explored in order to reduce the error that is systematically propagated in the traditional generation of cross sections.« less
Towards reversible basic linear algebra subprograms: A performance study

DOE PAGES

Perumalla, Kalyan S.; Yoginath, Srikanth B.

2014-12-06

Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) amore » memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.« less
Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images.

PubMed

Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy

2017-10-06

The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to Github, (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h/mm) using 1 core, and 1:04 (h/mm) hours using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
Benchmark gas core critical experiment.

NASA Technical Reports Server (NTRS)

Kunze, J. F.; Lofthouse, J. H.; Cooper, C. G.; Hyland, R. E.

1972-01-01

A critical experiment with spherical symmetry has been conducted on the gas core nuclear reactor concept. The nonspherical perturbations in the experiment were evaluated experimentally and produce corrections to the observed eigenvalue of approximately 1% delta k. The reactor consisted of a low density, central uranium hexafluoride gaseous core, surrounded by an annulus of void or low density hydrocarbon, which in turn was surrounded with a 97-cm-thick heavy water reflector.
Benchmarking Brain-Computer Interfaces Outside the Laboratory: The Cybathlon 2016

PubMed Central

Novak, Domen; Sigrist, Roland; Gerig, Nicolas J.; Wyss, Dario; Bauer, René; Götz, Ulrich; Riener, Robert

2018-01-01

This paper presents a new approach to benchmarking brain-computer interfaces (BCIs) outside the lab. A computer game was created that mimics a real-world application of assistive BCIs, with the main outcome metric being the time needed to complete the game. This approach was used at the Cybathlon 2016, a competition for people with disabilities who use assistive technology to achieve tasks. The paper summarizes the technical challenges of BCIs, describes the design of the benchmarking game, then describes the rules for acceptable hardware, software and inclusion of human pilots in the BCI competition at the Cybathlon. The 11 participating teams, their approaches, and their results at the Cybathlon are presented. Though the benchmarking procedure has some limitations (for instance, we were unable to identify any factors that clearly contribute to BCI performance), it can be successfully used to analyze BCI performance in realistic, less structured conditions. In the future, the parameters of the benchmarking game could be modified to better mimic different applications (e.g., the need to use some commands more frequently than others). Furthermore, the Cybathlon has the potential to showcase such devices to the general public. PMID:29375294
Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

PubMed Central

Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

2014-01-01

Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

PubMed

Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

2014-07-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
Implicit time-integration method for simultaneous solution of a coupled non-linear system

NASA Astrophysics Data System (ADS)

Watson, Justin Kyle

Historically large physical problems have been divided into smaller problems based on the physics involved. This is no different in reactor safety analysis. The problem of analyzing a nuclear reactor for design basis accidents is performed by a handful of computer codes each solving a portion of the problem. The reactor thermal hydraulic response to an event is determined using a system code like TRAC RELAP Advanced Computational Engine (TRACE). The core power response to the same accident scenario is determined using a core physics code like Purdue Advanced Core Simulator (PARCS). Containment response to the reactor depressurization in a Loss Of Coolant Accident (LOCA) type event is calculated by a separate code. Sub-channel analysis is performed with yet another computer code. This is just a sample of the computer codes used to solve the overall problems of nuclear reactor design basis accidents. Traditionally each of these codes operates independently from each other using only the global results from one calculation as boundary conditions to another. Industry's drive to uprate power for reactors has motivated analysts to move from a conservative approach to design basis accident towards a best estimate method. To achieve a best estimate calculation efforts have been aimed at coupling the individual physics models to improve the accuracy of the analysis and reduce margins. The current coupling techniques are sequential in nature. During a calculation time-step data is passed between the two codes. The individual codes solve their portion of the calculation and converge to a solution before the calculation is allowed to proceed to the next time-step. This thesis presents a fully implicit method of simultaneous solving the neutron balance equations, heat conduction equations and the constitutive fluid dynamics equations. It discusses the problems involved in coupling different physics phenomena within multi-physics codes and presents a solution to these problems. The thesis also outlines the basic concepts behind the nodal balance equations, heat transfer equations and the thermal hydraulic equations, which will be coupled to form a fully implicit nonlinear system of equations. The coupling of separate physics models to solve a larger problem and improve accuracy and efficiency of a calculation is not a new idea, however implementing them in an implicit manner and solving the system simultaneously is. Also the application to reactor safety codes is new and has not be done with thermal hydraulics and neutronics codes on realistic applications in the past. The coupling technique described in this thesis is applicable to other similar coupled thermal hydraulic and core physics reactor safety codes. This technique is demonstrated using coupled input decks to show that the system is solved correctly and then verified by using two derivative test problems based on international benchmark problems the OECD/NRC Three mile Island (TMI) Main Steam Line Break (MSLB) problem (representative of pressurized water reactor analysis) and the OECD/NRC Peach Bottom (PB) Turbine Trip (TT) benchmark (representative of boiling water reactor analysis).
A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

PubMed

Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

2015-01-01

The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.
New Multi-group Transport Neutronics (PHISICS) Capabilities for RELAP5-3D and its Application to Phase I of the OECD/NEA MHTGR-350 MW Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerhard Strydom; Cristian Rabiti; Andrea Alfonsi

2012-10-01

PHISICS is a neutronics code system currently under development at the Idaho National Laboratory (INL). Its goal is to provide state of the art simulation capability to reactor designers. The different modules for PHISICS currently under development are a nodal and semi-structured transport core solver (INSTANT), a depletion module (MRTAU) and a cross section interpolation (MIXER) module. The INSTANT module is the most developed of the mentioned above. Basic functionalities are ready to use, but the code is still in continuous development to extend its capabilities. This paper reports on the effort of coupling the nodal kinetics code package PHISICSmore » (INSTANT/MRTAU/MIXER) to the thermal hydraulics system code RELAP5-3D, to enable full core and system modeling. This will enable the possibility to model coupled (thermal-hydraulics and neutronics) problems with more options for 3D neutron kinetics, compared to the existing diffusion theory neutron kinetics module in RELAP5-3D (NESTLE). In the second part of the paper, an overview of the OECD/NEA MHTGR-350 MW benchmark is given. This benchmark has been approved by the OECD, and is based on the General Atomics 350 MW Modular High Temperature Gas Reactor (MHTGR) design. The benchmark includes coupled neutronics thermal hydraulics exercises that require more capabilities than RELAP5-3D with NESTLE offers. Therefore, the MHTGR benchmark makes extensive use of the new PHISICS/RELAP5-3D coupling capabilities. The paper presents the preliminary results of the three steady state exercises specified in Phase I of the benchmark using PHISICS/RELAP5-3D.« less
An Integrated Development Environment for Adiabatic Quantum Programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humble, Travis S; McCaskey, Alex; Bennink, Ryan S

2014-01-01

Adiabatic quantum computing is a promising route to the computational power afforded by quantum information processing. The recent availability of adiabatic hardware raises the question of how well quantum programs perform. Benchmarking behavior is challenging since the multiple steps to synthesize an adiabatic quantum program are highly tunable. We present an adiabatic quantum programming environment called JADE that provides control over all the steps taken during program development. JADE captures the workflow needed to rigorously benchmark performance while also allowing a variety of problem types, programming techniques, and processor configurations. We have also integrated JADE with a quantum simulation enginemore » that enables program profiling using numerical calculation. The computational engine supports plug-ins for simulation methodologies tailored to various metrics and computing resources. We present the design, integration, and deployment of JADE and discuss its use for benchmarking adiabatic quantum programs.« less
MARC calculations for the second WIPP structural benchmark problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morgan, H.S.

1981-05-01

This report describes calculations made with the MARC structural finite element code for the second WIPP structural benchmark problem. Specific aspects of problem implementation such as element choice, slip line modeling, creep law implementation, and thermal-mechanical coupling are discussed in detail. Also included are the computational results specified in the benchmark problem formulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Marck, Steven C. van der, E-mail: vandermarck@nrg.eu

Recent releases of three major world nuclear reaction data libraries, ENDF/B-VII.1, JENDL-4.0, and JEFF-3.1.1, have been tested extensively using benchmark calculations. The calculations were performed with the latest release of the continuous energy Monte Carlo neutronics code MCNP, i.e. MCNP6. Three types of benchmarks were used, viz. criticality safety benchmarks, (fusion) shielding benchmarks, and reference systems for which the effective delayed neutron fraction is reported. For criticality safety, more than 2000 benchmarks from the International Handbook of Criticality Safety Benchmark Experiments were used. Benchmarks from all categories were used, ranging from low-enriched uranium, compound fuel, thermal spectrum ones (LEU-COMP-THERM), tomore » mixed uranium-plutonium, metallic fuel, fast spectrum ones (MIX-MET-FAST). For fusion shielding many benchmarks were based on IAEA specifications for the Oktavian experiments (for Al, Co, Cr, Cu, LiF, Mn, Mo, Si, Ti, W, Zr), Fusion Neutronics Source in Japan (for Be, C, N, O, Fe, Pb), and Pulsed Sphere experiments at Lawrence Livermore National Laboratory (for {sup 6}Li, {sup 7}Li, Be, C, N, O, Mg, Al, Ti, Fe, Pb, D2O, H2O, concrete, polyethylene and teflon). The new functionality in MCNP6 to calculate the effective delayed neutron fraction was tested by comparison with more than thirty measurements in widely varying systems. Among these were measurements in the Tank Critical Assembly (TCA in Japan) and IPEN/MB-01 (Brazil), both with a thermal spectrum, two cores in Masurca (France) and three cores in the Fast Critical Assembly (FCA, Japan), all with fast spectra. The performance of the three libraries, in combination with MCNP6, is shown to be good. The results for the LEU-COMP-THERM category are on average very close to the benchmark value. Also for most other categories the results are satisfactory. Deviations from the benchmark values do occur in certain benchmark series, or in isolated cases within benchmark series. Such instances can often be related to nuclear data for specific non-fissile elements, such as C, Fe, or Gd. Indications are that the intermediate and mixed spectrum cases are less well described. The results for the shielding benchmarks are generally good, with very similar results for the three libraries in the majority of cases. Nevertheless there are, in certain cases, strong deviations between calculated and benchmark values, such as for Co and Mg. Also, the results show discrepancies at certain energies or angles for e.g. C, N, O, Mo, and W. The functionality of MCNP6 to calculate the effective delayed neutron fraction yields very good results for all three libraries.« less
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

PubMed

Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

2015-06-01

Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
Benchmark Evaluation of Fuel Effect and Material Worth Measurements for a Beryllium-Reflected Space Reactor Mockup

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marshall, Margaret A.; Bess, John D.

2015-02-01

The critical configuration of the small, compact critical assembly (SCCA) experiments performed at the Oak Ridge Critical Experiments Facility (ORCEF) in 1962-1965 have been evaluated as acceptable benchmark experiments for inclusion in the International Handbook of Evaluated Criticality Safety Benchmark Experiments. The initial intent of these experiments was to support the design of the Medium Power Reactor Experiment (MPRE) program, whose purpose was to study “power plants for the production of electrical power in space vehicles.” The third configuration in this series of experiments was a beryllium-reflected assembly of stainless-steel-clad, highly enriched uranium (HEU)-O 2 fuel mockup of a potassium-cooledmore » space power reactor. Reactivity measurements cadmium ratio spectral measurements and fission rate measurements were measured through the core and top reflector. Fuel effect worth measurements and neutron moderating and absorbing material worths were also measured in the assembly fuel region. The cadmium ratios, fission rate, and worth measurements were evaluated for inclusion in the International Handbook of Evaluated Criticality Safety Benchmark Experiments. The fuel tube effect and neutron moderating and absorbing material worth measurements are the focus of this paper. Additionally, a measurement of the worth of potassium filling the core region was performed but has not yet been evaluated Pellets of 93.15 wt.% enriched uranium dioxide (UO 2) were stacked in 30.48 cm tall stainless steel fuel tubes (0.3 cm tall end caps). Each fuel tube had 26 pellets with a total mass of 295.8 g UO 2 per tube. 253 tubes were arranged in 1.506-cm triangular lattice. An additional 7-tube cluster critical configuration was also measured but not used for any physics measurements. The core was surrounded on all side by a beryllium reflector. The fuel effect worths were measured by removing fuel tubes at various radius. An accident scenario was also simulated by moving outward twenty fuel rods from the periphery of the core so they were touching the core tank. The change in the system reactivity when the fuel tube(s) were removed/moved compared with the base configuration was the worth of the fuel tubes or accident scenario. The worth of neutron absorbing and moderating materials was measured by inserting material rods into the core at regular intervals or placing lids at the top of the core tank. Stainless steel 347, tungsten, niobium, polyethylene, graphite, boron carbide, aluminum and cadmium rods and/or lid worths were all measured. The change in the system reactivity when a material was inserted into the core is the worth of the material.« less

Coalescent: an open-science framework for importance sampling in coalescent theory.

PubMed

Tewari, Susanta; Spouge, John L

2015-01-01

Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner. Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time following infinite sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable. Conclusions. In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called "just-in-time delegation" available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the "2(8) programs problem" highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes.
Computers for real time flight simulation: A market survey

NASA Technical Reports Server (NTRS)

Bekey, G. A.; Karplus, W. J.

1977-01-01

An extensive computer market survey was made to determine those available systems suitable for current and future flight simulation studies at Ames Research Center. The primary requirement is for the computation of relatively high frequency content (5 Hz) math models representing powered lift flight vehicles. The Rotor Systems Research Aircraft (RSRA) was used as a benchmark vehicle for computation comparison studies. The general nature of helicopter simulations and a description of the benchmark model are presented, and some of the sources of simulation difficulties are examined. A description of various applicable computer architectures is presented, along with detailed discussions of leading candidate systems and comparisons between them.
Coupled Monte Carlo neutronics and thermal hydraulics for power reactors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernnat, W.; Buck, M.; Mattes, M.

The availability of high performance computing resources enables more and more the use of detailed Monte Carlo models even for full core power reactors. The detailed structure of the core can be described by lattices, modeled by so-called repeated structures e.g. in Monte Carlo codes such as MCNP5 or MCNPX. For cores with mainly uniform material compositions, fuel and moderator temperatures, there is no problem in constructing core models. However, when the material composition and the temperatures vary strongly a huge number of different material cells must be described which complicate the input and in many cases exceed code ormore » memory limits. The second problem arises with the preparation of corresponding temperature dependent cross sections and thermal scattering laws. Only if these problems can be solved, a realistic coupling of Monte Carlo neutronics with an appropriate thermal-hydraulics model is possible. In this paper a method for the treatment of detailed material and temperature distributions in MCNP5 is described based on user-specified internal functions which assign distinct elements of the core cells to material specifications (e.g. water density) and temperatures from a thermal-hydraulics code. The core grid itself can be described with a uniform material specification. The temperature dependency of cross sections and thermal neutron scattering laws is taken into account by interpolation, requiring only a limited number of data sets generated for different temperatures. Applications will be shown for the stationary part of the Purdue PWR benchmark using ATHLET for thermal- hydraulics and for a generic Modular High Temperature reactor using THERMIX for thermal- hydraulics. (authors)« less
Aeroelasticity Benchmark Assessment: Subsonic Fixed Wing Program

NASA Technical Reports Server (NTRS)

Florance, Jennifer P.; Chwalowski, Pawel; Wieseman, Carol D.

2010-01-01

The fundamental technical challenge in computational aeroelasticity is the accurate prediction of unsteady aerodynamic phenomena and the effect on the aeroelastic response of a vehicle. Currently, a benchmarking standard for use in validating the accuracy of computational aeroelasticity codes does not exist. Many aeroelastic data sets have been obtained in wind-tunnel and flight testing throughout the world; however, none have been globally presented or accepted as an ideal data set. There are numerous reasons for this. One reason is that often, such aeroelastic data sets focus on the aeroelastic phenomena alone (flutter, for example) and do not contain associated information such as unsteady pressures and time-correlated structural dynamic deflections. Other available data sets focus solely on the unsteady pressures and do not address the aeroelastic phenomena. Other discrepancies can include omission of relevant data, such as flutter frequency and / or the acquisition of only qualitative deflection data. In addition to these content deficiencies, all of the available data sets present both experimental and computational technical challenges. Experimental issues include facility influences, nonlinearities beyond those being modeled, and data processing. From the computational perspective, technical challenges include modeling geometric complexities, coupling between the flow and the structure, grid issues, and boundary conditions. The Aeroelasticity Benchmark Assessment task seeks to examine the existing potential experimental data sets and ultimately choose the one that is viewed as the most suitable for computational benchmarking. An initial computational evaluation of that configuration will then be performed using the Langley-developed computational fluid dynamics (CFD) software FUN3D1 as part of its code validation process. In addition to the benchmarking activity, this task also includes an examination of future research directions. Researchers within the Aeroelasticity Branch will examine other experimental efforts within the Subsonic Fixed Wing (SFW) program (such as testing of the NASA Common Research Model (CRM)) and other NASA programs and assess aeroelasticity issues and research topics.
Knowledge and Practices of Faculty at NASM Accredited Institutions in the Southeast Region Regarding Standards-Based Instruction

ERIC Educational Resources Information Center

Nelson, Jonathan Leon

2017-01-01

In 1993, Congress passed the mandate "Goals 2000: Educate America Act," which established standards for K-12 education that outlined the core benchmarks of student achievement for individuals who have mastered the core curricula required to earn a high school diploma (Mark, 1995). Unfortunately, these curricular requirements did not…
Benchmarking and Accreditation Goals Support the Value of an Undergraduate Business Law Core Course

ERIC Educational Resources Information Center

O'Brien, Christine Neylon; Powers, Richard E.; Wesner, Thomas L.

2018-01-01

This article provides information about the value of a core course in business law and why it remains essential to business education. It goes on to identify highly ranked undergraduate business programs that require one or more business law courses. Using "Business Week" and "US News and World Report" to identify top…
Research-Based Writing Practices and the Common Core: Meta-Analysis and Meta-Synthesis

ERIC Educational Resources Information Center

Graham, Steve; Harris, Karen R.; Santangelo, Tanya

2015-01-01

In order to meet writing objectives specified in the Common Core State Standards (CCSS), many teachers need to make significant changes in how writing is taught. While CCSS identified what students need to master, it did not provide guidance on how teachers are to meet these writing benchmarks. The current article presents research-supported…
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

NASA Astrophysics Data System (ADS)

Griffiths, M. K.; Fedun, V.; Erdélyi, R.

2015-03-01

Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Deterministically estimated fission source distributions for Monte Carlo k-eigenvalue problems

DOE PAGES

Biondo, Elliott D.; Davidson, Gregory G.; Pandya, Tara M.; ...

2018-04-30

The standard Monte Carlo (MC) k-eigenvalue algorithm involves iteratively converging the fission source distribution using a series of potentially time-consuming inactive cycles before quantities of interest can be tallied. One strategy for reducing the computational time requirements of these inactive cycles is the Sourcerer method, in which a deterministic eigenvalue calculation is performed to obtain an improved initial guess for the fission source distribution. This method has been implemented in the Exnihilo software suite within SCALE using the SPNSPN or SNSN solvers in Denovo and the Shift MC code. The efficacy of this method is assessed with different Denovo solutionmore » parameters for a series of typical k-eigenvalue problems including small criticality benchmarks, full-core reactors, and a fuel cask. Here it is found that, in most cases, when a large number of histories per cycle are required to obtain a detailed flux distribution, the Sourcerer method can be used to reduce the computational time requirements of the inactive cycles.« less
Parallel replica dynamics method for bistable stochastic reaction networks: Simulation and sensitivity analysis

NASA Astrophysics Data System (ADS)

Wang, Ting; Plecháč, Petr

2017-12-01

Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
Deterministically estimated fission source distributions for Monte Carlo k-eigenvalue problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Biondo, Elliott D.; Davidson, Gregory G.; Pandya, Tara M.

The standard Monte Carlo (MC) k-eigenvalue algorithm involves iteratively converging the fission source distribution using a series of potentially time-consuming inactive cycles before quantities of interest can be tallied. One strategy for reducing the computational time requirements of these inactive cycles is the Sourcerer method, in which a deterministic eigenvalue calculation is performed to obtain an improved initial guess for the fission source distribution. This method has been implemented in the Exnihilo software suite within SCALE using the SPNSPN or SNSN solvers in Denovo and the Shift MC code. The efficacy of this method is assessed with different Denovo solutionmore » parameters for a series of typical k-eigenvalue problems including small criticality benchmarks, full-core reactors, and a fuel cask. Here it is found that, in most cases, when a large number of histories per cycle are required to obtain a detailed flux distribution, the Sourcerer method can be used to reduce the computational time requirements of the inactive cycles.« less
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing.

PubMed

Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C; Gao, Wen

2018-05-01

The compact descriptors for visual search (CDVS) standard from ISO/IEC moving pictures experts group has succeeded in enabling the interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of a CDVS encoder unfortunately hinders its widely deployment in industry for large-scale visual search. In this paper, we revisit the merits of low complexity design of CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of graphics processing unit (GPU). We elegantly shift the computation-intensive and parallel-friendly modules to the state-of-the-arts GPU platforms, in which the thread block allocation as well as the memory access mechanism are jointly optimized to eliminate performance loss. In addition, those operations with heavy data dependence are allocated to CPU for resolving the extra but non-necessary computation burden for GPU. Furthermore, we have demonstrated the proposed fast CDVS encoder can work well with those convolution neural network approaches which enables to leverage the advantages of GPU platforms harmoniously, and yield significant performance improvements. Comprehensive experimental results over benchmarks are evaluated, which has shown that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rouxelin, Pascal Nicolas; Strydom, Gerhard

Best-estimate plus uncertainty analysis of reactors is replacing the traditional conservative (stacked uncertainty) method for safety and licensing analysis. To facilitate uncertainty analysis applications, a comprehensive approach and methodology must be developed and applied. High temperature gas cooled reactors (HTGRs) have several features that require techniques not used in light-water reactor analysis (e.g., coated-particle design and large graphite quantities at high temperatures). The International Atomic Energy Agency has therefore launched the Coordinated Research Project on HTGR Uncertainty Analysis in Modeling to study uncertainty propagation in the HTGR analysis chain. The benchmark problem defined for the prismatic design is represented bymore » the General Atomics Modular HTGR 350. The main focus of this report is the compilation and discussion of the results obtained for various permutations of Exercise I 2c and the use of the cross section data in Exercise II 1a of the prismatic benchmark, which is defined as the last and first steps of the lattice and core simulation phases, respectively. The report summarizes the Idaho National Laboratory (INL) best estimate results obtained for Exercise I 2a (fresh single-fuel block), Exercise I 2b (depleted single-fuel block), and Exercise I 2c (super cell) in addition to the first results of an investigation into the cross section generation effects for the super-cell problem. The two dimensional deterministic code known as the New ESC based Weighting Transport (NEWT) included in the Standardized Computer Analyses for Licensing Evaluation (SCALE) 6.1.2 package was used for the cross section evaluation, and the results obtained were compared to the three dimensional stochastic SCALE module KENO VI. The NEWT cross section libraries were generated for several permutations of the current benchmark super-cell geometry and were then provided as input to the Phase II core calculation of the stand alone neutronics Exercise II 1a. The steady state core calculations were simulated with the INL coupled-code system known as the Parallel and Highly Innovative Simulation for INL Code System (PHISICS) and the system thermal-hydraulics code known as the Reactor Excursion and Leak Analysis Program (RELAP) 5 3D using the nuclear data libraries previously generated with NEWT. It was observed that significant differences in terms of multiplication factor and neutron flux exist between the various permutations of the Phase I super-cell lattice calculations. The use of these cross section libraries only leads to minor changes in the Phase II core simulation results for fresh fuel but shows significantly larger discrepancies for spent fuel cores. Furthermore, large incongruities were found between the SCALE NEWT and KENO VI results for the super cells, and while some trends could be identified, a final conclusion on this issue could not yet be reached. This report will be revised in mid 2016 with more detailed analyses of the super-cell problems and their effects on the core models, using the latest version of SCALE (6.2). The super-cell models seem to show substantial improvements in terms of neutron flux as compared to single-block models, particularly at thermal energies.« less
The X40×10 Halogen Bonding Benchmark Revisited: Surprising Importance of (n-1)d Subvalence Correlation.

PubMed

Kesharwani, Manoj K; Manna, Debashree; Sylvetsky, Nitai; Martin, Jan M L

2018-03-01

We have re-evaluated the X40×10 benchmark for halogen bonding using conventional and explicitly correlated coupled cluster methods. For the aromatic dimers at small separation, improved CCSD(T)-MP2 "high-level corrections" (HLCs) cause substantial reductions in the dissociation energy. For the bromine and iodine species, (n-1)d subvalence correlation increases dissociation energies and turns out to be more important for noncovalent interactions than is generally realized; (n-1)sp subvalence correlation is much less important. The (n-1)d subvalence term is dominated by core-valence correlation; with the smaller cc-pVDZ-F12-PP and cc-pVTZ-F12-PP basis sets, basis set convergence for the core-core contribution becomes sufficiently erratic that it may compromise results overall. The two factors conspire to generate discrepancies of up to 0.9 kcal/mol (0.16 kcal/mol RMS) between the original X40×10 data and the present revision.
Large-scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU).

PubMed

Shi, Yulin; Veidenbaum, Alexander V; Nicolau, Alex; Xu, Xiangmin

2015-01-15

Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post hoc processing and analysis. Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22× speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. Copyright © 2014 Elsevier B.V. All rights reserved.
Large scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU)

PubMed Central

Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin

2014-01-01

Background Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New Method Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with Existing Method(s) To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633
BACT Simulation User Guide (Version 7.0)

NASA Technical Reports Server (NTRS)

Waszak, Martin R.

1997-01-01

This report documents the structure and operation of a simulation model of the Benchmark Active Control Technology (BACT) Wind-Tunnel Model. The BACT system was designed, built, and tested at NASA Langley Research Center as part of the Benchmark Models Program and was developed to perform wind-tunnel experiments to obtain benchmark quality data to validate computational fluid dynamics and computational aeroelasticity codes, to verify the accuracy of current aeroservoelasticity design and analysis tools, and to provide an active controls testbed for evaluating new and innovative control algorithms for flutter suppression and gust load alleviation. The BACT system has been especially valuable as a control system testbed.
HTR-PROTEUS Pebble Bed Experimental Program Cores 1, 1A, 2, and 3: Hexagonal Close Packing with a 1:2 Moderator-to-Fuel Pebble Ratio

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; Barbara H. Dolphin; James W. Sterbentz

2013-03-01

In its deployment as a pebble bed reactor (PBR) critical facility from 1992 to 1996, the PROTEUS facility was designated as HTR-PROTEUS. This experimental program was performed as part of an International Atomic Energy Agency (IAEA) Coordinated Research Project (CRP) on the Validation of Safety Related Physics Calculations for Low Enriched HTGRs. Within this project, critical experiments were conducted for graphite moderated LEU systems to determine core reactivity, flux and power profiles, reaction-rate ratios, the worth of control rods, both in-core and reflector based, the worth of burnable poisons, kinetic parameters, and the effects of moisture ingress on these parameters.more » Four benchmark experiments were evaluated in this report: Cores 1, 1A, 2, and 3. These core configurations represent the hexagonal close packing (HCP) configurations of the HTR-PROTEUS experiment with a moderator-to-fuel pebble ratio of 1:2. Core 1 represents the only configuration utilizing ZEBRA control rods. Cores 1A, 2, and 3 use withdrawable, hollow, stainless steel control rods. Cores 1 and 1A are similar except for the use of different control rods; Core 1A also has one less layer of pebbles (21 layers instead of 22). Core 2 retains the first 16 layers of pebbles from Cores 1 and 1A and has 16 layers of moderator pebbles stacked above the fueled layers. Core 3 retains the first 17 layers of pebbles but has polyethylene rods inserted between pebbles to simulate water ingress. The additional partial pebble layer (layer 18) for Core 3 was not included as it was used for core operations and not the reported critical configuration. Cores 1, 1A, 2, and 3 were determined to be acceptable benchmark experiments.« less
HTR-PROTEUS Pebble Bed Experimental Program Cores 1, 1A, 2, and 3: Hexagonal Close Packing with a 1:2 Moderator-to-Fuel Pebble Ratio

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; Barbara H. Dolphin; James W. Sterbentz

2012-03-01

In its deployment as a pebble bed reactor (PBR) critical facility from 1992 to 1996, the PROTEUS facility was designated as HTR-PROTEUS. This experimental program was performed as part of an International Atomic Energy Agency (IAEA) Coordinated Research Project (CRP) on the Validation of Safety Related Physics Calculations for Low Enriched HTGRs. Within this project, critical experiments were conducted for graphite moderated LEU systems to determine core reactivity, flux and power profiles, reaction-rate ratios, the worth of control rods, both in-core and reflector based, the worth of burnable poisons, kinetic parameters, and the effects of moisture ingress on these parameters.more » Four benchmark experiments were evaluated in this report: Cores 1, 1A, 2, and 3. These core configurations represent the hexagonal close packing (HCP) configurations of the HTR-PROTEUS experiment with a moderator-to-fuel pebble ratio of 1:2. Core 1 represents the only configuration utilizing ZEBRA control rods. Cores 1A, 2, and 3 use withdrawable, hollow, stainless steel control rods. Cores 1 and 1A are similar except for the use of different control rods; Core 1A also has one less layer of pebbles (21 layers instead of 22). Core 2 retains the first 16 layers of pebbles from Cores 1 and 1A and has 16 layers of moderator pebbles stacked above the fueled layers. Core 3 retains the first 17 layers of pebbles but has polyethylene rods inserted between pebbles to simulate water ingress. The additional partial pebble layer (layer 18) for Core 3 was not included as it was used for core operations and not the reported critical configuration. Cores 1, 1A, 2, and 3 were determined to be acceptable benchmark experiments.« less
Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

PubMed Central

Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

2018-01-01

In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888

BENCHMARK DOSE TECHNICAL GUIDANCE DOCUMENT ...

EPA Pesticide Factsheets

The purpose of this document is to provide guidance for the Agency on the application of the benchmark dose approach in determining the point of departure (POD) for health effects data, whether a linear or nonlinear low dose extrapolation is used. The guidance includes discussion on computation of benchmark doses and benchmark concentrations (BMDs and BMCs) and their lower confidence limits, data requirements, dose-response analysis, and reporting requirements. This guidance is based on today's knowledge and understanding, and on experience gained in using this approach.
Scaling a Convection-Resolving RCM to Near-Global Scales

NASA Astrophysics Data System (ADS)

Leutwyler, D.; Fuhrer, O.; Chadha, T.; Kwasniewski, G.; Hoefler, T.; Lapillonne, X.; Lüthi, D.; Osuna, C.; Schar, C.; Schulthess, T. C.; Vogt, H.

2017-12-01

In the recent years, first decade-long kilometer-scale resolution RCM simulations have been performed on continental-scale computational domains. However, the size of the planet Earth is still an order of magnitude larger and thus the computational implications of performing global climate simulations at this resolution are challenging. We explore the gap between the currently established RCM simulations and global simulations by scaling the GPU accelerated version of the COSMO model to a near-global computational domain. To this end, the evolution of an idealized moist baroclinic wave has been simulated over the course of 10 days with a grid spacing of up to 930 m. The computational mesh employs 36'000 x 16'001 x 60 grid points and covers 98.4% of the planet's surface. The code shows perfect weak scaling up to 4'888 Nodes of the Piz Daint supercomputer and yields 0.043 simulated years per day (SYPD) which is approximately one seventh of the 0.2-0.3 SYPD required to conduct AMIP-type simulations. However, at half the resolution (1.9 km) we've observed 0.23 SYPD. Besides formation of frontal precipitating systems containing embedded explicitly-resolved convective motions, the simulations reveal a secondary instability that leads to cut-off warm-core cyclonic vortices in the cyclone's core, once the grid spacing is refined to the kilometer scale. The explicit representation of embedded moist convection and the representation of the previously unresolved instabilities exhibit a physically different behavior in comparison to coarser-resolution simulations. The study demonstrates that global climate simulations using kilometer-scale resolution are imminent and serves as a baseline benchmark for global climate model applications and future exascale supercomputing systems.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

PubMed

Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

2016-04-01

Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10(7) primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.
Real-time computation of parameter fitting and image reconstruction using graphical processing units

NASA Astrophysics Data System (ADS)

Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

2017-06-01

In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single GPU systems to show that real time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than × 40 larger compared to a single core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.
Physics-based multiscale coupling for full core nuclear reactor simulation

DOE PAGES

Gaston, Derek R.; Permann, Cody J.; Peterson, John W.; ...

2015-10-01

Numerical simulation of nuclear reactors is a key technology in the quest for improvements in efficiency, safety, and reliability of both existing and future reactor designs. Historically, simulation of an entire reactor was accomplished by linking together multiple existing codes that each simulated a subset of the relevant multiphysics phenomena. Recent advances in the MOOSE (Multiphysics Object Oriented Simulation Environment) framework have enabled a new approach: multiple domain-specific applications, all built on the same software framework, are efficiently linked to create a cohesive application. This is accomplished with a flexible coupling capability that allows for a variety of different datamore » exchanges to occur simultaneously on high performance parallel computational hardware. Examples based on the KAIST-3A benchmark core, as well as a simplified Westinghouse AP-1000 configuration, demonstrate the power of this new framework for tackling—in a coupled, multiscale manner—crucial reactor phenomena such as CRUD-induced power shift and fuel shuffle. 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-SA license« less
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

NASA Astrophysics Data System (ADS)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
A study of workstation computational performance for real-time flight simulation

NASA Technical Reports Server (NTRS)

Maddalon, Jeffrey M.; Cleveland, Jeff I., II

1995-01-01

With recent advances in microprocessor technology, some have suggested that modern workstations provide enough computational power to properly operate a real-time simulation. This paper presents the results of a computational benchmark, based on actual real-time flight simulation code used at Langley Research Center, which was executed on various workstation-class machines. The benchmark was executed on different machines from several companies including: CONVEX Computer Corporation, Cray Research, Digital Equipment Corporation, Hewlett-Packard, Intel, International Business Machines, Silicon Graphics, and Sun Microsystems. The machines are compared by their execution speed, computational accuracy, and porting effort. The results of this study show that the raw computational power needed for real-time simulation is now offered by workstations.
Taming Wild Horses: The Need for Virtual Time-based Scheduling of VMs in Network Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoginath, Srikanth B; Perumalla, Kalyan S; Henz, Brian J

2012-01-01

The next generation of scalable network simulators employ virtual machines (VMs) to act as high-fidelity models of traffic producer/consumer nodes in simulated networks. However, network simulations could be inaccurate if VMs are not scheduled according to virtual time, especially when many VMs are hosted per simulator core in a multi-core simulator environment. Since VMs are by default free-running, on the outset, it is not clear if, and to what extent, their untamed execution affects the results in simulated scenarios. Here, we provide the first quantitative basis for establishing the need for generalized virtual time scheduling of VMs in network simulators,more » based on an actual prototyped implementations. To exercise breadth, our system is tested with multiple disparate applications: (a) a set of message passing parallel programs, (b) a computer worm propagation phenomenon, and (c) a mobile ad-hoc wireless network simulation. We define and use error metrics and benchmarks in scaled tests to empirically report the poor match of traditional, fairness-based VM scheduling to VM-based network simulation, and also clearly show the better performance of our simulation-specific scheduler, with up to 64 VMs hosted on a 12-core simulator node.« less
Three-dimensional pin-to-pin analyses of VVER-440 cores by the MOBY-DICK code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lehmann, M.; Mikolas, P.

1994-12-31

Nuclear design for the Dukovany (EDU) VVER-440s nuclear power plant is routinely performed by the MOBY-DICK system. After its implementation on Hewlett Packard series 700 workstations, it is able to perform routinely three-dimensional pin-to-pin core analyses. For purposes of code validation, the benchmark prepared from EDU operational data was solved.
Let History Not Repeat Itself: Overcoming Obstacles to the Common Core's Success. ES Select

ERIC Educational Resources Information Center

Chubb, John

2012-01-01

The Common Core State Standards project is the latest in a series of efforts to improve the academic success of American students. Forty-five states and the District of Columbia have endorsed new academic benchmarks that substantially raise the bar for achievement in English and mathematics. Aiming at a deeper form of learning, the initiative is a…
High-level ab initio potential energy surface and dynamics of the F- + CH3I SN2 and proton-transfer reactions.

PubMed

Olasz, Balázs; Szabó, István; Czakó, Gábor

2017-04-01

Bimolecular nucleophilic substitution (S N 2) and proton transfer are fundamental processes in chemistry and F - + CH 3 I is an important prototype of these reactions. Here we develop the first full-dimensional ab initio analytical potential energy surface (PES) for the F - + CH 3 I system using a permutationally invariant fit of high-level composite energies obtained with the combination of the explicitly-correlated CCSD(T)-F12b method, the aug-cc-pVTZ basis, core electron correlation effects, and a relativistic effective core potential for iodine. The PES accurately describes the S N 2 channel producing I - + CH 3 F via Walden-inversion, front-side attack, and double-inversion pathways as well as the proton-transfer channel leading to HF + CH 2 I - . The relative energies of the stationary points on the PES agree well with the new explicitly-correlated all-electron CCSD(T)-F12b/QZ-quality benchmark values. Quasiclassical trajectory computations on the PES show that the proton transfer becomes significant at high collision energies and double-inversion as well as front-side attack trajectories can occur. The computed broad angular distributions and hot internal energy distributions indicate the dominance of indirect mechanisms at lower collision energies, which is confirmed by analyzing the integration time and leaving group velocity distributions. Comparison with available crossed-beam experiments shows usually good agreement.
Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

PubMed

Nair, Pradeep S; John, Eugene B

2007-01-01

Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.
Benchmarking: contexts and details matter.

PubMed

Zheng, Siyuan

2017-07-05

Benchmarking is an essential step in the development of computational tools. We take this opportunity to pitch in our opinions on tool benchmarking, in light of two correspondence articles published in Genome Biology.Please see related Li et al. and Newman et al. correspondence articles: www.dx.doi.org/10.1186/s13059-017-1256-5 and www.dx.doi.org/10.1186/s13059-017-1257-4.
Benchmarking of Neutron Production of Heavy-Ion Transport Codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Remec, Igor; Ronningen, Reginald M.; Heilbronn, Lawrence

Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in design and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondary neutron production. Results are encouraging; however, further improvements in models andmore » codes and additional benchmarking are required.« less
Benchmarking of Heavy Ion Transport Codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Remec, Igor; Ronningen, Reginald M.; Heilbronn, Lawrence

Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in designing and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondary neutron production. Results are encouraging; however, further improvements in models andmore » codes and additional benchmarking are required.« less
PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skory, Stephen; Turk, Matthew J.; Norman, Michael L.

2010-11-15

Modern N-body cosmological simulations contain billions (10{sup 9}) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, suchmore » that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit {sup yt}, an analysis toolkit for adaptive mesh refinement data that include complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 2000{sup 3} particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.« less
Reference Solutions for Benchmark Turbulent Flows in Three Dimensions

NASA Technical Reports Server (NTRS)

Diskin, Boris; Thomas, James L.; Pandya, Mohagna J.; Rumsey, Christopher L.

2016-01-01

A grid convergence study is performed to establish benchmark solutions for turbulent flows in three dimensions (3D) in support of turbulence-model verification campaign at the Turbulence Modeling Resource (TMR) website. The three benchmark cases are subsonic flows around a 3D bump and a hemisphere-cylinder configuration and a supersonic internal flow through a square duct. Reference solutions are computed for Reynolds Averaged Navier Stokes equations with the Spalart-Allmaras turbulence model using a linear eddy-viscosity model for the external flows and a nonlinear eddy-viscosity model based on a quadratic constitutive relation for the internal flow. The study involves three widely-used practical computational fluid dynamics codes developed and supported at NASA Langley Research Center: FUN3D, USM3D, and CFL3D. Reference steady-state solutions computed with these three codes on families of consistently refined grids are presented. Grid-to-grid and code-to-code variations are described in detail.
Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

NASA Astrophysics Data System (ADS)

Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.

2018-05-01

Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
High-level ab initio potential energy surface and dynamics of the F– + CH3I SN2 and proton-transfer reactions† †Electronic supplementary information (ESI) available: Benchmark classical and adiabatic relative energies (Table S1), vibrational frequencies of all the stationary points (Tables S2 and S3), direct/indirect trajectory separation function parameters (Table S4), entrance-channel potential (Fig. S1), structures of the minima and saddle points corresponding to the abstraction channel (Fig. S2), reaction probabilities (Fig. S3), trajectory integration time distributions (Fig. S4), trajectory integration time vs. I– velocity distributions (Fig. S5), and mechanism-specific reaction probabilities (Fig. S6). See DOI: 10.1039/c7sc00033b Click here for additional data file.

PubMed Central

Olasz, Balázs; Szabó, István

2017-01-01

Bimolecular nucleophilic substitution (SN2) and proton transfer are fundamental processes in chemistry and F– + CH3I is an important prototype of these reactions. Here we develop the first full-dimensional ab initio analytical potential energy surface (PES) for the F– + CH3I system using a permutationally invariant fit of high-level composite energies obtained with the combination of the explicitly-correlated CCSD(T)-F12b method, the aug-cc-pVTZ basis, core electron correlation effects, and a relativistic effective core potential for iodine. The PES accurately describes the SN2 channel producing I– + CH3F via Walden-inversion, front-side attack, and double-inversion pathways as well as the proton-transfer channel leading to HF + CH2I–. The relative energies of the stationary points on the PES agree well with the new explicitly-correlated all-electron CCSD(T)-F12b/QZ-quality benchmark values. Quasiclassical trajectory computations on the PES show that the proton transfer becomes significant at high collision energies and double-inversion as well as front-side attack trajectories can occur. The computed broad angular distributions and hot internal energy distributions indicate the dominance of indirect mechanisms at lower collision energies, which is confirmed by analyzing the integration time and leaving group velocity distributions. Comparison with available crossed-beam experiments shows usually good agreement. PMID:28507692
SeSBench - An initiative to benchmark reactive transport models for environmental subsurface processes

NASA Astrophysics Data System (ADS)

Jacques, Diederik

2017-04-01

As soil functions are governed by a multitude of interacting hydrological, geochemical and biological processes, simulation tools coupling mathematical models for interacting processes are needed. Coupled reactive transport models are a typical example of such coupled tools mainly focusing on hydrological and geochemical coupling (see e.g. Steefel et al., 2015). Mathematical and numerical complexity for both the tool itself or of the specific conceptual model can increase rapidly. Therefore, numerical verification of such type of models is a prerequisite for guaranteeing reliability and confidence and qualifying simulation tools and approaches for any further model application. In 2011, a first SeSBench -Subsurface Environmental Simulation Benchmarking- workshop was held in Berkeley (USA) followed by four other ones. The objective is to benchmark subsurface environmental simulation models and methods with a current focus on reactive transport processes. The final outcome was a special issue in Computational Geosciences (2015, issue 3 - Reactive transport benchmarks for subsurface environmental simulation) with a collection of 11 benchmarks. Benchmarks, proposed by the participants of the workshops, should be relevant for environmental or geo-engineering applications; the latter were mostly related to radioactive waste disposal issues - excluding benchmarks defined for pure mathematical reasons. Another important feature is the tiered approach within a benchmark with the definition of a single principle problem and different sub problems. The latter typically benchmarked individual or simplified processes (e.g. inert solute transport, simplified geochemical conceptual model) or geometries (e.g. batch or one-dimensional, homogeneous). Finally, three codes should be involved into a benchmark. The SeSBench initiative contributes to confidence building for applying reactive transport codes. Furthermore, it illustrates the use of those type of models for different environmental and geo-engineering applications. SeSBench will organize new workshops to add new benchmarks in a new special issue. Steefel, C. I., et al. (2015). "Reactive transport codes for subsurface environmental simulation." Computational Geosciences 19: 445-478.

Fourth Computational Aeroacoustics (CAA) Workshop on Benchmark Problems

NASA Technical Reports Server (NTRS)

Dahl, Milo D. (Editor)

2004-01-01

This publication contains the proceedings of the Fourth Computational Aeroacoustics (CAA) Workshop on Benchmark Problems. In this workshop, as in previous workshops, the problems were devised to gauge the technological advancement of computational techniques to calculate all aspects of sound generation and propagation in air directly from the fundamental governing equations. A variety of benchmark problems have been previously solved ranging from simple geometries with idealized acoustic conditions to test the accuracy and effectiveness of computational algorithms and numerical boundary conditions; to sound radiation from a duct; to gust interaction with a cascade of airfoils; to the sound generated by a separating, turbulent viscous flow. By solving these and similar problems, workshop participants have shown the technical progress from the basic challenges to accurate CAA calculations to the solution of CAA problems of increasing complexity and difficulty. The fourth CAA workshop emphasized the application of CAA methods to the solution of realistic problems. The workshop was held at the Ohio Aerospace Institute in Cleveland, Ohio, on October 20 to 22, 2003. At that time, workshop participants presented their solutions to problems in one or more of five categories. Their solutions are presented in this proceedings along with the comparisons of their solutions to the benchmark solutions or experimental data. The five categories for the benchmark problems were as follows: Category 1:Basic Methods. The numerical computation of sound is affected by, among other issues, the choice of grid used and by the boundary conditions. Category 2:Complex Geometry. The ability to compute the sound in the presence of complex geometric surfaces is important in practical applications of CAA. Category 3:Sound Generation by Interacting With a Gust. The practical application of CAA for computing noise generated by turbomachinery involves the modeling of the noise source mechanism as a vortical gust interacting with an airfoil. Category 4:Sound Transmission and Radiation. Category 5:Sound Generation in Viscous Problems. Sound is generated under certain conditions by a viscous flow as the flow passes an object or a cavity.
Validation of updated neutronic calculation models proposed for Atucha-II PHWR. Part I: Benchmark comparisons of WIMS-D5 and DRAGON cell and control rod parameters with MCNP5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollerach, R.; Leszczynski, F.; Fink, J.

2006-07-01

In 2005 the Argentine Government took the decision to complete the construction of the Atucha-II nuclear power plant, which has been progressing slowly during the last ten years. Atucha-II is a 745 MWe nuclear station moderated and cooled with heavy water, of German (Siemens) design located in Argentina. It has a pressure-vessel design with 451 vertical coolant channels, and the fuel assemblies (FA) are clusters of 37 natural UO{sub 2} rods with an active length of 530 cm. For the reactor physics area, a revision and update calculation methods and models (cell, supercell and reactor) was recently carried out coveringmore » cell, supercell (control rod) and core calculations. As a validation of the new models some benchmark comparisons were done with Monte Carlo calculations with MCNP5. This paper presents comparisons of cell and supercell benchmark problems based on a slightly idealized model of the Atucha-I core obtained with the WIMS-D5 and DRAGON codes with MCNP5 results. The Atucha-I core was selected because it is smaller, similar from a neutronic point of view, and more symmetric than Atucha-II Cell parameters compared include cell k-infinity, relative power levels of the different rings of fuel rods, and some two-group macroscopic cross sections. Supercell comparisons include supercell k-infinity changes due to the control rods (tubes) of steel and hafnium. (authors)« less
It's Not Education by Zip Code Anymore--But What is It? Conceptions of Equity under the Common Core

ERIC Educational Resources Information Center

Kornhaber, Mindy L.; Griffith, Kelly; Tyler, Alison

2014-01-01

The Common Core State Standards Initiative is a standards-based reform in which 45 U.S. states and the District of Columbia have agreed to participate. The reform seeks to anchor primary and secondary education across these states in one set of demanding, internationally benchmarked standards. Thereby, all students will be prepared for further…
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

NASA Astrophysics Data System (ADS)

Amooie, M. A.; Moortgat, J.

2017-12-01

We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
Implementing Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle Particle-Mesh

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with anmore » approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.« less
Performance and Scalability of the NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

2002-01-01

Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Benchmarking hypercube hardware and software

NASA Technical Reports Server (NTRS)

Grunwald, Dirk C.; Reed, Daniel A.

1986-01-01

It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
Evaluation of the ACEC Benchmark Suite for Real-Time Applications

DTIC Science & Technology

1990-07-23

1.0 benchmark suite waSanalyzed with respect to its measuring of Ada real-time features such as tasking, memory management, input/output, scheduling...and delay statement, Chapter 13 features , pragmas, interrupt handling, subprogram overhead, numeric computations etc. For most of the features that...meant for programming real-time systems. The ACEC benchmarks have been analyzed extensively with respect to their measuring of Ada real-time features
INL Results for Phases I and III of the OECD/NEA MHTGR-350 Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerhard Strydom; Javier Ortensi; Sonat Sen

2013-09-01

The Idaho National Laboratory (INL) Very High Temperature Reactor (VHTR) Technology Development Office (TDO) Methods Core Simulation group led the construction of the Organization for Economic Cooperation and Development (OECD) Modular High Temperature Reactor (MHTGR) 350 MW benchmark for comparing and evaluating prismatic VHTR analysis codes. The benchmark is sponsored by the OECD's Nuclear Energy Agency (NEA), and the project will yield a set of reference steady-state, transient, and lattice depletion problems that can be used by the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and vendors to assess their code suits. The Methods group is responsible formore » defining the benchmark specifications, leading the data collection and comparison activities, and chairing the annual technical workshops. This report summarizes the latest INL results for Phase I (steady state) and Phase III (lattice depletion) of the benchmark. The INSTANT, Pronghorn and RattleSnake codes were used for the standalone core neutronics modeling of Exercise 1, and the results obtained from these codes are compared in Section 4. Exercise 2 of Phase I requires the standalone steady-state thermal fluids modeling of the MHTGR-350 design, and the results for the systems code RELAP5-3D are discussed in Section 5. The coupled neutronics and thermal fluids steady-state solution for Exercise 3 are reported in Section 6, utilizing the newly developed Parallel and Highly Innovative Simulation for INL Code System (PHISICS)/RELAP5-3D code suit. Finally, the lattice depletion models and results obtained for Phase III are compared in Section 7. The MHTGR-350 benchmark proved to be a challenging simulation set of problems to model accurately, and even with the simplifications introduced in the benchmark specification this activity is an important step in the code-to-code verification of modern prismatic VHTR codes. A final OECD/NEA comparison report will compare the Phase I and III results of all other international participants in 2014, while the remaining Phase II transient case results will be reported in 2015.« less
The PAC-MAN model: Benchmark case for linear acoustics in computational physics

NASA Astrophysics Data System (ADS)

Ziegelwanger, Harald; Reiter, Paul

2017-10-01

Benchmark cases in the field of computational physics, on the one hand, have to contain a certain complexity to test numerical edge cases and, on the other hand, require the existence of an analytical solution, because an analytical solution allows the exact quantification of the accuracy of a numerical simulation method. This dilemma causes a need for analytical sound field formulations of complex acoustic problems. A well known example for such a benchmark case for harmonic linear acoustics is the ;Cat's Eye model;, which describes the three-dimensional sound field radiated from a sphere with a missing octant analytically. In this paper, a benchmark case for two-dimensional (2D) harmonic linear acoustic problems, viz., the ;PAC-MAN model;, is proposed. The PAC-MAN model describes the radiated and scattered sound field around an infinitely long cylinder with a cut out sector of variable angular width. While the analytical calculation of the 2D sound field allows different angular cut-out widths and arbitrarily positioned line sources, the computational cost associated with the solution of this problem is similar to a 1D problem because of a modal formulation of the sound field in the PAC-MAN model.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yilk, Todd

The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

DOE PAGES

Yilk, Todd

2018-02-17

The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.
Trailing Vortex Measurements in the Wake of a Hovering Rotor Blade with Various Tip Shapes

NASA Technical Reports Server (NTRS)

Martin, Preston B.; Leishman, J. Gordon

2003-01-01

This work examined the wake aerodynamics of a single helicopter rotor blade with several tip shapes operating on a hover test stand. Velocity field measurements were conducted using three-component laser Doppler velocimetry (LDV). The objective of these measurements was to document the vortex velocity profiles and then extract the core properties, such as the core radius, peak swirl velocity, and axial velocity. The measured test cases covered a wide range of wake-ages and several tip shapes, including rectangular, tapered, swept, and a subwing tip. One of the primary differences shown by the change in tip shape was the wake geometry. The effect of blade taper reduced the initial peak swirl velocity by a significant fraction. It appears that this is accomplished by decreasing the vortex strength for a given blade loading. The subwing measurements showed that the interaction and merging of the subwing and primary vortices created a less coherent vortical structure. A source of vortex core instability is shown to be the ratio of the peak swirl velocity to the axial velocity deficit. The results show that if there is a turbulence producing region of the vortex structure, it will be outside of the core boundary. The LDV measurements were supported by laser light-sheet flow visualization. The results provide several benchmark test cases for future validation of theoretical vortex models, numerical free-wake models, and computational fluid dynamics results.
TomoPhantom, a software package to generate 2D-4D analytical phantoms for CT image reconstruction algorithm benchmarks

NASA Astrophysics Data System (ADS)

Kazantsev, Daniil; Pickalov, Valery; Nagella, Srikanth; Pasca, Edoardo; Withers, Philip J.

2018-01-01

In the field of computerized tomographic imaging, many novel reconstruction techniques are routinely tested using simplistic numerical phantoms, e.g. the well-known Shepp-Logan phantom. These phantoms cannot sufficiently cover the broad spectrum of applications in CT imaging where, for instance, smooth or piecewise-smooth 3D objects are common. TomoPhantom provides quick access to an external library of modular analytical 2D/3D phantoms with temporal extensions. In TomoPhantom, quite complex phantoms can be built using additive combinations of geometrical objects, such as, Gaussians, parabolas, cones, ellipses, rectangles and volumetric extensions of them. Newly designed phantoms are better suited for benchmarking and testing of different image processing techniques. Specifically, tomographic reconstruction algorithms which employ 2D and 3D scanning geometries, can be rigorously analyzed using the software. TomoPhantom also provides a capability of obtaining analytical tomographic projections which further extends the applicability of software towards more realistic, free from the "inverse crime" testing. All core modules of the package are written in the C-OpenMP language and wrappers for Python and MATLAB are provided to enable easy access. Due to C-based multi-threaded implementation, volumetric phantoms of high spatial resolution can be obtained with computational efficiency.
Scalable Metropolis Monte Carlo for simulation of hard shapes

NASA Astrophysics Data System (ADS)

Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.

2016-07-01

We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
Benchmark calculations with correlated molecular wavefunctions. XIII. Potential energy curves for He2, Ne2 and Ar2 using correlation consistent basis sets through augmented sextuple zeta

NASA Astrophysics Data System (ADS)

van Mourik, Tanja

1999-02-01

The potential energy curves of the rare gas dimers He2, Ne2, and Ar2 have been computed using correlation consistent basis sets ranging from singly augmented aug-cc-pVDZ sets through triply augmented t-aug-cc-pV6Z sets, with the augmented sextuple basis sets being reported herein. Several methods for including electron correlation were investigated, namely Moller-Plesset perturbation theory (MP2, MP3 and MP4) and coupled cluster theory [CCSD and CCSD(T)]. For He2CCSD(T)/d-aug-cc-pV6Z calculations yield a well depth of 7.35cm-1 (10.58K), with an estimated complete basis set (CBS) limit of 7.40cm-1 (10.65K). The latter is smaller than the 'exact' well depth (Aziz, R. A., Janzen, A. R., and Moldover, M. R., 1995, Phys. Rev. Lett., 74, 1586) by about 0.2cm-1 (0.35K). The Ne well depth, computed with the CCSD(T)/d-aug-cc-pV6Z method, is 28.31cm-1 and the estimated CBS limit is 28.4cm-1, approximately 1cm-1 smaller than the empirical potential of Aziz, R. A., and Slaman, M., J., 1989, Chem. Phys., 130, 187. Inclusion of core and core-valence correlation effects has a negligible effect on the Ne well depth, decreasing it by only 0.04cm-1. For Ar2, CCSD(T)/ d-aug-cc-pV6Z calculations yield a well depth of 96.2cm-1. The corresponding HFDID potential of Aziz, R. A., 1993, J. chem. Phys., 99, 4518 predicts of D of 99.7cm-1. Inclusion of core and core-valence effects in Ar increases the well depth and decreases the discrepancy by approximately 1cm-1.
Benchmarking of neutron production of heavy-ion transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Remec, I.; Ronningen, R. M.; Heilbronn, L.

Document available in abstract form only, full text of document follows: Accurate prediction of radiation fields generated by heavy ion interactions is important in medical applications, space missions, and in design and operation of rare isotope research facilities. In recent years, several well-established computer codes in widespread use for particle and radiation transport calculations have been equipped with the capability to simulate heavy ion transport and interactions. To assess and validate these capabilities, we performed simulations of a series of benchmark-quality heavy ion experiments with the computer codes FLUKA, MARS15, MCNPX, and PHITS. We focus on the comparisons of secondarymore » neutron production. Results are encouraging; however, further improvements in models and codes and additional benchmarking are required. (authors)« less
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
DE-NE0008277_PROTEUS final technical report 2018

DOE Office of Scientific and Technical Information (OSTI.GOV)

Enqvist, Andreas

This project details re-evaluations of experiments of gas-cooled fast reactor (GCFR) core designs performed in the 1970s at the PROTEUS reactor and create a series of International Reactor Physics Experiment Evaluation Project (IRPhEP) benchmarks. Currently there are no gas-cooled fast reactor (GCFR) experiments available in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook). These experiments are excellent candidates for reanalysis and development of multiple benchmarks because these experiments provide high-quality integral nuclear data relevant to the validation and refinement of thorium, neptunium, uranium, plutonium, iron, and graphite cross sections. It would be cost prohibitive to reproduce suchmore » a comprehensive suite of experimental data to support any future GCFR endeavors.« less
Fast Eigensolver for Computing 3D Earth's Normal Modes

NASA Astrophysics Data System (ADS)

Shi, J.; De Hoop, M. V.; Li, R.; Xi, Y.; Saad, Y.

2017-12-01

We present a novel parallel computational approach to compute Earth's normal modes. We discretize Earth via an unstructured tetrahedral mesh and apply the continuous Galerkin finite element method to the elasto-gravitational system. To resolve the eigenvalue pollution issue, following the analysis separating the seismic point spectrum, we utilize explicitly a representation of the displacement for describing the oscillations of the non-seismic modes in the fluid outer core. Effectively, we separate out the essential spectrum which is naturally related to the Brunt-Väisälä frequency. We introduce two Lanczos approaches with polynomial and rational filtering for solving this generalized eigenvalue problem in prescribed intervals. The polynomial filtering technique only accesses the matrix pair through matrix-vector products and is an ideal candidate for solving three-dimensional large-scale eigenvalue problems. The matrix-free scheme allows us to deal with fluid separation and self-gravitation in an efficient way, while the standard shift-and-invert method typically needs an explicit shifted matrix and its factorization. The rational filtering method converges much faster than the standard shift-and-invert procedure when computing all the eigenvalues inside an interval. Both two Lanczos approaches solve for the internal eigenvalues extremely accurately, comparing with the standard eigensolver. In our computational experiments, we compare our results with the radial earth model benchmark, and visualize the normal modes using vector plots to illustrate the properties of the displacements in different modes.

Higher-order ice-sheet modelling accelerated by multigrid on graphics cards

NASA Astrophysics Data System (ADS)

Brædstrup, Christian; Egholm, David

2013-04-01

Higher-order ice flow modelling is a very computer intensive process owing primarily to the nonlinear influence of the horizontal stress coupling. When applied for simulating long-term glacial landscape evolution, the ice-sheet models must consider very long time series, while both high temporal and spatial resolution is needed to resolve small effects. The use of higher-order and full stokes models have therefore seen very limited usage in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this extent we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides the inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
I/O-Efficient Scientific Computation Using TPIE

NASA Technical Reports Server (NTRS)

Vengroff, Darren Erik; Vitter, Jeffrey Scott

1996-01-01

In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.
Evaluation of the Xeon phi processor as a technology for the acceleration of real-time control in high-order adaptive optics systems

NASA Astrophysics Data System (ADS)

Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine

2014-08-01

We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
ELAPSE - NASA AMES LISP AND ADA BENCHMARK SUITE: EFFICIENCY OF LISP AND ADA PROCESSING - A SYSTEM EVALUATION

NASA Technical Reports Server (NTRS)

Davis, G. J.

1994-01-01

One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed to test a single computer's efficiency as well as alternate machine comparisons on Lisp, and/or Ada languages. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transformations, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program sorting data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that comprise ELAPSE, provided within this package are 14 developed or translated at Ames. The others are readily available through literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
How to benchmark methods for structure-based virtual screening of large compound libraries.

PubMed

Christofferson, Andrew J; Huang, Niu

2012-01-01

Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the designing of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bailey, David H.

The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, althoughmore » the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.« less
Theoretical Background and Prognostic Modeling for Benchmarking SHM Sensors for Composite Structures

DTIC Science & Technology

2010-10-01

minimum flaw size can be detected by the existing SHM based monitoring methods. Sandwich panels with foam , WebCore and honeycomb structures were...Whether it be hat stiffened, corrugated sandwich, honeycomb sandwich, or foam filled sandwich, all composite structures have one basic handicap in...based monitoring methods. Sandwich panels with foam , WebCore and honeycomb structures were considered for use in this study. Eigenmode frequency
Spherical harmonic results for the 3D Kobayashi Benchmark suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, P N; Chang, B; Hanebutte, U R

1999-03-02

Spherical harmonic solutions are presented for the Kobayashi benchmark suite. The results were obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Domino, Stefan P.; Ananthan, Shreyas; Knaus, Robert C.

The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability onmore » many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element- based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume nite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested) especially on KNL. The increased workload per element on higher-order meshes bene ts from the wide SIMD width on KNL machines. Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than that observed on Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justi es further exploration of higher-order on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. fforts are underway to deploy the Tpetra-based solver stack on NERSC Cori system to benchmark its performance at scale on KNL machines.« less
Thermal Hydraulic Computational Fluid Dynamics Simulations and Experimental Investigation of Deformed Fuel Assemblies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mays, Brian; Jackson, R. Brian

2017-03-08

The project, Toward a Longer Life Core: Thermal Hydraulic CFD Simulations and Experimental Investigation of Deformed Fuel Assemblies, DOE Project code DE-NE0008321, was a verification and validation project for flow and heat transfer through wire wrapped simulated liquid metal fuel assemblies that included both experiments and computational fluid dynamics simulations of those experiments. This project was a two year collaboration between AREVA, TerraPower, Argonne National Laboratory and Texas A&M University. Experiments were performed by AREVA and Texas A&M University. Numerical simulations of these experiments were performed by TerraPower and Argonne National Lab. Project management was performed by AREVA Federal Services.more » The first of a kind project resulted in the production of both local point temperature measurements and local flow mixing experiment data paired with numerical simulation benchmarking of the experiments. The project experiments included the largest wire-wrapped pin assembly Mass Index of Refraction (MIR) experiment in the world, the first known wire-wrapped assembly experiment with deformed duct geometries and the largest numerical simulations ever produced for wire-wrapped bundles.« less
An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

DOE PAGES

Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...

2017-10-04

The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overallmore » memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel R Xeon PhiTM supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.« less
GeNN: a code generation framework for accelerated brain simulations

NASA Astrophysics Data System (ADS)

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-01

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/.
GeNN: a code generation framework for accelerated brain simulations.

PubMed

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-07

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/.
GeNN: a code generation framework for accelerated brain simulations

PubMed Central

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-01

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/. PMID:26740369
GPUs, a New Tool of Acceleration in CFD: Efficiency and Reliability on Smoothed Particle Hydrodynamics Methods

PubMed Central

Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.

2011-01-01

Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
Benchmark tests of JENDL-3.2 for thermal and fast reactors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Takano, Hideki; Akie, Hiroshi; Kikuchi, Yasuyuki

1994-12-31

Benchmark calculations for a variety of thermal and fast reactors have been performed by using the newly evaluated JENDL-3 Version-2 (JENDL-3.2) file. In the thermal reactor calculations for the uranium and plutonium fueled cores of TRX and TCA, the k{sub eff} and lattice parameters were well predicted. The fast reactor calculations for ZPPR-9 and FCA assemblies showed that the k{sub eff} reactivity worths of Doppler, sodium void and control rod, and reaction rate distribution were in a very good agreement with the experiments.
Summary of the Tandem Cylinder Solutions from the Benchmark Problems for Airframe Noise Computations-I Workshop

NASA Technical Reports Server (NTRS)

Lockard, David P.

2011-01-01

Fifteen submissions in the tandem cylinders category of the First Workshop on Benchmark problems for Airframe Noise Computations are summarized. Although the geometry is relatively simple, the problem involves complex physics. Researchers employed various block-structured, overset, unstructured and embedded Cartesian grid techniques and considerable computational resources to simulate the flow. The solutions are compared against each other and experimental data from 2 facilities. Overall, the simulations captured the gross features of the flow, but resolving all the details which would be necessary to compute the noise remains challenging. In particular, how to best simulate the effects of the experimental transition strip, and the associated high Reynolds number effects, was unclear. Furthermore, capturing the spanwise variation proved difficult.
NAS Grid Benchmarks: A Tool for Grid Space Exploration

NASA Technical Reports Server (NTRS)

Frumkin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)

2001-01-01

We present an approach for benchmarking services provided by computational Grids. It is based on the NAS Parallel Benchmarks (NPB) and is called NAS Grid Benchmark (NGB) in this paper. We present NGB as a data flow graph encapsulating an instance of an NPB code in each graph node, which communicates with other nodes by sending/receiving initialization data. These nodes may be mapped to the same or different Grid machines. Like NPB, NGB will specify several different classes (problem sizes). NGB also specifies the generic Grid services sufficient for running the bench-mark. The implementor has the freedom to choose any specific Grid environment. However, we describe a reference implementation in Java, and present some scenarios for using NGB.
NAS Grid Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)

2002-01-01

We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.

A benchmark study of the sea-level equation in GIA modelling

NASA Astrophysics Data System (ADS)

Martinec, Zdenek; Klemann, Volker; van der Wal, Wouter; Riva, Riccardo; Spada, Giorgio; Simon, Karen; Blank, Bas; Sun, Yu; Melini, Daniele; James, Tom; Bradley, Sarah

2017-04-01

The sea-level load in glacial isostatic adjustment (GIA) is described by the so called sea-level equation (SLE), which represents the mass redistribution between ice sheets and oceans on a deforming earth. Various levels of complexity of SLE have been proposed in the past, ranging from a simple mean global sea level (the so-called eustatic sea level) to the load with a deforming ocean bottom, migrating coastlines and a changing shape of the geoid. Several approaches to solve the SLE have been derived, from purely analytical formulations to fully numerical methods. Despite various teams independently investigating GIA, there has been no systematic intercomparison amongst the solvers through which the methods may be validated. The goal of this paper is to present a series of benchmark experiments designed for testing and comparing numerical implementations of the SLE. Our approach starts with simple load cases even though the benchmark will not result in GIA predictions for a realistic loading scenario. In the longer term we aim for a benchmark with a realistic loading scenario, and also for benchmark solutions with rotational feedback. The current benchmark uses an earth model for which Love numbers have been computed and benchmarked in Spada et al (2011). In spite of the significant differences in the numerical methods employed, the test computations performed so far show a satisfactory agreement between the results provided by the participants. The differences found can often be attributed to the different approximations inherent to the various algorithms. Literature G. Spada, V. R. Barletta, V. Klemann, R. E. M. Riva, Z. Martinec, P. Gasperini, B. Lund, D. Wolf, L. L. A. Vermeersen, and M. A. King, 2011. A benchmark study for glacial isostatic adjustment codes. Geophys. J. Int. 185: 106-132 doi:10.1111/j.1365-
Benchmarking comparison and validation of MCNP photon interaction data

NASA Astrophysics Data System (ADS)

Colling, Bethany; Kodeli, I.; Lilley, S.; Packer, L. W.

2017-09-01

The objective of the research was to test available photoatomic data libraries for fusion relevant applications, comparing against experimental and computational neutronics benchmarks. Photon flux and heating was compared using the photon interaction data libraries (mcplib 04p, 05t, 84p and 12p). Suitable benchmark experiments (iron and water) were selected from the SINBAD database and analysed to compare experimental values with MCNP calculations using mcplib 04p, 84p and 12p. In both the computational and experimental comparisons, the majority of results with the 04p, 84p and 12p photon data libraries were within 1σ of the mean MCNP statistical uncertainty. Larger differences were observed when comparing computational results with the 05t test photon library. The Doppler broadening sampling bug in MCNP-5 is shown to be corrected for fusion relevant problems through use of the 84p photon data library. The recommended libraries for fusion neutronics are 84p (or 04p) with MCNP6 and 84p if using MCNP-5.
Effectiveness of Social Marketing Interventions to Promote Physical Activity Among Adults: A Systematic Review.

PubMed

Xia, Yuan; Deshpande, Sameer; Bonates, Tiberius

2016-11-01

Social marketing managers promote desired behaviors to an audience by making them tangible in the form of environmental opportunities to enhance benefits and reduce barriers. This study proposed "benchmarks," modified from those found in the past literature, that would match important concepts of the social marketing framework and the inclusion of which would ensure behavior change effectiveness. In addition, we analyzed behavior change interventions on a "social marketing continuum" to assess whether the number of benchmarks and the role of specific benchmarks influence the effectiveness of physical activity promotion efforts. A systematic review of social marketing interventions available in academic studies published between 1997 and 2013 revealed 173 conditions in 92 interventions. Findings based on χ 2 , Mallows' Cp, and Logical Analysis of Data tests revealed that the presence of more benchmarks in interventions increased the likelihood of success in promoting physical activity. The presence of more than 3 benchmarks improved the success of the interventions; specifically, all interventions were successful when more than 7.5 benchmarks were present. Further, primary formative research, core product, actual product, augmented product, promotion, and behavioral competition all had a significant influence on the effectiveness of interventions. Social marketing is an effective approach in promoting physical activity among adults when a substantial number of benchmarks are used and when managers understand the audience, make the desired behavior tangible, and promote the desired behavior persuasively.
Predicting Cost/Performance Trade-Offs for Whitney: A Commodity Computing Cluster

NASA Technical Reports Server (NTRS)

Becker, Jeffrey C.; Nitzberg, Bill; VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. We develop closed form expressions detailing the number and size of messages sent by each benchmark. Coupling these with measured single processor performance, network latency, and network bandwidth, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.
RISC Processors and High Performance Computing

NASA Technical Reports Server (NTRS)

Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

1995-01-01

This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.
HTR-PROTEUS PEBBLE BED EXPERIMENTAL PROGRAM CORE 4: RANDOM PACKING WITH A 1:1 MODERATOR-TO-FUEL PEBBLE RATIO

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; Leland M. Montierth

2013-03-01

In its deployment as a pebble bed reactor (PBR) critical facility from 1992 to 1996, the PROTEUS facility was designated as HTR-PROTEUS. This experimental program was performed as part of an International Atomic Energy Agency (IAEA) Coordinated Research Project (CRP) on the Validation of Safety Related Physics Calculations for Low Enriched HTGRs. Within this project, critical experiments were conducted for graphite moderated LEU systems to determine core reactivity, flux and power profiles, reaction-rate ratios, the worth of control rods, both in-core and reflector based, the worth of burnable poisons, kinetic parameters, and the effects of moisture ingress on these parameters.more » One benchmark experiment was evaluated in this report: Core 4. Core 4 represents the only configuration with random pebble packing in the HTR-PROTEUS series of experiments, and has a moderator-to-fuel pebble ratio of 1:1. Three random configurations were performed. The initial configuration, Core 4.1, was rejected because the method for pebble loading, separate delivery tubes for the moderator and fuel pebbles, may not have been completely random; this core loading was rejected by the experimenters. Cores 4.2 and 4.3 were loaded using a single delivery tube, eliminating the possibility for systematic ordering effects. The second and third cores differed slightly in the quantity of pebbles loaded (40 each of moderator and fuel pebbles), stacked height of the pebbles in the core cavity (0.02 m), withdrawn distance of the stainless steel control rods (20 mm), and withdrawn distance of the autorod (30 mm). The 34 coolant channels in the upper axial reflector and the 33 coolant channels in the lower axial reflector were open. Additionally, the axial graphite fillers used in all other HTR-PROTEUS configurations to create a 12-sided core cavity were not used in the randomly packed cores. Instead, graphite fillers were placed on the cavity floor, creating a funnel-like base, to discourage ordering effects during pebble loading. Core 4 was determined to be acceptable benchmark experiment.« less
HTR-proteus pebble bed experimental program core 4: random packing with a 1:1 moderator-to-fuel pebble ratio

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Montierth, Leland M.; Sterbentz, James W.

2014-03-01

In its deployment as a pebble bed reactor (PBR) critical facility from 1992 to 1996, the PROTEUS facility was designated as HTR-PROTEUS. This experimental program was performed as part of an International Atomic Energy Agency (IAEA) Coordinated Research Project (CRP) on the Validation of Safety Related Physics Calculations for Low Enriched HTGRs. Within this project, critical experiments were conducted for graphite moderated LEU systems to determine core reactivity, flux and power profiles, reaction-rate ratios, the worth of control rods, both in-core and reflector based, the worth of burnable poisons, kinetic parameters, and the effects of moisture ingress on these parameters.more » One benchmark experiment was evaluated in this report: Core 4. Core 4 represents the only configuration with random pebble packing in the HTR-PROTEUS series of experiments, and has a moderator-to-fuel pebble ratio of 1:1. Three random configurations were performed. The initial configuration, Core 4.1, was rejected because the method for pebble loading, separate delivery tubes for the moderator and fuel pebbles, may not have been completely random; this core loading was rejected by the experimenters. Cores 4.2 and 4.3 were loaded using a single delivery tube, eliminating the possibility for systematic ordering effects. The second and third cores differed slightly in the quantity of pebbles loaded (40 each of moderator and fuel pebbles), stacked height of the pebbles in the core cavity (0.02 m), withdrawn distance of the stainless steel control rods (20 mm), and withdrawn distance of the autorod (30 mm). The 34 coolant channels in the upper axial reflector and the 33 coolant channels in the lower axial reflector were open. Additionally, the axial graphite fillers used in all other HTR-PROTEUS configurations to create a 12-sided core cavity were not used in the randomly packed cores. Instead, graphite fillers were placed on the cavity floor, creating a funnel-like base, to discourage ordering effects during pebble loading. Core 4 was determined to be acceptable benchmark experiment.« less
Quantum Monte Carlo Computations of Phase Stability, Equations of State, and Elasticity of High-Pressure Silica

NASA Astrophysics Data System (ADS)

Driver, K. P.; Cohen, R. E.; Wu, Z.; Militzer, B.; Ríos, P. L.; Towler, M. D.; Needs, R. J.; Wilkins, J. W.

2011-12-01

Silica (SiO2) is an abundant component of the Earth whose crystalline polymorphs play key roles in its structure and dynamics. First principle density functional theory (DFT) methods have often been used to accurately predict properties of silicates, but fundamental failures occur. Such failures occur even in silica, the simplest silicate, and understanding pure silica is a prerequisite to understanding the rocky part of the Earth. Here, we study silica with quantum Monte Carlo (QMC), which until now was not computationally possible for such complex materials, and find that QMC overcomes the failures of DFT. QMC is a benchmark method that does not rely on density functionals but rather explicitly treats the electrons and their interactions via a stochastic solution of Schrödinger's equation. Using ground-state QMC plus phonons within the quasiharmonic approximation of density functional perturbation theory, we obtain the thermal pressure and equations of state of silica phases up to Earth's core-mantle boundary. Our results provide the best constrained equations of state and phase boundaries available for silica. QMC indicates a transition to the dense α-PbO2 structure above the core-insulating D" layer, but the absence of a seismic signature suggests the transition does not contribute significantly to global seismic discontinuities in the lower mantle. However, the transition could still provide seismic signals from deeply subducted oceanic crust. We also find an accurate shear elastic constant for stishovite and its geophysically important softening with pressure.
3D streamers simulation in a pin to plane configuration using massively parallel computing

NASA Astrophysics Data System (ADS)

Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.

2018-03-01

This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million of mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns-1, regardless of the number of streamers.
Distributed Memory Parallel Computing with SEAWAT

NASA Astrophysics Data System (ADS)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.
Accelerated event-by-event Monte Carlo microdosimetric calculations of electrons and protons tracks on a multi-core CPU and a CUDA-enabled GPU.

PubMed

Kalantzis, Georgios; Tachibana, Hidenobu

2014-01-01

For microdosimetric calculations event-by-event Monte Carlo (MC) methods are considered the most accurate. The main shortcoming of those methods is the extensive requirement for computational time. In this work we present an event-by-event MC code of low projectile energy electron and proton tracks for accelerated microdosimetric MC simulations on a graphic processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both GPU and multi-core CPU were utilized simultaneously. The two implementation schemes have been tested and compared with the sequential single threaded MC code on the CPU. Performance comparison was established on the speed-up for a set of benchmarking cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while a further improvement of the speedup up to 20% was achieved for the hybrid approach. The results indicate the capability of our CPU-GPU implementation for accelerated MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Next Generation School Districts: What Capacities Do Districts Need to Create and Sustain Schools That Are Ready to Deliver on Common Core?

ERIC Educational Resources Information Center

Lake, Robin; Hill, Paul T.; Maas, Tricia

2015-01-01

Every sector of the U.S. economy is working on ways to deliver services in a more customized manner. If all goes well, education is headed in the same direction. Personalized learning and globally benchmarked academic standards (a.k.a. Common Core) are the focus of most major school districts and charter school networks. Educators and parents know…
Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)

NASA Technical Reports Server (NTRS)

Ahmad, Nashat N.; Proctor, Fred H.

2011-01-01

The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases which have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these benchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.
Coupled thermo-chemical boundary conditions in double-diffusive geodynamo models at arbitrary Lewis numbers.

NASA Astrophysics Data System (ADS)

Bouffard, M.

2016-12-01

Convection in the Earth's outer core is driven by the combination of two buoyancy sources: a thermal source directly related to the Earth's secular cooling, the release of latent heat and possibly the heat generated by radioactive decay, and a compositional source due to the crystallization of the growing inner core which releases light elements into the liquid outer core. The dynamics of fusion/crystallization being dependent on the heat flux distribution, the thermochemical boundary conditions are coupled at the inner core boundary which may affect the dynamo in various ways, particularly if heterogeneous conditions are imposed at one boundary. In addition, the thermal and compositional molecular diffusivities differ by three orders of magnitude. This can produce significant differences in the convective dynamics compared to pure thermal or compositional convection due to the potential occurence of double-diffusive phenomena. Traditionally, temperature and composition have been combined into one single variable called codensity under the assumption that turbulence mixes all physical properties at an "eddy-diffusion" rate. This description does not allow for a proper treatment of the thermochemical coupling and is certainly incorrect within stratified layers in which double-diffusive phenomena can be expected. For a more general and rigorous approach, two distinct transport equations should therefore be solved for temperature and composition. However, the weak compositional diffusivity is technically difficult to handle in current geodynamo codes and requires the use of a semi-Lagrangian description to minimize numerical diffusion. We implemented a "particle-in-cell" method into a geodynamo code to properly describe the compositional field. The code is suitable for High Parallel Computing architectures and was successfully tested on two benchmarks. Following the work by Aubert et al. (2008) we use this new tool to perform dynamo simulations including thermochemical coupling at the inner core boundary as well as exploration of the infinite Lewis number limit to study the effect of a heterogeneous core mantle boundary heat flow on the inner core growth.
Time and frequency structure of causal correlation networks in the China bond market

NASA Astrophysics Data System (ADS)

Wang, Zhongxing; Yan, Yan; Chen, Xiaosong

2017-07-01

There are more than eight hundred interest rates published in the China bond market every day. Identifying the benchmark interest rates that have broad influences on most other interest rates is a major concern for economists. In this paper, a multi-variable Granger causality test is developed and applied to construct a directed network of interest rates, whose important nodes, regarded as key interest rates, are evaluated with CheiRank scores. The results indicate that repo rates are the benchmark of short-term rates, the central bank bill rates are in the core position of mid-term interest rates network, and treasury bond rates lead the long-term bond rates. The evolution of benchmark interest rates from 2008 to 2014 is also studied, and it is found that SHIBOR has generally become the benchmark interest rate in China. In the frequency domain we identify the properties of information flows between interest rates, and the result confirms the existence of market segmentation in the China bond market.
On the accuracy of density functional theory and wave function methods for calculating vertical ionization energies

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKechnie, Scott; Booth, George H.; Cohen, Aron J.

The best practice in computational methods for determining vertical ionization energies (VIEs) is assessed, via reference to experimentally determined VIEs that are corroborated by highly accurate coupled-cluster calculations. These reference values are used to benchmark the performance of density-functional theory (DFT) and wave function methods: Hartree-Fock theory (HF), second-order Møller-Plesset perturbation theory (MP2) and Electron Propagator Theory (EPT). The core test set consists of 147 small molecules. An extended set of six larger molecules, from benzene to hexacene, is also considered to investigate the dependence of the results on molecule size. The closest agreement with experiment is found for ionizationmore » energies obtained from total energy diff calculations. In particular, DFT calculations using exchange-correlation functionals with either a large amount of exact exchange or long-range correction perform best. The results from these functionals are also the least sensitive to an increase in molecule size. In general, ionization energies calculated directly from the orbital energies of the neutral species are less accurate and more sensitive to an increase in molecule size. For the single-calculation approach, the EPT calculations are in closest agreement for both sets of molecules. For the orbital energies from DFT functionals, only those with long-range correction give quantitative agreement with dramatic failing for all other functionals considered. The results offer a practical hierarchy of approximations for the calculation of vertical ionization energies. In addition, the experimental and computational reference values can be used as a standardized set of benchmarks, against which other approximate methods can be compared.« less
Generalization of the Last Scattering Approximation for the Second Solar Spectrum Modeling: The Ca I 4227 Å Line as a Case Study

NASA Astrophysics Data System (ADS)

Anusha, L. S.; Nagendra, K. N.; Stenflo, J. O.; Bianda, M.; Sampoorna, M.; Frisch, H.; Holzreuter, R.; Ramelli, R.

2010-08-01

To model the second solar spectrum (the linearly polarized spectrum of the Sun that is due to coherent scattering processes), one needs to solve the polarized radiative transfer (RT) equation. For strong resonance lines, partial frequency redistribution (PRD) effects must be accounted for, which make the problem computationally demanding. The "last scattering approximation" (LSA) is a concept that has been introduced to make this highly complex problem more tractable. An earlier application of a simple LSA version could successfully model the wings of the strong Ca I 4227 Å resonance line in Stokes Q/I (fractional linear polarization), but completely failed to reproduce the observed Q/I peak in the line core. Since the magnetic field signatures from the Hanle effect only occur in the line core, we need to generalize the existing LSA approach if it is to be useful for the diagnostics of chromospheric and turbulent magnetic fields. In this paper, we explore three different approximation levels for LSA and compare each of them with the benchmark represented by the solution of the full polarized RT, including PRD effects. The simplest approximation level is LSA-1, which uses the observed center-to-limb variation of the intensity profile to obtain the anisotropy of the radiation field at the surface, without solving any transfer equation. In contrast, the next two approximation levels use the solution of the unpolarized transfer equation to derive the anisotropy of the incident radiation field and use it as an input. In the case of LSA-2, the anisotropy at level τλ = μ, the atmospheric level from which an observed photon is most likely to originate, is used. LSA-3, on the other hand, makes use of the full depth dependence of the radiation anisotropy. The Q/I formula for LSA-3 is obtained by keeping the first term in a series expansion of the Q-source function in powers of the mean number of scattering events. Computationally, LSA-1 is 21 times faster than LSA-2, which is 5 times faster than the more general LSA-3, which itself is 8 times faster than the polarized RT approach. A comparison of the calculated Q/I spectra with the RT benchmark shows excellent agreement for LSA-3, including good modeling of the Q/I core region with its PRD effects. In contrast, both LSA-1 and LSA-2 fail to model the core region. The RT and LSA-3 approaches are then applied to model the recently observed Q/I profile of the Ca I 4227 Å line in quiet regions of the Sun. Apart from a global scale factor both give a very good fit to the Q/I spectra for all the wavelengths, including the core peak and blend line depolarizations. We conclude that LSA-3 is an excellent substitute for the full polarized RT and can be used to interpret the second solar spectrum, including the Hanle effect with PRD. It also allows the techniques developed for unpolarized three-dimensional RT to be applied to the modeling of the second solar spectrum.
Benchmarking of calculation schemes in APOLLO2 and COBAYA3 for WER lattices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zheleva, N.; Ivanov, P.; Todorova, G.

This paper presents solutions of the NURISP WER lattice benchmark using APOLLO2, TRIPOLI4 and COBAYA3 pin-by-pin. The main objective is to validate MOC based calculation schemes for pin-by-pin cross-section generation with APOLLO2 against TRIPOLI4 reference results. A specific objective is to test the APOLLO2 generated cross-sections and interface discontinuity factors in COBAYA3 pin-by-pin calculations with unstructured mesh. The VVER-1000 core consists of large hexagonal assemblies with 2 mm inter-assembly water gaps which require the use of unstructured meshes in the pin-by-pin core simulators. The considered 2D benchmark problems include 19-pin clusters, fuel assemblies and 7-assembly clusters. APOLLO2 calculation schemes withmore » the step characteristic method (MOC) and the higher-order Linear Surface MOC have been tested. The comparison of APOLLO2 vs. TRIPOLI4 results shows a very close agreement. The 3D lattice solver in COBAYA3 uses transport corrected multi-group diffusion approximation with interface discontinuity factors of Generalized Equivalence Theory (GET) or Black Box Homogenization (BBH) type. The COBAYA3 pin-by-pin results in 2, 4 and 8 energy groups are close to the reference solutions when using side-dependent interface discontinuity factors. (authors)« less
Benchmarking road safety performance: Identifying a meaningful reference (best-in-class).

PubMed

Chen, Faan; Wu, Jiaorong; Chen, Xiaohong; Wang, Jianjun; Wang, Di

2016-01-01

For road safety improvement, comparing and benchmarking performance are widely advocated as the emerging and preferred approaches. However, there is currently no universally agreed upon approach for the process of road safety benchmarking, and performing the practice successfully is by no means easy. This is especially true for the two core activities of which: (1) developing a set of road safety performance indicators (SPIs) and combining them into a composite index; and (2) identifying a meaningful reference (best-in-class), one which has already obtained outstanding road safety practices. To this end, a scientific technique that can combine the multi-dimensional safety performance indicators (SPIs) into an overall index, and subsequently can identify the 'best-in-class' is urgently required. In this paper, the Entropy-embedded RSR (Rank-sum ratio), an innovative, scientific and systematic methodology is investigated with the aim of conducting the above two core tasks in an integrative and concise procedure, more specifically in a 'one-stop' way. Using a combination of results from other methods (e.g. the SUNflower approach) and other measures (e.g. Human Development Index) as a relevant reference, a given set of European countries are robustly ranked and grouped into several classes based on the composite Road Safety Index. Within each class the 'best-in-class' is then identified. By benchmarking road safety performance, the results serve to promote best practice, encourage the adoption of successful road safety strategies and measures and, more importantly, inspire the kind of political leadership needed to create a road transport system that maximizes safety. Copyright © 2015 Elsevier Ltd. All rights reserved.
Properties of 5052 Aluminum For Use as Honeycomb Core in Manned Spaceflight

NASA Technical Reports Server (NTRS)

Lerch, Bradley A.

2018-01-01

This work explains that the properties of Al 5052 material used commonly for honeycomb cores in sandwich panels are highly dependent on the tempering condition. It has not been common to specify the temper when ordering HC material nor is it common for the supplier to state what the temper is. For aerospace uses, a temper of H38 or H39 is probably recommended. This temper should be stated in the bill of material and should be verified upon receipt of the core. To this end some properties provided herein can aid as benchmark values.

Investigation of Abnormal Heat Transfer and Flow in a VHTR Reactor Core

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kawaji, Masahiro; Valentin, Francisco I.; Artoun, Narbeh

2015-12-21

The main objective of this project was to identify and characterize the conditions under which abnormal heat transfer phenomena would occur in a Very High Temperature Reactor (VHTR) with a prismatic core. High pressure/high temperature experiments have been conducted to obtain data that could be used for validation of VHTR design and safety analysis codes. The focus of these experiments was on the generation of benchmark data for design and off-design heat transfer for forced, mixed and natural circulation in a VHTR core. In particular, a flow laminarization phenomenon was intensely investigated since it could give rise to hot spotsmore » in the VHTR core.« less
Debris characterization diagnostic for the NIF

NASA Astrophysics Data System (ADS)

Miller, M. C.; Celeste, J. R.; Stoyer, M. A.; Suter, L. J.; Tobin, M. T.; Grun, J.; Davis, J. F.; Barnes, C. W.; Wilson, D. C.

2001-01-01

Generation of debris from targets and by x-ray ablation of surrounding materials will be a matter of concern for experimenters and National Ignition Facility (NIF) operations. Target chamber and final optics protection, for example debris shield damage, drive the interest for NIF operations. Experimenters are primarily concerned with diagnostic survivability, separation of mechanical versus radiation induced test object response in the case of effects tests, and radiation transport through the debris field when the net radiation output is used to benchmark computer codes. In addition, radiochemical analysis of activated capsule debris during ignition shots can provide a measure of the ablator <ρr>. Conceptual design of the Debris Monitor and Rad-Chem Station, one of the NIF core diagnostics, is presented. Methods of debris collection, particle size and mass analysis, impulse measurement, and radiochemical analysis are given. A description of recent experiments involving debris collection and impulse measurement on the OMEGA and Pharos lasers is also provided.
A suite of exercises for verifying dynamic earthquake rupture codes

USGS Publications Warehouse

Harris, Ruth A.; Barall, Michael; Aagaard, Brad T.; Ma, Shuo; Roten, Daniel; Olsen, Kim B.; Duan, Benchun; Liu, Dunyu; Luo, Bin; Bai, Kangchen; Ampuero, Jean-Paul; Kaneko, Yoshihiro; Gabriel, Alice-Agnes; Duru, Kenneth; Ulrich, Thomas; Wollherr, Stephanie; Shi, Zheqiang; Dunham, Eric; Bydlon, Sam; Zhang, Zhenguo; Chen, Xiaofei; Somala, Surendra N.; Pelties, Christian; Tago, Josue; Cruz-Atienza, Victor Manuel; Kozdon, Jeremy; Daub, Eric; Aslam, Khurram; Kase, Yuko; Withers, Kyle; Dalguer, Luis

2018-01-01

We describe a set of benchmark exercises that are designed to test if computer codes that simulate dynamic earthquake rupture are working as intended. These types of computer codes are often used to understand how earthquakes operate, and they produce simulation results that include earthquake size, amounts of fault slip, and the patterns of ground shaking and crustal deformation. The benchmark exercises examine a range of features that scientists incorporate in their dynamic earthquake rupture simulations. These include implementations of simple or complex fault geometry, off‐fault rock response to an earthquake, stress conditions, and a variety of formulations for fault friction. Many of the benchmarks were designed to investigate scientific problems at the forefronts of earthquake physics and strong ground motions research. The exercises are freely available on our website for use by the scientific community.
Proposed biopsy performance benchmarks for MRI based on an audit of a large academic center.

PubMed

Sedora Román, Neda I; Mehta, Tejas S; Sharpe, Richard E; Slanetz, Priscilla J; Venkataraman, Shambhavi; Fein-Zachary, Valerie; Dialani, Vandana

2018-05-01

Performance benchmarks exist for mammography (MG); however, performance benchmarks for magnetic resonance imaging (MRI) are not yet fully developed. The purpose of our study was to perform an MRI audit based on established MG and screening MRI benchmarks and to review whether these benchmarks can be applied to an MRI practice. An IRB approved retrospective review of breast MRIs was performed at our center from 1/1/2011 through 12/31/13. For patients with biopsy recommendation, core biopsy and surgical pathology results were reviewed. The data were used to derive mean performance parameter values, including abnormal interpretation rate (AIR), positive predictive value (PPV), cancer detection rate (CDR), percentage of minimal cancers and axillary node negative cancers and compared with MG and screening MRI benchmarks. MRIs were also divided by screening and diagnostic indications to assess for differences in performance benchmarks amongst these two groups. Of the 2455 MRIs performed over 3-years, 1563 were performed for screening indications and 892 for diagnostic indications. With the exception of PPV2 for screening breast MRIs from 2011 to 2013, PPVs were met for our screening and diagnostic populations when compared to the MRI screening benchmarks established by the Breast Imaging Reporting and Data System (BI-RADS) 5 Atlas ® . AIR and CDR were lower for screening indications as compared to diagnostic indications. New MRI screening benchmarks can be used for screening MRI audits while the American College of Radiology (ACR) desirable goals for diagnostic MG can be used for diagnostic MRI audits. Our study corroborates established findings regarding differences in AIR and CDR amongst screening versus diagnostic indications. © 2017 Wiley Periodicals, Inc.
Processor Emulator with Benchmark Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lloyd, G. Scott; Pearce, Roger; Gokhale, Maya

2015-11-13

A processor emulator and a suite of benchmark applications have been developed to assist in characterizing the performance of data-centric workloads on current and future computer architectures. Some of the applications have been collected from other open source projects. For more details on the emulator and an example of its usage, see reference [1].
Under Construction: Benchmark Assessments and Common Core Math Implementation in Grades K-8. Formative Evaluation Cycle Report for the Math in Common Initiative, Volume 1

ERIC Educational Resources Information Center

Flaherty, John, Jr.; Sobolew-Shubin, Alexandria; Heredia, Alberto; Chen-Gaddini, Min; Klarin, Becca; Finkelstein, Neal D.

2014-01-01

Math in Common® (MiC) is a five-year initiative that supports a formal network of 10 California school districts as they implement the Common Core State Standards in mathematics (CCSS-M) across grades K-8. As the MiC initiative moves into its second year, one of the central activities that each of the districts is undergoing to support CCSS…
Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

NASA Astrophysics Data System (ADS)

Pernot, Pascal; Savin, Andreas

2018-06-01

Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.
First benchmark of the Unstructured Grid Adaptation Working Group

NASA Technical Reports Server (NTRS)

Ibanez, Daniel; Barral, Nicolas; Krakos, Joshua; Loseille, Adrien; Michal, Todd; Park, Mike

2017-01-01

Unstructured grid adaptation is a technology that holds the potential to improve the automation and accuracy of computational fluid dynamics and other computational disciplines. Difficulty producing the highly anisotropic elements necessary for simulation on complex curved geometries that satisfies a resolution request has limited this technology's widespread adoption. The Unstructured Grid Adaptation Working Group is an open gathering of researchers working on adapting simplicial meshes to conform to a metric field. Current members span a wide range of institutions including academia, industry, and national laboratories. The purpose of this group is to create a common basis for understanding and improving mesh adaptation. We present our first major contribution: a common set of benchmark cases, including input meshes and analytic metric specifications, that are publicly available to be used for evaluating any mesh adaptation code. We also present the results of several existing codes on these benchmark cases, to illustrate their utility in identifying key challenges common to all codes and important differences between available codes. Future directions are defined to expand this benchmark to mature the technology necessary to impact practical simulation workflows.
Verification and benchmark testing of the NUFT computer code

NASA Astrophysics Data System (ADS)

Lee, K. H.; Nitao, J. J.; Kulshrestha, A.

1993-10-01

This interim report presents results of work completed in the ongoing verification and benchmark testing of the NUFT (Nonisothermal Unsaturated-saturated Flow and Transport) computer code. NUFT is a suite of multiphase, multicomponent models for numerical solution of thermal and isothermal flow and transport in porous media, with application to subsurface contaminant transport problems. The code simulates the coupled transport of heat, fluids, and chemical components, including volatile organic compounds. Grid systems may be cartesian or cylindrical, with one-, two-, or fully three-dimensional configurations possible. In this initial phase of testing, the NUFT code was used to solve seven one-dimensional unsaturated flow and heat transfer problems. Three verification and four benchmarking problems were solved. In the verification testing, excellent agreement was observed between NUFT results and the analytical or quasianalytical solutions. In the benchmark testing, results of code intercomparison were very satisfactory. From these testing results, it is concluded that the NUFT code is ready for application to field and laboratory problems similar to those addressed here. Multidimensional problems, including those dealing with chemical transport, will be addressed in a subsequent report.
Optimization of a solid-state electron spin qubit using Gate Set Tomography

DOE PAGES

Dehollain, Juan P.; Muhonen, Juha T.; Blume-Kohout, Robin J.; ...

2016-10-13

Here, state of the art qubit systems are reaching the gate fidelities required for scalable quantum computation architectures. Further improvements in the fidelity of quantum gates demands characterization and benchmarking protocols that are efficient, reliable and extremely accurate. Ideally, a benchmarking protocol should also provide information on how to rectify residual errors. Gate Set Tomography (GST) is one such protocol designed to give detailed characterization of as-built qubits. We implemented GST on a high-fidelity electron-spin qubit confined by a single 31P atom in 28Si. The results reveal systematic errors that a randomized benchmarking analysis could measure but not identify, whereasmore » GST indicated the need for improved calibration of the length of the control pulses. After introducing this modification, we measured a new benchmark average gate fidelity of 99.942(8)%, an improvement on the previous value of 99.90(2)%. Furthermore, GST revealed high levels of non-Markovian noise in the system, which will need to be understood and addressed when the qubit is used within a fault-tolerant quantum computation scheme.« less
Fine-grained parallelism accelerating for RNA secondary structure prediction with pseudoknots based on FPGA.

PubMed

Xia, Fei; Jin, Guoqing

2014-06-01

PKNOTS is a most famous benchmark program and has been widely used to predict RNA secondary structure including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, the O(N(6)) computing requirements and complicated data dependency greatly limits the usefulness of PKNOTS package with the explosion in gene database size. In this paper, we present a fine-grained parallel PKNOTS package and prototype system for accelerating RNA folding application based on FPGA chip. We adopted a series of storage optimization strategies to resolve the "Memory Wall" problem. We aggressively exploit parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation for accelerating 4D DP problem for RNA folding application including pseudoknots. The experimental results show a factor of more than 50x average speedup over the PKNOTS-1.08 software running on a PC platform with Intel Core2 Q9400 Quad CPU for input RNA sequences. However, the power consumption of our FPGA accelerator is only about 50% of the general-purpose micro-processors.
GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas.

PubMed

Vinuesa, Pablo; Ochoa-Sánchez, Luz E; Contreras-Moreira, Bruno

2018-01-01

The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia , demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.
Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields.

PubMed

Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip

2018-01-28

We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo; Sterpin, Edmond

2016-04-15

Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithmmore » of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10{sup 7} primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.« less
Benchmarking atomic physics models for magnetically confined fusion plasma physics experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

May, M.J.; Finkenthal, M.; Soukhanovskii, V.

In present magnetically confined fusion devices, high and intermediate {ital Z} impurities are either puffed into the plasma for divertor radiative cooling experiments or are sputtered from the high {ital Z} plasma facing armor. The beneficial cooling of the edge as well as the detrimental radiative losses from the core of these impurities can be properly understood only if the atomic physics used in the modeling of the cooling curves is very accurate. To this end, a comprehensive experimental and theoretical analysis of some relevant impurities is undertaken. Gases (Ne, Ar, Kr, and Xe) are puffed and nongases are introducedmore » through laser ablation into the FTU tokamak plasma. The charge state distributions and total density of these impurities are determined from spatial scans of several photometrically calibrated vacuum ultraviolet and x-ray spectrographs (3{endash}1600 {Angstrom}), the multiple ionization state transport code transport code (MIST) and a collisional radiative model. The radiative power losses are measured with bolometery, and the emissivity profiles were measured by a visible bremsstrahlung array. The ionization balance, excitation physics, and the radiative cooling curves are computed from the Hebrew University Lawrence Livermore atomic code (HULLAC) and are benchmarked by these experiments. (Supported by U.S. DOE Grant No. DE-FG02-86ER53214 at JHU and Contract No. W-7405-ENG-48 at LLNL.) {copyright} {ital 1999 American Institute of Physics.}« less
PdnCO (n = 1,2): accurate Ab initio bond energies, geometries, and dipole moments and the applicability of density functional theory for fuel cell modeling.

PubMed

Schultz, Nathan E; Gherman, Benjamin F; Cramer, Christopher J; Truhlar, Donald G

2006-11-30

Electrode poisoning by CO is a major concern in fuel cells. As interest in applying computational methods to electrochemistry is increasing, it is important to understand the levels of theory required for reliable treatments of metal-CO interactions. In this paper we justify the use of relativistic effective core potentials for the treatment of PdCO and hence, by inference, for metal-CO interactions where the predominant bonding mechanism is charge transfer. We also sort out key issues involving basis sets and we recommend that bond energies of 17.2, 43.3, and 69.4 kcal/mol be used as the benchmark bond energy for dissociation of Pd2 into Pd atoms, PdCO into Pd and CO, and Pd2CO into Pd2 and CO, respectively. We calculated the dipole moments of PdCO and Pd2CO, and we recommend benchmark values of 2.49 and 2.81 D, respectively. Furthermore, we tested 27 density functionals for this system and found that only hybrid density functionals can qualitatively and quantitatively predict the nature of the sigma-donation/pi-back-donation mechanism that is associated with the Pd-CO and Pd2-CO bonds. The most accurate density functionals for the systems tested in this paper are O3LYP, OLYP, PW6B95, and PBEh.
Benchmark ab Initio Characterization of the Complex Potential Energy Surfaces of the X- + NH2Y [X, Y = F, Cl, Br, I] Reactions.

PubMed

Hajdu, Bálint; Czakó, Gábor

2018-02-22

We report a comprehensive high-level explicitly correlated ab initio study on the X - + NH 2 Y [X,Y = F, Cl, Br, I] reactions characterizing the stationary points of the S N 2 (Y - + NH 2 X) and proton-transfer (HX + NHY - ) pathways as well as the reaction enthalpies of various endothermic additional product channels such as H - + NHXY, XY - + NH 2 , XY + NH 2 - , and XHY - + NH. Benchmark structures and harmonic vibrational frequencies are obtained at the CCSD(T)-F12b/aug-cc-pVTZ(-PP) level of theory, followed by CCSD(T)-F12b/aug-cc-pVnZ(-PP) [n = Q and 5] and core correlation energy computations. In the entrance and exit channels we find two equivalent hydrogen-bonded C 1 minima, X - ···HH'NY and X - ···H'HNY connected by a C s first-order saddle point, X - ···H 2 NY, as well as a halogen-bonded front-side complex, X - ···YNH 2 . S N 2 reactions can proceed via back-side attack Walden inversion and front-side attack retention pathways characterized by first-order saddle points, submerged [X-NH 2 -Y] - and high-energy [H 2 NXY] - , respectively. Product-like stationary points below the HX + NHY - asymptotes are involved in the proton-transfer processes.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

2002-01-01

The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
MATCHED-INDEX-OF-REFRACTION FLOW FACILITY FOR FUNDAMENTAL AND APPLIED RESEARCH

DOE Office of Scientific and Technical Information (OSTI.GOV)

Piyush Sabharwall; Carl Stoots; Donald M. McEligot

2014-11-01

Significant challenges face reactor designers with regard to thermal hydraulic design and associated modeling for advanced reactor concepts. Computational thermal hydraulic codes solve only a piece of the core. There is a need for a whole core dynamics system code with local resolution to investigate and understand flow behavior with all the relevant physics and thermo-mechanics. The matched index of refraction (MIR) flow facility at Idaho National Laboratory (INL) has a unique capability to contribute to the development of validated computational fluid dynamics (CFD) codes through the use of state-of-the-art optical measurement techniques, such as Laser Doppler Velocimetry (LDV) andmore » Particle Image Velocimetry (PIV). PIV is a non-intrusive velocity measurement technique that tracks flow by imaging the movement of small tracer particles within a fluid. At the heart of a PIV calculation is the cross correlation algorithm, which is used to estimate the displacement of particles in some small part of the image over the time span between two images. Generally, the displacement is indicated by the location of the largest peak. To quantify these measurements accurately, sophisticated processing algorithms correlate the locations of particles within the image to estimate the velocity (Ref. 1). Prior to use with reactor deign, the CFD codes have to be experimentally validated, which requires rigorous experimental measurements to produce high quality, multi-dimensional flow field data with error quantification methodologies. Computational thermal hydraulic codes solve only a piece of the core. There is a need for a whole core dynamics system code with local resolution to investigate and understand flow behavior with all the relevant physics and thermo-mechanics. Computational techniques with supporting test data may be needed to address the heat transfer from the fuel to the coolant during the transition from turbulent to laminar flow, including the possibility of an early laminarization of the flow (Refs. 2 and 3) (laminarization is caused when the coolant velocity is theoretically in the turbulent regime, but the heat transfer properties are indicative of the coolant velocity being in the laminar regime). Such studies are complicated enough that computational fluid dynamics (CFD) models may not converge to the same conclusion. Thus, experimentally scaled thermal hydraulic data with uncertainties should be developed to support modeling and simulation for verification and validation activities. The fluid/solid index of refraction matching technique allows optical access in and around geometries that would otherwise be impossible while the large test section of the INL system provides better spatial and temporal resolution than comparable facilities. Benchmark data for assessing computational fluid dynamics can be acquired for external flows, internal flows, and coupled internal/external flows for better understanding of physical phenomena of interest. The core objective of this study is to describe MIR and its capabilities, and mention current development areas for uncertainty quantification, mainly the uncertainty surface method and cross-correlation method. Using these methods, it is anticipated to establish a suitable approach to quantify PIV uncertainty for experiments performed in the MIR.« less
Computation of thermodynamic equilibrium in systems under stress

NASA Astrophysics Data System (ADS)

Vrijmoed, Johannes C.; Podladchikov, Yuri Y.

2016-04-01

Metamorphic reactions may be partly controlled by the local stress distribution as suggested by observations of phase assemblages around garnet inclusions related to an amphibolite shear zone in granulite of the Bergen Arcs in Norway. A particular example presented in fig. 14 of Mukai et al. [1] is discussed here. A garnet crystal embedded in a plagioclase matrix is replaced on the left side by a high pressure intergrowth of kyanite and quartz and on the right side by chlorite-amphibole. This texture apparently represents disequilibrium. In this case, the minerals adapt to the low pressure ambient conditions only where fluids were present. Alternatively, here we compute that this particular low pressure and high pressure assemblage around a stressed rigid inclusion such as garnet can coexist in equilibrium. To do the computations we developed the Thermolab software package. The core of the software package consists of Matlab functions that generate Gibbs energy of minerals and melts from the Holland and Powell database [2] and aqueous species from the SUPCRT92 database [3]. Most up to date solid solutions are included in a general formulation. The user provides a Matlab script to do the desired calculations using the core functions. Gibbs energy of all minerals, solutions and species are benchmarked versus THERMOCALC, PerpleX [4] and SUPCRT92 and are reproduced within round off computer error. Multi-component phase diagrams have been calculated using Gibbs minimization to benchmark with THERMOCALC and Perple_X. The Matlab script to compute equilibrium in a stressed system needs only two modifications of the standard phase diagram script. Firstly, Gibbs energy of phases considered in the calculation is generated for multiple values of thermodynamic pressure. Secondly, for the Gibbs minimization the proportion of the system at each particular thermodynamic pressure needs to be constrained. The user decides which part of the stress tensor is input as thermodynamic pressure. To compute a case of high and low pressure around a stressed inclusion we first did a Finite Element Method calculation of a rigid inclusion in a viscous matrix under simple shear. From the computed stress distribution we took the local pressure (mean stress) in each grid point of the FEM calculation. This was used as input thermodynamic pressure in the Gibbs minimization and the result showed it is possible to have an equilibrium situation in which chlorite-amphibole is stable in the low pressure domain and kyanite in the high pressure domain of the stress field around the inclusion. Interestingly, the calculation predicts the redistribution of fluid from an average content of fluid in the system. The fluid in equilibrium tends to accumulate in the low pressure areas whereas it leaves the high pressure areas dry. Transport of fluid components occurs not necessarily by fluid flow, but may happen for example by diffusion. We conclude that an apparent disequilibrium texture may be explained by equilibrium under pressure variations, and apparent fluid addition by redistribution of fluid controlled by the local stress distribution. [1] Mukai et al. (2014), Journal of Petrology, 55 (8), p. 1457-1477. [2] Holland and Powell (1998), Journal of Metamorphic Geology, 16, p. 309-343 [3] Johnson et al. (1992), Computers & Geosciences, 18 (7), p. 899-947 [4] Connolly (2005), Earth and Planetary Science Letters, 236, p. 524-541

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
TerraFERMA: Harnessing Advanced Computational Libraries in Earth Science

NASA Astrophysics Data System (ADS)

Wilson, C. R.; Spiegelman, M.; van Keken, P.

2012-12-01

Many important problems in Earth sciences can be described by non-linear coupled systems of partial differential equations. These "multi-physics" problems include thermo-chemical convection in Earth and planetary interiors, interactions of fluids and magmas with the Earth's mantle and crust and coupled flow of water and ice. These problems are of interest to a large community of researchers but are complicated to model and understand. Much of this complexity stems from the nature of multi-physics where small changes in the coupling between variables or constitutive relations can lead to radical changes in behavior, which in turn affect critical computational choices such as discretizations, solvers and preconditioners. To make progress in understanding such coupled systems requires a computational framework where multi-physics problems can be described at a high-level while maintaining the flexibility to easily modify the solution algorithm. Fortunately, recent advances in computational science provide a basis for implementing such a framework. Here we present the Transparent Finite Element Rapid Model Assembler (TerraFERMA), which leverages several advanced open-source libraries for core functionality. FEniCS (fenicsproject.org) provides a high level language for describing the weak forms of coupled systems of equations, and an automatic code generator that produces finite element assembly code. PETSc (www.mcs.anl.gov/petsc) provides a wide range of scalable linear and non-linear solvers that can be composed into effective multi-physics preconditioners. SPuD (amcg.ese.ic.ac.uk/Spud) is an application neutral options system that provides both human and machine-readable interfaces based on a single xml schema. Our software integrates these libraries and provides the user with a framework for exploring multi-physics problems. A single options file fully describes the problem, including all equations, coefficients and solver options. Custom compiled applications are generated from this file but share an infrastructure for services common to all models, e.g. diagnostics, checkpointing and global non-linear convergence monitoring. This maximizes code reusability, reliability and longevity ensuring that scientific results and the methods used to acquire them are transparent and reproducible. TerraFERMA has been tested against many published geodynamic benchmarks including 2D/3D thermal convection problems, the subduction zone benchmarks and benchmarks for magmatic solitary waves. It is currently being used in the investigation of reactive cracking phenomena with applications to carbon sequestration, but we will principally discuss its use in modeling the migration of fluids in subduction zones. Subduction zones require an understanding of the highly nonlinear interactions of fluids with solids and thus provide an excellent scientific driver for the development of multi-physics software.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Excore Modeling with VERAShift

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pandya, Tara M.; Evans, Thomas M.

It is important to be able to accurately predict the neutron flux outside the immediate reactor core for a variety of safety and material analyses. Monte Carlo radiation transport calculations are required to produce the high fidelity excore responses. Under this milestone VERA (specifically the VERAShift package) has been extended to perform excore calculations by running radiation transport calculations with Shift. This package couples VERA-CS with Shift to perform excore tallies for multiple state points concurrently, with each component capable of parallel execution on independent domains. Specifically, this package performs fluence calculations in the core barrel and vessel, or, performsmore » the requested tallies in any user-defined excore regions. VERAShift takes advantage of the general geometry package in Shift. This gives VERAShift the flexibility to explicitly model features outside the core barrel, including detailed vessel models, detectors, and power plant details. A very limited set of experimental and numerical benchmarks is available for excore simulation comparison. The Consortium for the Advanced Simulation of Light Water Reactors (CASL) has developed a set of excore benchmark problems to include as part of the VERA-CS verification and validation (V&V) problems. The excore capability in VERAShift has been tested on small representative assembly problems, multiassembly problems, and quarter-core problems. VERAView has also been extended to visualize these vessel fluence results from VERAShift. Preliminary vessel fluence results for quarter-core multistate calculations look very promising. Further development is needed to determine the details relevant to excore simulations. Validation of VERA for fluence and excore detectors still needs to be performed against experimental and numerical results.« less
Qualification of CASMO5 / SIMULATE-3K against the SPERT-III E-core cold start-up experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grandi, G.; Moberg, L.

SIMULATE-3K is a three-dimensional kinetic code applicable to LWR Reactivity Initiated Accidents. S3K has been used to calculate several international recognized benchmarks. However, the feedback models in the benchmark exercises are different from the feedback models that SIMULATE-3K uses for LWR reactors. For this reason, it is worth comparing the SIMULATE-3K capabilities for Reactivity Initiated Accidents against kinetic experiments. The Special Power Excursion Reactor Test III was a pressurized-water, nuclear-research facility constructed to analyze the reactor kinetic behavior under initial conditions similar to those of commercial LWRs. The SPERT III E-core resembles a PWR in terms of fuel type, moderator,more » coolant flow rate, and system pressure. The initial test conditions (power, core flow, system pressure, core inlet temperature) are representative of cold start-up, hot start-up, hot standby, and hot full power. The qualification of S3K against the SPERT III E-core measurements is an ongoing work at Studsvik. In this paper, the results for the 30 cold start-up tests are presented. The results show good agreement with the experiments for the reactivity initiated accident main parameters: peak power, energy release and compensated reactivity. Predicted and measured peak powers differ at most by 13%. Measured and predicted reactivity compensations at the time of the peak power differ less than 0.01 $. Predicted and measured energy release differ at most by 13%. All differences are within the experimental uncertainty. (authors)« less
Benchmark On Sensitivity Calculation (Phase III)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ivanova, Tatiana; Laville, Cedric; Dyrda, James

2012-01-01

The sensitivities of the keff eigenvalue to neutron cross sections have become commonly used in similarity studies and as part of the validation algorithm for criticality safety assessments. To test calculations of the sensitivity coefficients, a benchmark study (Phase III) has been established by the OECD-NEA/WPNCS/EG UACSA (Expert Group on Uncertainty Analysis for Criticality Safety Assessment). This paper presents some sensitivity results generated by the benchmark participants using various computational tools based upon different computational methods: SCALE/TSUNAMI-3D and -1D, MONK, APOLLO2-MORET 5, DRAGON-SUSD3D and MMKKENO. The study demonstrates the performance of the tools. It also illustrates how model simplifications impactmore » the sensitivity results and demonstrates the importance of 'implicit' (self-shielding) sensitivities. This work has been a useful step towards verification of the existing and developed sensitivity analysis methods.« less
Anharmonic Vibrational Spectroscopy on Metal Transition Complexes

NASA Astrophysics Data System (ADS)

Latouche, Camille; Bloino, Julien; Barone, Vincenzo

2014-06-01

Advances in hardware performance and the availability of efficient and reliable computational models have made possible the application of computational spectroscopy to ever larger molecular systems. The systematic interpretation of experimental data and the full characterization of complex molecules can then be facilitated. Focusing on vibrational spectroscopy, several approaches have been proposed to simulate spectra beyond the double harmonic approximation, so that more details become available. However, a routine use of such tools requires the preliminary definition of a valid protocol with the most appropriate combination of electronic structure and nuclear calculation models. Several benchmark of anharmonic calculations frequency have been realized on organic molecules. Nevertheless, benchmarks of organometallics or inorganic metal complexes at this level are strongly lacking despite the interest of these systems due to their strong emission and vibrational properties. Herein we report the benchmark study realized with anharmonic calculations on simple metal complexes, along with some pilot applications on systems of direct technological or biological interest.
Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal

The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing system. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPU, Graphics processing units (GPUs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes themore » FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with high-level language without deep FPGA domain knowledge. Benchmarking of OpenCL-based framework is an effective way for analyzing the performance of system by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that can be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to have a better understanding of the resource usage and performance of the kernel implementations using Arria-10 FPGA devices compared to Stratix-5 FPGA devices. In addition, we also gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark application.« less
Information processing using a single dynamical node as complex system

PubMed Central

Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.

2011-01-01

Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Turney, Raymond D.

2001-01-01

This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Numerical benchmarking of a Coarse-Mesh Transport (COMET) Method for medical physics applications

NASA Astrophysics Data System (ADS)

Blackburn, Megan Satterfield

2009-12-01

Radiation therapy has become a very import method for treating cancer patients. Thus, it is extremely important to accurately determine the location of energy deposition during these treatments, maximizing dose to the tumor region and minimizing it to healthy tissue. A Coarse-Mesh Transport Method (COMET) has been developed at the Georgia Institute of Technology in the Computational Reactor and Medical Physics Group for use very successfully with neutron transport to analyze whole-core criticality. COMET works by decomposing a large, heterogeneous system into a set of smaller fixed source problems. For each unique local problem that exists, a solution is obtained that we call a response function. These response functions are pre-computed and stored in a library for future use. The overall solution to the global problem can then be found by a linear superposition of these local problems. This method has now been extended to the transport of photons and electrons for use in medical physics problems to determine energy deposition from radiation therapy treatments. The main goal of this work was to develop benchmarks for testing in order to evaluate the COMET code to determine its strengths and weaknesses for these medical physics applications. For response function calculations, legendre polynomial expansions are necessary for space, angle, polar angle, and azimuthal angle. An initial sensitivity study was done to determine the best orders for future testing. After the expansion orders were found, three simple benchmarks were tested: a water phantom, a simplified lung phantom, and a non-clinical slab phantom. Each of these benchmarks was decomposed into 1cm x 1cm and 0.5cm x 0.5cm coarse meshes. Three more clinically relevant problems were developed from patient CT scans. These benchmarks modeled a lung patient, a prostate patient, and a beam re-entry situation. As before, the problems were divided into 1cm x 1cm, 0.5cm x 0.5cm, and 0.25cm x 0.25cm coarse mesh cases. Multiple beam energies were also tested for each case. The COMET solutions for each case were compared to a reference solution obtained by pure Monte Carlo results from EGSnrc. When comparing the COMET results to the reference cases, a pattern of differences appeared in each phantom case. It was found that better results were obtained for lower energy incident photon beams as well as for larger mesh sizes. Possible changes may need to be made with the expansion orders used for energy and angle to better model high energy secondary electrons. Heterogeneity also did not pose a problem for the COMET methodology. Heterogeneous results were found in a comparable amount of time to the homogeneous water phantom. The COMET results were typically found in minutes to hours of computational time, whereas the reference cases typically required hundreds or thousands of hours. A second sensitivity study was also performed on a more stringent problem and with smaller coarse meshes. Previously, the same expansion order was used for each incident photon beam energy so better comparisons could be made. From this second study, it was found that it is optimal to have different expansion orders based on the incident beam energy. Recommendations for future work with this method include more testing on higher expansion orders or possible code modification to better handle secondary electrons. The method also needs to handle more clinically relevant beam descriptions with an energy and angular distribution associated with it.
An Experimental Study of Characteristic Combustion-Driven Flow for CFD Validation

NASA Technical Reports Server (NTRS)

Santoro, Robert J.

1997-01-01

A series of uni-element rocket injector studies were completed to provide benchmark quality data needed to validate computational fluid dynamic models. A shear coaxial injector geometry was selected as the primary injector for study using gaseous hydrogen/oxygen and gaseous hydrogen/liquid oxygen propellants. Emphasis was placed on the use of nonintrusive diagnostic techniques to characterize the flowfields inside an optically-accessible rocket chamber. Measurements of the velocity and species fields were obtained using laser velocimetry and Raman spectroscopy, respectively. Qualitative flame shape information was also obtained using laser-induced fluorescence excited from OH radicals and laser light scattering studies of aluminum oxide particle seeded combusting flows. The gaseous hydrogen/liquid oxygen propellant studies for the shear coaxial injector focused on breakup mechanisms associated with the liquid oxygen jet under subcritical pressure conditions. Laser sheet illumination techniques were used to visualize the core region of the jet and a Phase Doppler Particle Analyzer was utilized for drop velocity, size and size distribution characterization. The results of these studies indicated that the shear coaxial geometry configuration was a relatively poor injector in terms of mixing. The oxygen core was observed to extend well downstream of the injector and a significant fraction of the mixing occurred in the near nozzle region where measurements were not possible to obtain. Detailed velocity and species measurements were obtained to allow CFD model validation and this set of benchmark data represents the most comprehensive data set available to date. As an extension of the investigation, a series of gas/gas injector studies were conducted in support of the X-33 Reusable Launch Vehicle program. A Gas/Gas Injector Technology team was formed consisting of the Marshall Space Flight Center, the NASA Lewis Research Center, Rocketdyne and Penn State. Injector geometries studied under this task included shear and swirl coaxial configurations as well as an impinging jet injector.
An Experimental Study of Characteristic Combustion-Driven Flow for CFD Validation

NASA Technical Reports Server (NTRS)

Santoro, Robert J.

1997-01-01

A series of uni-element rocket injector studies were completed to provide benchmark quality data needed to validate computational fluid dynamic models. A shear coaxial injector geometry was selected as the primary injector for study using gaseous hydrogen/oxygen and gaseous hydrogen/liquid oxygen propellants. Emphasis was placed on the use of non-intrusive diagnostic techniques to characterize the flowfields inside an optically-accessible rocket chamber. Measurements of the velocity and species fields were obtained using laser velocimetry and Raman spectroscopy, respectively Qualitative flame shape information was also obtained using laser-induced fluorescence excited from OH radicals and laser light scattering studies of aluminum oxide particle seeded combusting flows. The gaseous hydrogen/liquid oxygen propellant studies for the shear coaxial injector focused on breakup mechanisms associated with the liquid oxygen jet under sub-critical pressure conditions. Laser sheet illumination techniques were used to visualize the core region of the jet and a Phase Doppler Particle Analyzer was utilized for drop velocity, size and size distribution characterization. The results of these studies indicated that the shear coaxial geometry configuration was a relatively poor injector in terms of mixing. The oxygen core was observed to extend well downstream of the injector and a significant fraction of the mixing occurred in the near nozzle region where measurements were not possible to obtain Detailed velocity and species measurements were obtained to allow CFD model validation and this set of benchmark data represents the most comprehensive data set available to date As an extension of the investigation, a series of gas/gas injector studies were conducted in support of the X-33 Reusable Launch Vehicle program. A Gas/Gas Injector Technology team was formed consisting of the Marshall Space Flight Center, the NASA Lewis Research Center, Rocketdyne and Penn State. Injector geometries studied under this task included shear and swirl coaxial configurations as well as an impinging jet injector.
A 3-D Finite-Volume Non-hydrostatic Icosahedral Model (NIM)

NASA Astrophysics Data System (ADS)

Lee, Jin

2014-05-01

The Nonhydrostatic Icosahedral Model (NIM) formulates the latest numerical innovation of the three-dimensional finite-volume control volume on the quasi-uniform icosahedral grid suitable for ultra-high resolution simulations. NIM's modeling goal is to improve numerical accuracy for weather and climate simulations as well as to utilize the state-of-art computing architecture such as massive parallel CPUs and GPUs to deliver routine high-resolution forecasts in timely manner. NIM dynamic corel innovations include: * A local coordinate system remapped spherical surface to plane for numerical accuracy (Lee and MacDonald, 2009), * Grid points in a table-driven horizontal loop that allow any horizontal point sequence (A.E. MacDonald, et al., 2010), * Flux-Corrected Transport formulated on finite-volume operators to maintain conservative positive definite transport (J.-L, Lee, ET. Al., 2010), *Icosahedral grid optimization (Wang and Lee, 2011), * All differentials evaluated as three-dimensional finite-volume integrals around the control volume. The three-dimensional finite-volume solver in NIM is designed to improve pressure gradient calculation and orographic precipitation over complex terrain. NIM dynamical core has been successfully verified with various non-hydrostatic benchmark test cases such as internal gravity wave, and mountain waves in Dynamical Cores Model Inter-comparisons Projects (DCMIP). Physical parameterizations suitable for NWP are incorporated into NIM dynamical core and successfully tested with multimonth aqua-planet simulations. Recently, NIM has started real data simulations using GFS initial conditions. Results from the idealized tests as well as real-data simulations will be shown in the conference.
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

PubMed

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Requirements for benchmarking personal image retrieval systems

NASA Astrophysics Data System (ADS)

Bouguet, Jean-Yves; Dulong, Carole; Kozintsev, Igor; Wu, Yi

2006-01-01

It is now common to have accumulated tens of thousands of personal ictures. Efficient access to that many pictures can only be done with a robust image retrieval system. This application is of high interest to Intel processor architects. It is highly compute intensive, and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from digital libraries that have been used by many Content Based Image Retrieval Systems.1 For example a personal image database has a lot of pictures of people, but a small set of different people typically family, relatives, and friends. Pictures are taken in a limited set of places like home, work, school, and vacation destination. The most frequent queries are searched for people, and for places. These attributes, and many others affect how a personal image retrieval system should be benchmarked, and benchmarks need to be different from existing ones based on art images, or medical images for examples. The attributes of the data set do not change the list of components needed for the benchmarking of such systems as specified in2: - data sets - query tasks - ground truth - evaluation measures - benchmarking events. This paper proposed a way to build these components to be representative of personal image databases, and of the corresponding usage models.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.

Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived frommore » calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. As a result, it also discusses opportunities and challenges for future developments in these fields.« less
Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

DOE PAGES

Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.; ...

2016-03-07

Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived frommore » calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. As a result, it also discusses opportunities and challenges for future developments in these fields.« less
Protein binding hot spots prediction from sequence only by a new ensemble learning method.

PubMed

Hu, Shan-Shan; Chen, Peng; Wang, Bing; Li, Jinyan

2017-10-01

Hot spots are interfacial core areas of binding proteins, which have been applied as targets in drug design. Experimental methods are costly in both time and expense to locate hot spot areas. Recently, in-silicon computational methods have been widely used for hot spot prediction through sequence or structure characterization. As the structural information of proteins is not always solved, and thus hot spot identification from amino acid sequences only is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers involving the IBk (Instance-based k means) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Then top-performance classifiers are selected to form an ensemble by a majority voting technique. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm .

Wavenumber-extended high-order oscillation control finite volume schemes for multi-dimensional aeroacoustic computations

NASA Astrophysics Data System (ADS)

Kim, Sungtae; Lee, Soogab; Kim, Kyu Hong

2008-04-01

A new numerical method toward accurate and efficient aeroacoustic computations of multi-dimensional compressible flows has been developed. The core idea of the developed scheme is to unite the advantages of the wavenumber-extended optimized scheme and M-AUSMPW+/MLP schemes by predicting a physical distribution of flow variables more accurately in multi-space dimensions. The wavenumber-extended optimization procedure for the finite volume approach based on the conservative requirement is newly proposed for accuracy enhancement, which is required to capture the acoustic portion of the solution in the smooth region. Furthermore, the new distinguishing mechanism which is based on the Gibbs phenomenon in discontinuity, between continuous and discontinuous regions is introduced to eliminate the excessive numerical dissipation in the continuous region by the restricted application of MLP according to the decision of the distinguishing function. To investigate the effectiveness of the developed method, a sequence of benchmark simulations such as spherical wave propagation, nonlinear wave propagation, shock tube problem and vortex preservation test problem are executed. Also, throughout more realistic shock-vortex interaction and muzzle blast flow problems, the utility of the new method for aeroacoustic applications is verified by comparing with the previous numerical or experimental results.
Benchmarking health IT among OECD countries: better data for better policy

PubMed Central

Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

2014-01-01

Objective To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. Materials and methods The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. Results The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Discussion Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. Conclusions As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this. PMID:23721983
Benchmarking health IT among OECD countries: better data for better policy.

PubMed

Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

2014-01-01

To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this.
Elementary School Students' Science Talk Ability in Inquiry-Oriented Settings in Taiwan: Test Development, Verification, and Performance Benchmarks

ERIC Educational Resources Information Center

Lin, Sheau-Wen; Liu, Yu; Chen, Shin-Feng; Wang, Jing-Ru; Kao, Huey-Lien

2016-01-01

The purpose of this study was to develop a computer-based measure of elementary students' science talk and to report students' benchmarks. The development procedure had three steps: defining the framework of the test, collecting and identifying key reference sets of science talk, and developing and verifying the science talk instrument. The…
Hybrid parallel code acceleration methods in full-core reactor physics calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Courau, T.; Plagne, L.; Ponicot, A.

2012-07-01

When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses a 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadraturemore » required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k{sub eff}, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)« less
Least-Squares Spectral Element Solutions to the CAA Workshop Benchmark Problems

NASA Technical Reports Server (NTRS)

Lin, Wen H.; Chan, Daniel C.

1997-01-01

This paper presents computed results for some of the CAA benchmark problems via the acoustic solver developed at Rocketdyne CFD Technology Center under the corporate agreement between Boeing North American, Inc. and NASA for the Aerospace Industry Technology Program. The calculations are considered as benchmark testing of the functionality, accuracy, and performance of the solver. Results of these computations demonstrate that the solver is capable of solving the propagation of aeroacoustic signals. Testing of sound generation and on more realistic problems is now pursued for the industrial applications of this solver. Numerical calculations were performed for the second problem of Category 1 of the current workshop problems for an acoustic pulse scattered from a rigid circular cylinder, and for two of the first CAA workshop problems, i. e., the first problem of Category 1 for the propagation of a linear wave and the first problem of Category 4 for an acoustic pulse reflected from a rigid wall in a uniform flow of Mach 0.5. The aim for including the last two problems in this workshop is to test the effectiveness of some boundary conditions set up in the solver. Numerical results of the last two benchmark problems have been compared with their corresponding exact solutions and the comparisons are excellent. This demonstrates the high fidelity of the solver in handling wave propagation problems. This feature lends the method quite attractive in developing a computational acoustic solver for calculating the aero/hydrodynamic noise in a violent flow environment.
Sensitivity and Uncertainty Analysis of the GFR MOX Fuel Subassembly

NASA Astrophysics Data System (ADS)

Lüley, J.; Vrban, B.; Čerba, Š.; Haščík, J.; Nečas, V.; Pelloni, S.

2014-04-01

We performed sensitivity and uncertainty analysis as well as benchmark similarity assessment of the MOX fuel subassembly designed for the Gas-Cooled Fast Reactor (GFR) as a representative material of the core. Material composition was defined for each assembly ring separately allowing us to decompose the sensitivities not only for isotopes and reactions but also for spatial regions. This approach was confirmed by direct perturbation calculations for chosen materials and isotopes. Similarity assessment identified only ten partly comparable benchmark experiments that can be utilized in the field of GFR development. Based on the determined uncertainties, we also identified main contributors to the calculation bias.
Cholesky-decomposed density MP2 with density fitting: Accurate MP2 and double-hybrid DFT energies for large systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maurer, Simon A.; Clin, Lucien; Ochsenfeld, Christian, E-mail: christian.ochsenfeld@uni-muenchen.de

2014-06-14

Our recently developed QQR-type integral screening is introduced in our Cholesky-decomposed pseudo-densities Møller-Plesset perturbation theory of second order (CDD-MP2) method. We use the resolution-of-the-identity (RI) approximation in combination with efficient integral transformations employing sparse matrix multiplications. The RI-CDD-MP2 method shows an asymptotic cubic scaling behavior with system size and a small prefactor that results in an early crossover to conventional methods for both small and large basis sets. We also explore the use of local fitting approximations which allow to further reduce the scaling behavior for very large systems. The reliability of our method is demonstrated on test sets formore » interaction and reaction energies of medium sized systems and on a diverse selection from our own benchmark set for total energies of larger systems. Timings on DNA systems show that fast calculations for systems with more than 500 atoms are feasible using a single processor core. Parallelization extends the range of accessible system sizes on one computing node with multiple cores to more than 1000 atoms in a double-zeta basis and more than 500 atoms in a triple-zeta basis.« less
EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S.

Driven by the trends of increasing core-count and bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased and hence, the researchers have explored non-volatile memories (NVMs) which provide high density and consume low-leakage power. Since NVMs have low write-endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. In this paper, our technique works by recording the number of writes on a block and changing the cache-block location of a hot data-item to redirect themore » future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.« less
EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

DOE PAGES

Mittal, Sparsh; Vetter, Jeffrey S.

2015-01-29

Driven by the trends of increasing core-count and bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased and hence, the researchers have explored non-volatile memories (NVMs) which provide high density and consume low-leakage power. Since NVMs have low write-endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. In this paper, our technique works by recording the number of writes on a block and changing the cache-block location of a hot data-item to redirect themore » future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.« less
Xenon-induced power oscillations in a generic small modular reactor

NASA Astrophysics Data System (ADS)

Kitcher, Evans Damenortey

As world demand for energy continues to grow at unprecedented rates, the world energy portfolio of the future will inevitably include a nuclear energy contribution. It has been suggested that the Small Modular Reactor (SMR) could play a significant role in the spread of civilian nuclear technology to nations previously without nuclear energy. As part of the design process, the SMR design must be assessed for the threat to operations posed by xenon-induced power oscillations. In this research, a generic SMR design was analyzed with respect to just such a threat. In order to do so, a multi-physics coupling routine was developed with MCNP/MCNPX as the neutronics solver. Thermal hydraulic assessments were performed using a single channel analysis tool developed in Python. Fuel and coolant temperature profiles were implemented in the form of temperature dependent fuel cross sections generated using the SIGACE code and reactor core coolant densities. The Power Axial Offset (PAO) and Xenon Axial Offset (XAO) parameters were chosen to quantify any oscillatory behavior observed. The methodology was benchmarked against results from literature of startup tests performed at a four-loop PWR in Korea. The developed benchmark model replicated the pertinent features of the reactor within ten percent of the literature values. The results of the benchmark demonstrated that the developed methodology captured the desired phenomena accurately. Subsequently, a high fidelity SMR core model was developed and assessed. Results of the analysis revealed an inherently stable SMR design at beginning of core life and end of core life under full-power and half-power conditions. The effect of axial discretization, stochastic noise and convergence of the Monte Carlo tallies in the calculations of the PAO and XAO parameters was investigated. All were found to be quite small and the inherently stable nature of the core design with respect to xenon-induced power oscillations was confirmed. Finally, a preliminary investigation into excess reactivity control options for the SMR design was conducted confirming the generally held notion that existing PWR control mechanisms can be used in iPWR SMRs with similar effectiveness. With the desire to operate the SMR under the boron free coolant condition, erbium oxide fuel integral burnable absorber rods were identified as a possible means to retain the dispersed absorber effect of soluble boron in the reactor coolant in replacement.
The Management Development Program: A Competency-Based Model for Preparing Hospitality Leaders.

ERIC Educational Resources Information Center

Brownell, Judi; Chung, Beth G.

2001-01-01

The master of management program at Cornell University focused on competency-based development of skills for the hospitality industry through core courses, minicourses, skill benchmarking, and continuous improvement. Benefits include a shift in the teacher role to advocate/coach, increased information sharing, student satisfaction, and clear…
The Cognitive Science behind the Common Core

ERIC Educational Resources Information Center

Marchitello, Max; Wilhelm, Megan

2014-01-01

Raising academic standards has been part of the education policy discourse for decades. As early as the 1990s, states and school districts attempted to raise student achievement by developing higher standards and measuring student progress according to more rigorous benchmarks. However, the caliber of the standards--and their assessments--varied…
The Effect of a High School Financial Literacy Course on Student Financial Knowledge

ERIC Educational Resources Information Center

McCann, Karen L.

2010-01-01

New Jersey school districts establish curriculums to meet the proficiencies found in the New Jersey Core Curriculum Content Standards (NJCCCS). The research focuses on the effectiveness of the Washington Township High School Career and Technology Education Department's curriculum in addressing the NJCCS Financial Literacy benchmarks. The…
An Application-Based Performance Characterization of the Columbia Supercluster

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Hoaqiang; Kiris, Cetin; Saini, Subhash

2005-01-01

Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.
GPU-accelerated algorithms for many-particle continuous-time quantum walks

NASA Astrophysics Data System (ADS)

Piccinini, Enrico; Benedetti, Claudia; Siloi, Ilaria; Paris, Matteo G. A.; Bordone, Paolo

2017-06-01

Many-particle continuous-time quantum walks (CTQWs) represent a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. In order to design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic terms. To this aim, we suggest a parallel algorithm based on the Taylor series expansion of the evolution operator, and compare its performances with those of algorithms based on the exact diagonalization of the Hamiltonian or a 4th order Runge-Kutta integration. We prove that both Taylor-series expansion and Runge-Kutta algorithms are reliable and have a low computational cost, the Taylor-series expansion showing the additional advantage of a memory allocation not depending on the precision of calculation. Both algorithms are also highly parallelizable within the SIMT paradigm, and are thus suitable for GPGPU computing. In turn, we have benchmarked 4 NVIDIA GPUs and 3 quad-core Intel CPUs for a 2-particle system over lattices of increasing dimension, showing that the speedup provided by GPU computing, with respect to the OPENMP parallelization, lies in the range between 8x and (more than) 20x, depending on the frequency of post-processing. GPU-accelerated codes thus allow one to overcome concerns about the execution time, and make it possible simulations with many interacting particles on large lattices, with the only limit of the memory available on the device.
Benchmark for Peak Detection Algorithms in Fiber Bragg Grating Interrogation and a New Neural Network for its Performance Improvement

PubMed Central

Negri, Lucas; Nied, Ademir; Kalinowski, Hypolito; Paterno, Aleksander

2011-01-01

This paper presents a benchmark for peak detection algorithms employed in fiber Bragg grating spectrometric interrogation systems. The accuracy, precision, and computational performance of currently used algorithms and those of a new proposed artificial neural network algorithm are compared. Centroid and gaussian fitting algorithms are shown to have the highest precision but produce systematic errors that depend on the FBG refractive index modulation profile. The proposed neural network displays relatively good precision with reduced systematic errors and improved computational performance when compared to other networks. Additionally, suitable algorithms may be chosen with the general guidelines presented. PMID:22163806
DOE Office of Scientific and Technical Information (OSTI.GOV)

Faillace, E.R.; Cheng, J.J.; Yu, C.

A series of benchmarking runs were conducted so that results obtained with the RESRAD code could be compared against those obtained with six pathway analysis models used to determine the radiation dose to an individual living on a radiologically contaminated site. The RESRAD computer code was benchmarked against five other computer codes - GENII-S, GENII, DECOM, PRESTO-EPA-CPG, and PATHRAE-EPA - and the uncodified methodology presented in the NUREG/CR-5512 report. Estimated doses for the external gamma pathway; the dust inhalation pathway; and the soil, food, and water ingestion pathways were calculated for each methodology by matching, to the extent possible, inputmore » parameters such as occupancy, shielding, and consumption factors.« less
A Comparison of Automatic Parallelization Tools/Compilers on the SGI Origin 2000 Using the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry

1998-01-01

Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand written NAB Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools: an interactive computer aided parallelization too] that generates message passing code, 2) the Portland Group's HPF compiler and 3) using compiler directives with the native FORTAN77 compiler on the SGI Origin2000.
A Benchmark Dataset for SSVEP-Based Brain-Computer Interfaces.

PubMed

Wang, Yijun; Chen, Xiaogang; Gao, Xiaorong; Gao, Shangkai

2017-10-01

This paper presents a benchmark steady-state visual evoked potential (SSVEP) dataset acquired with a 40-target brain- computer interface (BCI) speller. The dataset consists of 64-channel Electroencephalogram (EEG) data from 35 healthy subjects (8 experienced and 27 naïve) while they performed a cue-guided target selecting task. The virtual keyboard of the speller was composed of 40 visual flickers, which were coded using a joint frequency and phase modulation (JFPM) approach. The stimulation frequencies ranged from 8 Hz to 15.8 Hz with an interval of 0.2 Hz. The phase difference between two adjacent frequencies was . For each subject, the data included six blocks of 40 trials corresponding to all 40 flickers indicated by a visual cue in a random order. The stimulation duration in each trial was five seconds. The dataset can be used as a benchmark dataset to compare the methods for stimulus coding and target identification in SSVEP-based BCIs. Through offline simulation, the dataset can be used to design new system diagrams and evaluate their BCI performance without collecting any new data. The dataset also provides high-quality data for computational modeling of SSVEPs. The dataset is freely available fromhttp://bci.med.tsinghua.edu.cn/download.html.

Theory comparison and numerical benchmarking on neoclassical toroidal viscosity torque

NASA Astrophysics Data System (ADS)

Wang, Zhirui; Park, Jong-Kyu; Liu, Yueqiang; Logan, Nikolas; Kim, Kimin; Menard, Jonathan E.

2014-04-01

Systematic comparison and numerical benchmarking have been successfully carried out among three different approaches of neoclassical toroidal viscosity (NTV) theory and the corresponding codes: IPEC-PENT is developed based on the combined NTV theory but without geometric simplifications [Park et al., Phys. Rev. Lett. 102, 065002 (2009)]; MARS-Q includes smoothly connected NTV formula [Shaing et al., Nucl. Fusion 50, 025022 (2010)] based on Shaing's analytic formulation in various collisionality regimes; MARS-K, originally computing the drift kinetic energy, is upgraded to compute the NTV torque based on the equivalence between drift kinetic energy and NTV torque [J.-K. Park, Phys. Plasma 18, 110702 (2011)]. The derivation and numerical results both indicate that the imaginary part of drift kinetic energy computed by MARS-K is equivalent to the NTV torque in IPEC-PENT. In the benchmark of precession resonance between MARS-Q and MARS-K/IPEC-PENT, the agreement and correlation between the connected NTV formula and the combined NTV theory in different collisionality regimes are shown for the first time. Additionally, both IPEC-PENT and MARS-K indicate the importance of the bounce harmonic resonance which can greatly enhance the NTV torque when E ×B drift frequency reaches the bounce resonance condition.
a Dosimetry Assessment for the Core Restraint of AN Advanced Gas Cooled Reactor

NASA Astrophysics Data System (ADS)

Thornton, D. A.; Allen, D. A.; Tyrrell, R. J.; Meese, T. C.; Huggon, A. P.; Whiley, G. S.; Mossop, J. R.

2009-08-01

This paper describes calculations of neutron damage rates within the core restraint structures of Advanced Gas Cooled Reactors (AGRs). Using advanced features of the Monte Carlo radiation transport code MCBEND, and neutron source data from core follow calculations performed with the reactor physics code PANTHER, a detailed model of the reactor cores of two of British Energy's AGR power plants has been developed for this purpose. Because there are no relevant neutron fluence measurements directly supporting this assessment, results of benchmark comparisons and successful validation of MCBEND for Magnox reactors have been used to estimate systematic and random uncertainties on the predictions. In particular, it has been necessary to address the known under-prediction of lower energy fast neutron responses associated with the penetration of large thicknesses of graphite.
Deterministic Modeling of the High Temperature Test Reactor with DRAGON-HEXPEDITE

DOE Office of Scientific and Technical Information (OSTI.GOV)

J. Ortensi; M.A. Pope; R.M. Ferrer

2010-10-01

The Idaho National Laboratory (INL) is tasked with the development of reactor physics analysis capability of the Next Generation Nuclear Power (NGNP) project. In order to examine the INL’s current prismatic reactor analysis tools, the project is conducting a benchmark exercise based on modeling the High Temperature Test Reactor (HTTR). This exercise entails the development of a model for the initial criticality, a 19 fuel column thin annular core, and the fully loaded core critical condition with 30 fuel columns. Special emphasis is devoted to physical phenomena and artifacts in HTTR that are similar to phenomena and artifacts in themore » NGNP base design. The DRAGON code is used in this study since it offers significant ease and versatility in modeling prismatic designs. DRAGON can generate transport solutions via Collision Probability (CP), Method of Characteristics (MOC) and Discrete Ordinates (Sn). A fine group cross-section library based on the SHEM 281 energy structure is used in the DRAGON calculations. The results from this study show reasonable agreement in the calculation of the core multiplication factor with the MC methods, but a consistent bias of 2–3% with the experimental values is obtained. This systematic error has also been observed in other HTTR benchmark efforts and is well documented in the literature. The ENDF/B VII graphite and U235 cross sections appear to be the main source of the error. The isothermal temperature coefficients calculated with the fully loaded core configuration agree well with other benchmark participants but are 40% higher than the experimental values. This discrepancy with the measurement partially stems from the fact that during the experiments the control rods were adjusted to maintain criticality, whereas in the model, the rod positions were fixed. In addition, this work includes a brief study of a cross section generation approach that seeks to decouple the domain in order to account for neighbor effects. This spectral interpenetration is a dominant effect in annular HTR physics. This analysis methodology should be further explored in order to reduce the error that is systematically propagated in the traditional generation of cross sections.« less
Constructing Neuronal Network Models in Massively Parallel Environments.

PubMed

Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus

2017-01-01

Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Constructing Neuronal Network Models in Massively Parallel Environments

PubMed Central

Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus

2017-01-01

Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808
HyspIRI Low Latency Concept and Benchmarks

NASA Technical Reports Server (NTRS)

Mandl, Dan

2010-01-01

Topics include HyspIRI low latency data ops concept, HyspIRI data flow, ongoing efforts, experiment with Web Coverage Processing Service (WCPS) approach to injecting new algorithms into SensorWeb, low fidelity HyspIRI IPM testbed, compute cloud testbed, open cloud testbed environment, Global Lambda Integrated Facility (GLIF) and OCC collaboration with Starlight, delay tolerant network (DTN) protocol benchmarking, and EO-1 configuration for preliminary DTN prototype.
PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality

NASA Astrophysics Data System (ADS)

Hammond, G. E.; Frederick, J. M.

2016-12-01

In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification ensures whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems which cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively-parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), which describes four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information. Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
Methodology of full-core Monte Carlo calculations with leakage parameter evaluations for benchmark critical experiment analysis

NASA Astrophysics Data System (ADS)

Sboev, A. G.; Ilyashenko, A. S.; Vetrova, O. A.

1997-02-01

The method of bucking evaluation, realized in the MOnte Carlo code MCS, is described. This method was applied for calculational analysis of well known light water experiments TRX-1 and TRX-2. The analysis of this comparison shows, that there is no coincidence between Monte Carlo calculations, obtained by different ways: the MCS calculations with given experimental bucklings; the MCS calculations with given bucklings evaluated on base of full core MCS direct simulations; the full core MCNP and MCS direct simulations; the MCNP and MCS calculations, where the results of cell calculations are corrected by the coefficients taking into the account the leakage from the core. Also the buckling values evaluated by full core MCS calculations have differed from experimental ones, especially in the case of TRX-1, when this difference has corresponded to 0.5 percent increase of Keff value.
Computational Modeling of Micro-Crack Induced Attenuation in CFRP Composites

NASA Technical Reports Server (NTRS)

Roberts, R. A.; Leckey, C. A. C.

2012-01-01

A computational study is performed to determine the contribution to ultrasound attenuation in carbon fiber reinforced polymer composite laminates of linear elastic scattering by matrix micro-cracking. Multiple scattering approximations are benchmarked against exact computational approaches. Results support linear scattering as the source of observed increased attenuation in the presence of micro-cracking.
Radiation Detection Computational Benchmark Scenarios

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shaver, Mark W.; Casella, Andrew M.; Wittman, Richard S.

2013-09-24

Modeling forms an important component of radiation detection development, allowing for testing of new detector designs, evaluation of existing equipment against a wide variety of potential threat sources, and assessing operation performance of radiation detection systems. This can, however, result in large and complex scenarios which are time consuming to model. A variety of approaches to radiation transport modeling exist with complementary strengths and weaknesses for different problems. This variety of approaches, and the development of promising new tools (such as ORNL’s ADVANTG) which combine benefits of multiple approaches, illustrates the need for a means of evaluating or comparing differentmore » techniques for radiation detection problems. This report presents a set of 9 benchmark problems for comparing different types of radiation transport calculations, identifying appropriate tools for classes of problems, and testing and guiding the development of new methods. The benchmarks were drawn primarily from existing or previous calculations with a preference for scenarios which include experimental data, or otherwise have results with a high level of confidence, are non-sensitive, and represent problem sets of interest to NA-22. From a technical perspective, the benchmarks were chosen to span a range of difficulty and to include gamma transport, neutron transport, or both and represent different important physical processes and a range of sensitivity to angular or energy fidelity. Following benchmark identification, existing information about geometry, measurements, and previous calculations were assembled. Monte Carlo results (MCNP decks) were reviewed or created and re-run in order to attain accurate computational times and to verify agreement with experimental data, when present. Benchmark information was then conveyed to ORNL in order to guide testing and development of hybrid calculations. The results of those ADVANTG calculations were then sent to PNNL for compilation. This is a report describing the details of the selected Benchmarks and results from various transport codes.« less
U.S. IOOS coastal and ocean modeling testbed: Inter-model evaluation of tides, waves, and hurricane surge in the Gulf of Mexico

NASA Astrophysics Data System (ADS)

Kerr, P. C.; Donahue, A. S.; Westerink, J. J.; Luettich, R. A.; Zheng, L. Y.; Weisberg, R. H.; Huang, Y.; Wang, H. V.; Teng, Y.; Forrest, D. R.; Roland, A.; Haase, A. T.; Kramer, A. W.; Taylor, A. A.; Rhome, J. R.; Feyen, J. C.; Signell, R. P.; Hanson, J. L.; Hope, M. E.; Estes, R. M.; Dominguez, R. A.; Dunbar, R. P.; Semeraro, L. N.; Westerink, H. J.; Kennedy, A. B.; Smith, J. M.; Powell, M. D.; Cardone, V. J.; Cox, A. T.

2013-10-01

A Gulf of Mexico performance evaluation and comparison of coastal circulation and wave models was executed through harmonic analyses of tidal simulations, hindcasts of Hurricane Ike (2008) and Rita (2005), and a benchmarking study. Three unstructured coastal circulation models (ADCIRC, FVCOM, and SELFE) validated with similar skill on a new common Gulf scale mesh (ULLR) with identical frictional parameterization and forcing for the tidal validation and hurricane hindcasts. Coupled circulation and wave models, SWAN+ADCIRC and WWMII+SELFE, along with FVCOM loosely coupled with SWAN, also validated with similar skill. NOAA's official operational forecast storm surge model (SLOSH) was implemented on local and Gulf scale meshes with the same wind stress and pressure forcing used by the unstructured models for hindcasts of Ike and Rita. SLOSH's local meshes failed to capture regional processes such as Ike's forerunner and the results from the Gulf scale mesh further suggest shortcomings may be due to a combination of poor mesh resolution, missing internal physics such as tides and nonlinear advection, and SLOSH's internal frictional parameterization. In addition, these models were benchmarked to assess and compare execution speed and scalability for a prototypical operational simulation. It was apparent that a higher number of computational cores are needed for the unstructured models to meet similar operational implementation requirements to SLOSH, and that some of them could benefit from improved parallelization and faster execution speed.
OECD/NEA expert group on uncertainty analysis for criticality safety assessment: Results of benchmark on sensitivity calculation (phase III)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ivanova, T.; Laville, C.; Dyrda, J.

2012-07-01

The sensitivities of the k{sub eff} eigenvalue to neutron cross sections have become commonly used in similarity studies and as part of the validation algorithm for criticality safety assessments. To test calculations of the sensitivity coefficients, a benchmark study (Phase III) has been established by the OECD-NEA/WPNCS/EG UACSA (Expert Group on Uncertainty Analysis for Criticality Safety Assessment). This paper presents some sensitivity results generated by the benchmark participants using various computational tools based upon different computational methods: SCALE/TSUNAMI-3D and -1D, MONK, APOLLO2-MORET 5, DRAGON-SUSD3D and MMKKENO. The study demonstrates the performance of the tools. It also illustrates how model simplificationsmore » impact the sensitivity results and demonstrates the importance of 'implicit' (self-shielding) sensitivities. This work has been a useful step towards verification of the existing and developed sensitivity analysis methods. (authors)« less
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

PubMed

Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

2010-10-01

Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Nuclear Data Needs for Generation IV Nuclear Energy Systems

NASA Astrophysics Data System (ADS)

Rullhusen, Peter

2006-04-01

Nuclear data needs for generation IV systems. Future of nuclear energy and the role of nuclear data / P. Finck. Nuclear data needs for generation IV nuclear energy systems-summary of U.S. workshop / T. A. Taiwo, H. S. Khalil. Nuclear data needs for the assessment of gen. IV systems / G. Rimpault. Nuclear data needs for generation IV-lessons from benchmarks / S. C. van der Marck, A. Hogenbirk, M. C. Duijvestijn. Core design issues of the supercritical water fast reactor / M. Mori ... [et al.]. GFR core neutronics studies at CEA / J. C. Bosq ... [et al]. Comparative study on different phonon frequency spectra of graphite in GCR / Young-Sik Cho ... [et al.]. Innovative fuel types for minor actinides transmutation / D. Haas, A. Fernandez, J. Somers. The importance of nuclear data in modeling and designing generation IV fast reactors / K. D. Weaver. The GIF and Mexico-"everything is possible" / C. Arrenondo Sánchez -- Benmarks, sensitivity calculations, uncertainties. Sensitivity of advanced reactor and fuel cycle performance parameters to nuclear data uncertainties / G. Aliberti ... [et al.]. Sensitivity and uncertainty study for thermal molten salt reactors / A. Biduad ... [et al.]. Integral reactor physics benchmarks- The International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPHEP) / J. B. Briggs, D. W. Nigg, E. Sartori. Computer model of an error propagation through micro-campaign of fast neutron gas cooled nuclear reactor / E. Ivanov. Combining differential and integral experiments on [symbol] for reducing uncertainties in nuclear data applications / T. Kawano ... [et al.]. Sensitivity of activation cross sections of the Hafnium, Tanatalum and Tungsten stable isotopes to nuclear reaction mechanisms / V. Avrigeanu ... [et al.]. Generating covariance data with nuclear models / A. J. Koning. Sensitivity of Candu-SCWR reactors physics calculations to nuclear data files / K. S. Kozier, G. R. Dyck. The lead cooled fast reactor benchmark BREST-300: analysis with sensitivity method / V. Smirnov ... [et al.]. Sensitivity analysis of neutron cross-sections considered for design and safety studies of LFR and SFR generation IV systems / K. Tucek, J. Carlsson, H. Wider -- Experiments. INL capabilities for nuclear data measurements using the Argonne intense pulsed neutron source facility / J. D. Cole ... [et al.]. Cross-section measurements in the fast neutron energy range / A. Plompen. Recent measurements of neutron capture cross sections for minor actinides by a JNC and Kyoto University Group / H. Harada ... [et al.]. Determination of minor actinides fission cross sections by means of transfer reactions / M. Aiche ... [et al.] -- Evaluated data libraries. Nuclear data services from the NEA / H. Henriksson, Y. Rugama. Nuclear databases for energy applications: an IAEA perspective / R. Capote Noy, A. L. Nichols, A. Trkov. Nuclear data evaluation for generation IV / G. Noguère ... [et al.]. Improved evaluations of neutron-induced reactions on americium isotopes / P. Talou ... [et al.]. Using improved ENDF-based nuclear data for candu reactor calculations / J. Prodea. A comparative study on the graphite-moderated reactors using different evaluated nuclear data / Do Heon Kim ... [et al.].
Software platform virtualization in chemistry research and university teaching

PubMed Central

2009-01-01

Background Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, LINUX or Mac OS X. Instead of installing software on different computers it is possible to install those applications on a single computer using Virtual Machine software. Software platform virtualization allows a single guest operating system to execute multiple other operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories. Results Virtual machines are commonly used for cheminformatics software development and testing. Benchmarking multiple chemistry software packages we have confirmed that the computational speed penalty for using virtual machines is low and around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs. Conclusion Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and development of software for different operating systems. In order to obtain maximum performance the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We discuss the popularity of bioinformatics compared to cheminformatics as well as the missing cheminformatics education at universities worldwide. PMID:20150997
Software platform virtualization in chemistry research and university teaching.

PubMed

Kind, Tobias; Leamy, Tim; Leary, Julie A; Fiehn, Oliver

2009-11-16

Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, LINUX or Mac OS X. Instead of installing software on different computers it is possible to install those applications on a single computer using Virtual Machine software. Software platform virtualization allows a single guest operating system to execute multiple other operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories. Virtual machines are commonly used for cheminformatics software development and testing. Benchmarking multiple chemistry software packages we have confirmed that the computational speed penalty for using virtual machines is low and around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs. Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and development of software for different operating systems. In order to obtain maximum performance the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We discuss the popularity of bioinformatics compared to cheminformatics as well as the missing cheminformatics education at universities worldwide.
Core-shell Au-Pd nanoparticles as cathode catalysts for microbial fuel cell applications

PubMed Central

Yang, Gaixiu; Chen, Dong; Lv, Pengmei; Kong, Xiaoying; Sun, Yongming; Wang, Zhongming; Yuan, Zhenhong; Liu, Hui; Yang, Jun

2016-01-01

Bimetallic nanoparticles with core-shell structures usually display enhanced catalytic properties due to the lattice strain created between the core and shell regions. In this study, we demonstrate the application of bimetallic Au-Pd nanoparticles with an Au core and a thin Pd shell as cathode catalysts in microbial fuel cells, which represent a promising technology for wastewater treatment, while directly generating electrical energy. In specific, in comparison with the hollow structured Pt nanoparticles, a benchmark for the electrocatalysis, the bimetallic core-shell Au-Pd nanoparticles are found to have superior activity and stability for oxygen reduction reaction in a neutral condition due to the strong electronic interaction and lattice strain effect between the Au core and the Pd shell domains. The maximum power density generated in a membraneless single-chamber microbial fuel cell running on wastewater with core-shell Au-Pd as cathode catalysts is ca. 16.0 W m−3 and remains stable over 150 days, clearly illustrating the potential of core-shell nanostructures in the applications of microbial fuel cells. PMID:27734945
Highly Productive Application Development with ViennaCL for Accelerators

NASA Astrophysics Data System (ADS)

Rupp, K.; Weinbub, J.; Rudolf, F.

2012-12-01

The use of graphics processing units (GPUs) for the acceleration of general purpose computations has become very attractive over the last years, and accelerators based on many integrated CPU cores are about to hit the market. However, there are discussions about the benefit of GPU computing when comparing the reduction of execution times with the increased development effort [1]. To counter these concerns, our open-source linear algebra library ViennaCL [2,3] uses modern programming techniques such as generic programming in order to provide a convenient access layer for accelerator and GPU computing. Other GPU-accelerated libraries are primarily tuned for performance, but less tailored to productivity and portability: MAGMA [4] provides dense linear algebra operations via a LAPACK-comparable interface, but no dedicated matrix and vector types. Cusp [5] is closest in functionality to ViennaCL for sparse matrices, but is based on CUDA and thus restricted to devices from NVIDIA. However, no convenience layer for dense linear algebra is provided with Cusp. ViennaCL is written in C++ and uses OpenCL to access the resources of accelerators, GPUs and multi-core CPUs in a unified way. On the one hand, the library provides iterative solvers from the family of Krylov methods, including various preconditioners, for the solution of linear systems typically obtained from the discretization of partial differential equations. On the other hand, dense linear algebra operations are supported, including algorithms such as QR factorization and singular value decomposition. The user application interface of ViennaCL is compatible to uBLAS [6], which is part of the peer-reviewed Boost C++ libraries [7]. This allows to port existing applications based on uBLAS with a minimum of effort to ViennaCL. Conversely, the interface compatibility allows to use the iterative solvers from ViennaCL with uBLAS types directly, thus enabling code reuse beyond CPU-GPU boundaries. Out-of-the-box support for types from the Eigen library [8] and MTL 4 [9] are provided as well, enabling a seamless transition from single-core CPU to GPU and multi-core CPU computations. Case studies from the numerical solution of PDEs are given and isolated performance benchmarks are discussed. Also, pitfalls in scientific computing with GPUs and accelerators are addressed, allowing for a first evaluation of whether these novel devices can be mapped well to certain applications. References: [1] R. Bordawekar et al., Technical Report, IBM, 2010 [2] ViennaCL library. Online: http://viennacl.sourceforge.net/ [3] K. Rupp et al., GPUScA, 2010 [4] MAGMA library. Online: http://icl.cs.utk.edu/magma/ [5] Cusp library. Online: http://code.google.com/p/cusp-library/ [6] uBLAS library. Online: http://www.boost.org/libs/numeric/ublas/ [7] Boost C++ Libraries. Online: http://www.boost.org/ [8] Eigen library. Online: http://eigen.tuxfamily.org/ [9] MTL 4 Library. Online: http://www.mtl4.org/
Scale-4 Analysis of Pressurized Water Reactor Critical Configurations: Volume 2-Sequoyah Unit 2 Cycle 3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowman, S.M.

1995-01-01

The requirements of ANSI/ANS 8.1 specify that calculational methods for away-from-reactor criticality safety analyses be validated against experimental measurements. If credit for the negative reactivity of the depleted (or spent) fuel isotopics is desired, it is necessary to benchmark computational methods against spent fuel critical configurations. This report summarizes a portion of the ongoing effort to benchmark away-from-reactor criticality analysis methods using critical configurations from commercial pressurized-water reactors. The analysis methodology selected for all the calculations reported herein is based on the codes and data provided in the SCALE-4 code system. The isotopic densities for the spent fuel assemblies inmore » the critical configurations were calculated using the SAS2H analytical sequence of the SCALE-4 system. The sources of data and the procedures for deriving SAS2H input parameters are described in detail. The SNIKR code module was used to extract the necessary isotopic densities from the SAS2H results and provide the data in the format required by the SCALE criticality analysis modules. The CSASN analytical sequence in SCALE-4 was used to perform resonance processing of the cross sections. The KENO V.a module of SCALE-4 was used to calculate the effective multiplication factor (k{sub eff}) of each case. The SCALE-4 27-group burnup library containing ENDF/B-IV (actinides) and ENDF/B-V (fission products) data was used for all the calculations. This volume of the report documents the SCALE system analysis of three reactor critical configurations for the Sequoyah Unit 2 Cycle 3. This unit and cycle were chosen because of the relevance in spent fuel benchmark applications: (1) the unit had a significantly long downtime of 2.7 years during the middle of cycle (MOC) 3, and (2) the core consisted entirely of burned fuel at the MOC restart. The first benchmark critical calculation was the MOC restart at hot, full-power (HFP) critical conditions. The other two benchmark critical calculations were the beginning-of-cycle (BOC) startup at both hot, zero-power (HZP) and HFP critical conditions. These latter calculations were used to check for consistency in the calculated results for different burnups and downtimes. The k{sub eff} results were in the range of 1.00014 to 1.00259 with a standard deviation of less than 0.001.« less
ICASE/LaRC Workshop on Benchmark Problems in Computational Aeroacoustics (CAA)

NASA Technical Reports Server (NTRS)

Hardin, Jay C. (Editor); Ristorcelli, J. Ray (Editor); Tam, Christopher K. W. (Editor)

1995-01-01

The proceedings of the Benchmark Problems in Computational Aeroacoustics Workshop held at NASA Langley Research Center are the subject of this report. The purpose of the Workshop was to assess the utility of a number of numerical schemes in the context of the unusual requirements of aeroacoustical calculations. The schemes were assessed from the viewpoint of dispersion and dissipation -- issues important to long time integration and long distance propagation in aeroacoustics. Also investigated were the effect of implementation of different boundary conditions. The Workshop included a forum in which practical engineering problems related to computational aeroacoustics were discussed. This discussion took the form of a dialogue between an industrial panel and the workshop participants and was an effort to suggest the direction of evolution of this field in the context of current engineering needs.

I Know What You Did Last Summer

ERIC Educational Resources Information Center

Opalinski, Gail; Ellers, Sherry; Goodman, Amy

2004-01-01

This article describes the revised summer school program developed by the Anchorage (AK) School District for students who received poor grades in their core classes or low scores in the Alaska Benchmark Examinations or California Achievement Tests. More than 500 middle school students from the district spent five weeks during the summer honing…
A Privacy-Preserving Platform for User-Centric Quantitative Benchmarking

NASA Astrophysics Data System (ADS)

Herrmann, Dominik; Scheuer, Florian; Feustel, Philipp; Nowey, Thomas; Federrath, Hannes

We propose a centralised platform for quantitative benchmarking of key performance indicators (KPI) among mutually distrustful organisations. Our platform offers users the opportunity to request an ad-hoc benchmarking for a specific KPI within a peer group of their choice. Architecture and protocol are designed to provide anonymity to its users and to hide the sensitive KPI values from other clients and the central server. To this end, we integrate user-centric peer group formation, exchangeable secure multi-party computation protocols, short-lived ephemeral key pairs as pseudonyms, and attribute certificates. We show by empirical evaluation of a prototype that the performance is acceptable for reasonably sized peer groups.
Inclusion and Human Rights in Health Policies: Comparative and Benchmarking Analysis of 51 Policies from Malawi, Sudan, South Africa and Namibia

PubMed Central

MacLachlan, Malcolm; Amin, Mutamad; Mannan, Hasheem; El Tayeb, Shahla; Bedri, Nafisa; Swartz, Leslie; Munthali, Alister; Van Rooy, Gert; McVeigh, Joanne

2012-01-01

While many health services strive to be equitable, accessible and inclusive, peoples’ right to health often goes unrealized, particularly among vulnerable groups. The extent to which health policies explicitly seek to achieve such goals sets the policy context in which services are delivered and evaluated. An analytical framework was developed – EquiFrame – to evaluate 1) the extent to which 21 Core Concepts of human rights were addressed in policy documents, and 2) coverage of 12 Vulnerable Groups who might benefit from such policies. Using this framework, analysis of 51 policies across Malawi, Namibia, South Africa and Sudan, confirmed the relevance of all Core Concepts and Vulnerable Groups. Further, our analysis highlighted some very strong policies, serious shortcomings in others as well as country-specific patterns. If social inclusion and human rights do not underpin policy formation, it is unlikely they will be inculcated in service delivery. EquiFrame facilitates policy analysis and benchmarking, and provides a means for evaluating policy revision and development. PMID:22649488
SU-D-BRD-03: A Gateway for GPU Computing in Cancer Radiotherapy Research

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jia, X; Folkerts, M; Shi, F

Purpose: Graphics Processing Unit (GPU) has become increasingly important in radiotherapy. However, it is still difficult for general clinical researchers to access GPU codes developed by other researchers, and for developers to objectively benchmark their codes. Moreover, it is quite often to see repeated efforts spent on developing low-quality GPU codes. The goal of this project is to establish an infrastructure for testing GPU codes, cross comparing them, and facilitating code distributions in radiotherapy community. Methods: We developed a system called Gateway for GPU Computing in Cancer Radiotherapy Research (GCR2). A number of GPU codes developed by our group andmore » other developers can be accessed via a web interface. To use the services, researchers first upload their test data or use the standard data provided by our system. Then they can select the GPU device on which the code will be executed. Our system offers all mainstream GPU hardware for code benchmarking purpose. After the code running is complete, the system automatically summarizes and displays the computing results. We also released a SDK to allow the developers to build their own algorithm implementation and submit their binary codes to the system. The submitted code is then systematically benchmarked using a variety of GPU hardware and representative data provided by our system. The developers can also compare their codes with others and generate benchmarking reports. Results: It is found that the developed system is fully functioning. Through a user-friendly web interface, researchers are able to test various GPU codes. Developers also benefit from this platform by comprehensively benchmarking their codes on various GPU platforms and representative clinical data sets. Conclusion: We have developed an open platform allowing the clinical researchers and developers to access the GPUs and GPU codes. This development will facilitate the utilization of GPU in radiation therapy field.« less
Algorithm and Architecture Independent Benchmarking with SEAK

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tallent, Nathan R.; Manzano Franco, Joseph B.; Gawande, Nitin A.

2016-05-23

Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed the Suite for Embedded Applications & Kernels (SEAK), a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions; and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, andmore » weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java

NASA Technical Reports Server (NTRS)

Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2000-01-01

A number of Java features make it an attractive but a debatable choice for High Performance Computing. We have implemented benchmarks working on single structured grid BT,SP,LU and FT in Java. The performance and scalability of the Java code shows that a significant improvement in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.
Nonlinear model updating applied to the IMAC XXXII Round Robin benchmark system

NASA Astrophysics Data System (ADS)

Kurt, Mehmet; Moore, Keegan J.; Eriten, Melih; McFarland, D. Michael; Bergman, Lawrence A.; Vakakis, Alexander F.

2017-05-01

We consider the application of a new nonlinear model updating strategy to a computational benchmark system. The approach relies on analyzing system response time series in the frequency-energy domain by constructing both Hamiltonian and forced and damped frequency-energy plots (FEPs). The system parameters are then characterized and updated by matching the backbone branches of the FEPs with the frequency-energy wavelet transforms of experimental and/or computational time series. The main advantage of this method is that no nonlinearity model is assumed a priori, and the system model is updated solely based on simulation and/or experimental measured time series. By matching the frequency-energy plots of the benchmark system and its reduced-order model, we show that we are able to retrieve the global strongly nonlinear dynamics in the frequency and energy ranges of interest, identify bifurcations, characterize local nonlinearities, and accurately reconstruct time series. We apply the proposed methodology to a benchmark problem, which was posed to the system identification community prior to the IMAC XXXII (2014) and XXXIII (2015) Conferences as a "Round Robin Exercise on Nonlinear System Identification". We show that we are able to identify the parameters of the non-linear element in the problem with a priori knowledge about its position.
Flexible Tagged Architecture for Trustworthy Multi-core Platforms

DTIC Science & Technology

2015-06-01

well as two kernel benchmarks for SHA - 256 and GMAC, which are popular cryptographic standards. We compared the execution time of these benchmarks...UMC UMC on Flex fabric (FPGA) 266 90,384 10.8% 21 5.8% DIFT DIFT on Flex fabric (FPGA) 256 123,471 14.8% 23 6.3% BC BC on Flex fabric (FPGA) 229...0.25X) (1X) (0.5X) (0.25X) (1X) (0.5X) (0.25X) (1X) (0.5X) (0.25X) sha 1.01 1.01 1.01 1.01 1.06 1.16 1.03 1.07 1.15 1.00 1.33 1.50 gmac 1.01 1.01 1.09
Performance Studies on Distributed Virtual Screening

PubMed Central

Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.

2014-01-01

Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures binding possibly to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting maximizing the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
Two-component relativistic coupled-cluster methods using mean-field spin-orbit integrals

NASA Astrophysics Data System (ADS)

Liu, Junzi; Shen, Yue; Asthana, Ayush; Cheng, Lan

2018-01-01

A novel implementation of the two-component spin-orbit (SO) coupled-cluster singles and doubles (CCSD) method and the CCSD augmented with the perturbative inclusion of triple excitations [CCSD(T)] method using mean-field SO integrals is reported. The new formulation of SO-CCSD(T) features an atomic-orbital-based algorithm for the particle-particle ladder term in the CCSD equation, which not only removes the computational bottleneck associated with the large molecular-orbital integral file but also accelerates the evaluation of the particle-particle ladder term by around a factor of 4 by taking advantage of the spin-free nature of the instantaneous electron-electron Coulomb interaction. Benchmark calculations of the SO splittings for the thallium atom and a set of diatomic 2Π radicals as well as of the bond lengths and harmonic frequencies for a set of closed-shell diatomic molecules are presented. The basis-set and core-correlation effects in the calculations of these properties have been carefully analyzed.
Scalable randomized benchmarking of non-Clifford gates

NASA Astrophysics Data System (ADS)

Cross, Andrew; Magesan, Easwar; Bishop, Lev; Smolin, John; Gambetta, Jay

Randomized benchmarking is a widely used experimental technique to characterize the average error of quantum operations. Benchmarking procedures that scale to enable characterization of n-qubit circuits rely on efficient procedures for manipulating those circuits and, as such, have been limited to subgroups of the Clifford group. However, universal quantum computers require additional, non-Clifford gates to approximate arbitrary unitary transformations. We define a scalable randomized benchmarking procedure over n-qubit unitary matrices that correspond to protected non-Clifford gates for a class of stabilizer codes. We present efficient methods for representing and composing group elements, sampling them uniformly, and synthesizing corresponding poly (n) -sized circuits. The procedure provides experimental access to two independent parameters that together characterize the average gate fidelity of a group element. We acknowledge support from ARO under Contract W911NF-14-1-0124.
Deepthi Vaidhynathan | NREL

Science.gov Websites

Complex Systems Simulation and Optimization Group on performance analysis and benchmarking latest . Research Interests High Performance Computing|Embedded System |Microprocessors & Microcontrollers
Investigating the Transonic Flutter Boundary of the Benchmark Supercritical Wing

NASA Technical Reports Server (NTRS)

Heeg, Jennifer; Chwalowski, Pawel

2017-01-01

This paper builds on the computational aeroelastic results published previously and generated in support of the second Aeroelastic Prediction Workshop for the NASA Benchmark Supercritical Wing configuration. The computational results are obtained using FUN3D, an unstructured grid Reynolds-Averaged Navier-Stokes solver developed at the NASA Langley Research Center. The analysis results focus on understanding the dip in the transonic flutter boundary at a single Mach number (0.74), exploring an angle of attack range of ??1 to 8 and dynamic pressures from wind off to beyond flutter onset. The rigid analysis results are examined for insights into the behavior of the aeroelastic system. Both static and dynamic aeroelastic simulation results are also examined.
12 CFR 567.12 - Purchased credit card relationships, servicing assets, intangible assets (other than purchased...

Code of Federal Regulations, 2011 CFR

2011-01-01

... and core capital. (b) Computation of core and tangible capital. (1) Purchased credit card relationships may be included (that is, not deducted) in computing core capital in accordance with the... restrictions in this section, mortgage servicing assets may be included in computing core and tangible capital...
Synapse-Centric Mapping of Cortical Models to the SpiNNaker Neuromorphic Architecture

PubMed Central

Knight, James C.; Furber, Steve B.

2016-01-01

While the adult human brain has approximately 8.8 × 1010 neurons, this number is dwarfed by its 1 × 1015 synapses. From the point of view of neuromorphic engineering and neural simulation in general this makes the simulation of these synapses a particularly complex problem. SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Current solutions for simulating spiking neural networks on SpiNNaker are heavily inspired by work on distributed high-performance computing. However, while SpiNNaker shares many characteristics with such distributed systems, its component nodes have much more limited resources and, as the system lacks global synchronization, the computation performed on each node must complete within a fixed time step. We first analyze the performance of the current SpiNNaker neural simulation software and identify several problems that occur when it is used to simulate networks of the type often used to model the cortex which contain large numbers of sparsely connected synapses. We then present a new, more flexible approach for mapping the simulation of such networks to SpiNNaker which solves many of these problems. Finally we analyze the performance of our new approach using both benchmarks, designed to represent cortical connectivity, and larger, functional cortical models. In a benchmark network where neurons receive input from 8000 STDP synapses, our new approach allows 4× more neurons to be simulated on each SpiNNaker core than has been previously possible. We also demonstrate that the largest plastic neural network previously simulated on neuromorphic hardware can be run in real time using our new approach: double the speed that was previously achieved. Additionally this network contains two types of plastic synapse which previously had to be trained separately but, using our new approach, can be trained simultaneously. PMID:27683540
Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI.

PubMed

An, Yi; Wang, Jiawei; Li, Chen; Leier, André; Marquez-Lago, Tatiana; Wilksch, Jonathan; Zhang, Yang; Webb, Geoffrey I; Song, Jiangning; Lithgow, Trevor

2018-01-01

Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in terms of the used ML methods but also with respect to the used curated data sets, the features selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspiration for the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
An imaging-based computational model for simulating angiogenesis and tumour oxygenation dynamics

NASA Astrophysics Data System (ADS)

Adhikarla, Vikram; Jeraj, Robert

2016-05-01

Tumour growth, angiogenesis and oxygenation vary substantially among tumours and significantly impact their treatment outcome. Imaging provides a unique means of investigating these tumour-specific characteristics. Here we propose a computational model to simulate tumour-specific oxygenation changes based on the molecular imaging data. Tumour oxygenation in the model is reflected by the perfused vessel density. Tumour growth depends on its doubling time (T d) and the imaged proliferation. Perfused vessel density recruitment rate depends on the perfused vessel density around the tumour (sMVDtissue) and the maximum VEGF concentration for complete vessel dysfunctionality (VEGFmax). The model parameters were benchmarked to reproduce the dynamics of tumour oxygenation over its entire lifecycle, which is the most challenging test. Tumour oxygenation dynamics were quantified using the peak pO2 (pO2peak) and the time to peak pO2 (t peak). Sensitivity of tumour oxygenation to model parameters was assessed by changing each parameter by 20%. t peak was found to be more sensitive to tumour cell line related doubling time (~30%) as compared to tissue vasculature density (~10%). On the other hand, pO2peak was found to be similarly influenced by the above tumour- and vasculature-associated parameters (~30-40%). Interestingly, both pO2peak and t peak were only marginally affected by VEGFmax (~5%). The development of a poorly oxygenated (hypoxic) core with tumour growth increased VEGF accumulation, thus disrupting the vessel perfusion as well as further increasing hypoxia with time. The model with its benchmarked parameters, is applied to hypoxia imaging data obtained using a [64Cu]Cu-ATSM PET scan of a mouse tumour and the temporal development of the vasculature and hypoxia maps are shown. The work underscores the importance of using tumour-specific input for analysing tumour evolution. An extended model incorporating therapeutic effects can serve as a powerful tool for analysing tumour response to anti-angiogenic therapies.
HELIOS: A new open-source radiative transfer code

NASA Astrophysics Data System (ADS)

Malik, Matej; Grosheintz, Luc; Lukas Grimm, Simon; Mendonça, João; Kitzmann, Daniel; Heng, Kevin

2015-12-01

I present the new open-source code HELIOS, developed to accurately describe radiative transfer in a wide variety of irradiated atmospheres. We employ a one-dimensional multi-wavelength two-stream approach with scattering. Written in Cuda C++, HELIOS uses the GPU’s potential of massive parallelization and is able to compute the TP-profile of an atmosphere in radiative equilibrium and the subsequent emission spectrum in a few minutes on a single computer (for 60 layers and 1000 wavelength bins).The required molecular opacities are obtained with the recently published code HELIOS-K [1], which calculates the line shapes from an input line list and resamples the numerous line-by-line data into a manageable k-distribution format. Based on simple equilibrium chemistry theory [2] we combine the k-distribution functions of the molecules H2O, CO2, CO & CH4 to generate a k-table, which we then employ in HELIOS.I present our results of the following: (i) Various numerical tests, e.g. isothermal vs. non-isothermal treatment of layers. (ii) Comparison of iteratively determined TP-profiles with their analytical parametric prescriptions [3] and of the corresponding spectra. (iii) Benchmarks of TP-profiles & spectra for various elemental abundances. (iv) Benchmarks of averaged TP-profiles & spectra for the exoplanets GJ1214b, HD189733b & HD209458b. (v) Comparison with secondary eclipse data for HD189733b, XO-1b & Corot-2b.HELIOS is being developed, together with the dynamical core THOR and the chemistry solver VULCAN, in the group of Kevin Heng at the University of Bern as part of the Exoclimes Simulation Platform (ESP) [4], which is an open-source project aimed to provide community tools to model exoplanetary atmospheres.-----------------------------[1] Grimm & Heng 2015, ArXiv, 1503.03806[2] Heng, Lyons & Tsai, Arxiv, 1506.05501Heng & Lyons, ArXiv, 1507.01944[3] e.g. Heng, Mendonca & Lee, 2014, ApJS, 215, 4H[4] exoclime.net
Developing a Shuffled Complex-Self Adaptive Hybrid Evolution (SC-SAHEL) Framework for Water Resources Management and Water-Energy System Optimization

NASA Astrophysics Data System (ADS)

Rahnamay Naeini, M.; Sadegh, M.; AghaKouchak, A.; Hsu, K. L.; Sorooshian, S.; Yang, T.

2017-12-01

Meta-Heuristic optimization algorithms have gained a great deal of attention in a wide variety of fields. Simplicity and flexibility of these algorithms, along with their robustness, make them attractive tools for solving optimization problems. Different optimization methods, however, hold algorithm-specific strengths and limitations. Performance of each individual algorithm obeys the "No-Free-Lunch" theorem, which means a single algorithm cannot consistently outperform all possible optimization problems over a variety of problems. From users' perspective, it is a tedious process to compare, validate, and select the best-performing algorithm for a specific problem or a set of test cases. In this study, we introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme, and allows users to select the most suitable algorithm tailored to the problem at hand. The concept of SC-SAHEL is to execute different EAs as separate parallel search cores, and let all participating EAs to compete during the course of the search. The newly developed SC-SAHEL algorithm is designed to automatically select, the best performing algorithm for the given optimization problem. This algorithm is rigorously effective in finding the global optimum for several strenuous benchmark test functions, and computationally efficient as compared to individual EAs. We benchmark the proposed SC-SAHEL algorithm over 29 conceptual test functions, and two real-world case studies - one hydropower reservoir model and one hydrological model (SAC-SMA). Results show that the proposed framework outperforms individual EAs in an absolute majority of the test problems, and can provide competitive results to the fittest EA algorithm with more comprehensive information during the search. The proposed framework is also flexible for merging additional EAs, boundary-handling techniques, and sampling schemes, and has good potential to be used in Water-Energy system optimal operation and management.
Edge localized linear ideal magnetohydrodynamic instability studies in an extended-magnetohydrodynamic code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burke, B. J.; Kruger, S. E.; Hegna, C. C.

A linear benchmark between the linear ideal MHD stability codes ELITE [H. R. Wilson et al., Phys. Plasmas 9, 1277 (2002)], GATO [L. Bernard et al., Comput. Phys. Commun. 24, 377 (1981)], and the extended nonlinear magnetohydrodynamic (MHD) code, NIMROD [C. R. Sovinec et al.., J. Comput. Phys. 195, 355 (2004)] is undertaken for edge-localized (MHD) instabilities. Two ballooning-unstable, shifted-circle tokamak equilibria are compared where the stability characteristics are varied by changing the equilibrium plasma profiles. The equilibria model an H-mode plasma with a pedestal pressure profile and parallel edge currents. For both equilibria, NIMROD accurately reproduces the transition tomore » instability (the marginally unstable mode), as well as the ideal growth spectrum for a large range of toroidal modes (n=1-20). The results use the compressible MHD model and depend on a precise representation of 'ideal-like' and 'vacuumlike' or 'halo' regions within the code. The halo region is modeled by the introduction of a Lundquist-value profile that transitions from a large to a small value at a flux surface location outside of the pedestal region. To model an ideal-like MHD response in the core and a vacuumlike response outside the transition, separate criteria on the plasma and halo Lundquist values are required. For the benchmarked equilibria the critical Lundquist values are 10{sup 8} and 10{sup 3} for the ideal-like and halo regions, respectively. Notably, this gives a ratio on the order of 10{sup 5}, which is much larger than experimentally measured values using T{sub e} values associated with the top of the pedestal and separatrix. Excellent agreement with ELITE and GATO calculations are made when sharp boundary transitions in the resistivity are used and a small amount of physical dissipation is added for conditions very near and below marginal ideal stability.« less

Computer ray tracing speeds.

PubMed

Robb, P; Pawlowski, B

1990-05-01

The results of measuring the ray trace speed and compilation speed of thirty-nine computers in fifty-seven configurations, ranging from personal computers to super computers, are described. A correlation of ray trace speed has been made with the LINPACK benchmark which allows the ray trace speed to be estimated using LINPACK performance data. The results indicate that the latest generation of workstations, using CPUs based on RISC (Reduced Instruction Set Computer) technology, are as fast or faster than mainframe computers in compute-bound situations.
Can Humans Fly Action Understanding with Multiple Classes of Actors

DTIC Science & Technology

2015-06-08

recognition using structure from motion point clouds. In European Conference on Computer Vision, 2008. [5] R. Caruana. Multitask learning. Machine Learning...tonomous driving ? the kitti vision benchmark suite. In IEEE Conference on Computer Vision and Pattern Recognition, 2012. [12] L. Gorelick, M. Blank
Quantum computing applied to calculations of molecular energies: CH2 benchmark.

PubMed

Veis, Libor; Pittner, Jiří

2010-11-21

Quantum computers are appealing for their ability to solve some tasks much faster than their classical counterparts. It was shown in [Aspuru-Guzik et al., Science 309, 1704 (2005)] that they, if available, would be able to perform the full configuration interaction (FCI) energy calculations with a polynomial scaling. This is in contrast to conventional computers where FCI scales exponentially. We have developed a code for simulation of quantum computers and implemented our version of the quantum FCI algorithm. We provide a detailed description of this algorithm and the results of the assessment of its performance on the four lowest lying electronic states of CH(2) molecule. This molecule was chosen as a benchmark, since its two lowest lying (1)A(1) states exhibit a multireference character at the equilibrium geometry. It has been shown that with a suitably chosen initial state of the quantum register, one is able to achieve the probability amplification regime of the iterative phase estimation algorithm even in this case.
Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc04531j

PubMed Central

Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F.; Harger, Matthew; Torabifard, Hedieh; Cisneros, G. Andrés; Schnieders, Michael J.; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y.; Ponder, Jay W.

2017-01-01

We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed. PMID:29732110
Large eddy simulation of the FDA benchmark nozzle for a Reynolds number of 6500.

PubMed

Janiga, Gábor

2014-04-01

This work investigates the flow in a benchmark nozzle model of an idealized medical device proposed by the FDA using computational fluid dynamics (CFD). It was in particular shown that a proper modeling of the transitional flow features is particularly challenging, leading to large discrepancies and inaccurate predictions from the different research groups using Reynolds-averaged Navier-Stokes (RANS) modeling. In spite of the relatively simple, axisymmetric computational geometry, the resulting turbulent flow is fairly complex and non-axisymmetric, in particular due to the sudden expansion. The resulting flow cannot be well predicted with simple modeling approaches. Due to the varying diameters and flow velocities encountered in the nozzle, different typical flow regions and regimes can be distinguished, from laminar to transitional and to weakly turbulent. The purpose of the present work is to re-examine the FDA-CFD benchmark nozzle model at a Reynolds number of 6500 using large eddy simulation (LES). The LES results are compared with published experimental data obtained by Particle Image Velocimetry (PIV) and an excellent agreement can be observed considering the temporally averaged flow velocities. Different flow regimes are characterized by computing the temporal energy spectra at different locations along the main axis. Copyright © 2014 Elsevier Ltd. All rights reserved.
Theory comparison and numerical benchmarking on neoclassical toroidal viscosity torque

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Zhirui; Park, Jong-Kyu; Logan, Nikolas

Systematic comparison and numerical benchmarking have been successfully carried out among three different approaches of neoclassical toroidal viscosity (NTV) theory and the corresponding codes: IPEC-PENT is developed based on the combined NTV theory but without geometric simplifications [Park et al., Phys. Rev. Lett. 102, 065002 (2009)]; MARS-Q includes smoothly connected NTV formula [Shaing et al., Nucl. Fusion 50, 025022 (2010)] based on Shaing's analytic formulation in various collisionality regimes; MARS-K, originally computing the drift kinetic energy, is upgraded to compute the NTV torque based on the equivalence between drift kinetic energy and NTV torque [J.-K. Park, Phys. Plasma 18, 110702more » (2011)]. The derivation and numerical results both indicate that the imaginary part of drift kinetic energy computed by MARS-K is equivalent to the NTV torque in IPEC-PENT. In the benchmark of precession resonance between MARS-Q and MARS-K/IPEC-PENT, the agreement and correlation between the connected NTV formula and the combined NTV theory in different collisionality regimes are shown for the first time. Additionally, both IPEC-PENT and MARS-K indicate the importance of the bounce harmonic resonance which can greatly enhance the NTV torque when E×B drift frequency reaches the bounce resonance condition.« less
LUMA: A many-core, Fluid-Structure Interaction solver based on the Lattice-Boltzmann Method

NASA Astrophysics Data System (ADS)

Harwood, Adrian R. G.; O'Connor, Joseph; Sanchez Muñoz, Jonathan; Camps Santasmasas, Marta; Revell, Alistair J.

2018-01-01

The Lattice-Boltzmann Method at the University of Manchester (LUMA) project was commissioned to build a collaborative research environment in which researchers of all abilities can study fluid-structure interaction (FSI) problems in engineering applications from aerodynamics to medicine. It is built on the principles of accessibility, simplicity and flexibility. The LUMA software at the core of the project is a capable FSI solver with turbulence modelling and many-core scalability as well as a wealth of input/output and pre- and post-processing facilities. The software has been validated and several major releases benchmarked on supercomputing facilities internationally. The software architecture is modular and arranged logically using a minimal amount of object-orientation to maintain a simple and accessible software.
Core Competencies for Injury and Violence Prevention

PubMed Central

Stephens-Stidham, Shelli; Peek-Asa, Corinne; Bou-Saada, Ingrid; Hunter, Wanda; Lindemer, Kristen; Runyan, Carol

2009-01-01

Efforts to reduce the burden of injury and violence require a workforce that is knowledgeable and skilled in prevention. However, there has been no systematic process to ensure that professionals possess the necessary competencies. To address this deficiency, we developed a set of core competencies for public health practitioners in injury and violence prevention programs. The core competencies address domains including public health significance, data, the design and implementation of prevention activities, evaluation, program management, communication, stimulating change, and continuing education. Specific learning objectives establish goals for training in each domain. The competencies assist in efforts to reduce the burden of injury and violence and can provide benchmarks against which to assess progress in professional capacity for injury and violence prevention. PMID:19197083
Test Scheduling for Core-Based SOCs Using Genetic Algorithm Based Heuristic Approach

NASA Astrophysics Data System (ADS)

Giri, Chandan; Sarkar, Soumojit; Chattopadhyay, Santanu

This paper presents a Genetic algorithm (GA) based solution to co-optimize test scheduling and wrapper design for core based SOCs. Core testing solutions are generated as a set of wrapper configurations, represented as rectangles with width equal to the number of TAM (Test Access Mechanism) channels and height equal to the corresponding testing time. A locally optimal best-fit heuristic based bin packing algorithm has been used to determine placement of rectangles minimizing the overall test times, whereas, GA has been utilized to generate the sequence of rectangles to be considered for placement. Experimental result on ITC'02 benchmark SOCs shows that the proposed method provides better solutions compared to the recent works reported in the literature.
Decoys Selection in Benchmarking Datasets: Overview and Perspectives

PubMed Central

Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

2018-01-01

Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A; Faraj, Daniel A

2013-06-04

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A.; Faraj, Daniel A.

2012-12-11

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
High Performance Computing at NASA

NASA Technical Reports Server (NTRS)

Bailey, David H.; Cooper, D. M. (Technical Monitor)

1994-01-01

The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (National Aerospace Standards) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
Integrating CFD, CAA, and Experiments Towards Benchmark Datasets for Airframe Noise Problems

NASA Technical Reports Server (NTRS)

Choudhari, Meelan M.; Yamamoto, Kazuomi

2012-01-01

Airframe noise corresponds to the acoustic radiation due to turbulent flow in the vicinity of airframe components such as high-lift devices and landing gears. The combination of geometric complexity, high Reynolds number turbulence, multiple regions of separation, and a strong coupling with adjacent physical components makes the problem of airframe noise highly challenging. Since 2010, the American Institute of Aeronautics and Astronautics has organized an ongoing series of workshops devoted to Benchmark Problems for Airframe Noise Computations (BANC). The BANC workshops are aimed at enabling a systematic progress in the understanding and high-fidelity predictions of airframe noise via collaborative investigations that integrate state of the art computational fluid dynamics, computational aeroacoustics, and in depth, holistic, and multifacility measurements targeting a selected set of canonical yet realistic configurations. This paper provides a brief summary of the BANC effort, including its technical objectives, strategy, and selective outcomes thus far.
The Equivalent Thermal Resistance of Tile Roofs with and without Batten Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, William A

Clay and concrete tile roofs were installed on a fully instrumented attic test facility operating in East Tennessee s climate. Roof, attic and deck temperatures and heat flows were recorded for each of the tile roofs and also on an adjacent attic cavity covered with a conventionally pigmented and direct-nailed asphalt shingle roof. The data were used to benchmark a computer tool for simulation of roofs and attics and the tool used to develop an approach for computing an equivalent seasonal R-value for sub-tile venting. The approach computed equal heat fluxes through the ceilings of roofs having different combinations ofmore » surface radiation properties and or building constructions. A direct nailed shingle roof served as a control for estimating the equivalent thermal resistance of the air space. Simulations were benchmarked to data in the ASHRAE Fundamentals for the thermal resistance of inclined and closed air spaces.« less
Results of the Simulation of the HTR-Proteus Core 4.2 Using PEBBED-COMBINE: FY10 Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hans Gougar

2010-07-01

ABSTRACT The Idaho National Laboratory’s deterministic neutronics analysis codes and methods were applied to the computation of the core multiplication factor of the HTR-Proteus pebble bed reactor critical facility. This report is a follow-on to INL/EXT-09-16620 in which the same calculation was performed but using earlier versions of the codes and less developed methods. In that report, results indicated that the cross sections generated using COMBINE-7.0 did not yield satisfactory estimates of keff. It was concluded in the report that the modeling of control rods was not satisfactory. In the past year, improvements to the homogenization capability in COMBINE havemore » enabled the explicit modeling of TRIS particles, pebbles, and heterogeneous core zones including control rod regions using a new multi-scale version of COMBINE in which the 1-dimensional discrete ordinate transport code ANISN has been integrated. The new COMBINE is shown to yield benchmark quality results for pebble unit cell models, the first step in preparing few-group diffusion parameters for core simulations. In this report, the full critical core is modeled once again but with cross sections generated using the capabilities and physics of the improved COMBINE code. The new PEBBED-COMBINE model enables the exact modeling of the pebbles and control rod region along with better approximation to structures in the reflector. Initial results for the core multiplication factor indicate significant improvement in the INL’s tools for modeling the neutronic properties of a pebble bed reactor. Errors on the order of 1.6-2.5% in keff are obtained; a significant improvement over the 5-6% error observed in the earlier This is acceptable for a code system and model in the early stages of development but still too high for a production code. Analysis of a simpler core model indicates an over-prediction of the flux in the low end of the thermal spectrum. Causes of this discrepancy are under investigation. New homogenization techniques and assumptions were used in this analysis and as such, they require further confirmation and validation. Further refinement and review of the complex Proteus core model are likely to reduce the errors even further.« less
Note-Taking Interventions to Assist Students with Disabilities in Content Area Classes

ERIC Educational Resources Information Center

Boyle, Joseph R.; Forchelli, Gina A.; Cariss, Kaitlyn

2015-01-01

As high-stakes testing, Common Core, and state standards become the new norms in schools, teachers are tasked with helping all students meet specific benchmarks. In conjunction with the influx of more students with disabilities being included in inclusive and general education classrooms where lectures with note-taking comprise a majority of…
A Qualitative Study of Urban and Suburban Elementary Student Understandings of Pest-Related Science and Agricultural Education Benchmarks.

ERIC Educational Resources Information Center

Trexler, Cary J.

2000-01-01

Clinical interviews with nine fifth graders revealed that experiences play a pivotal role in their understanding of pests. They lack well-developed schema and language to discuss pest management. A foundation of core biological concepts was necessary for understanding pests and pest management. (Conatains 34 references.) (SK)
A Psychometric Analysis of Teacher-Made Benchmark Assessments in English Language Arts

ERIC Educational Resources Information Center

Milligan, Andrea

2017-01-01

The implementation of the Common Core State Standards (CCSS) has placed increased accountability for outcomes on both students and teachers. To address the current youth literacy crisis in the United States, the CCSS call for students to read increasingly complex informational and literary texts. Since teachers are held accountable for students'…
Computer simulation to predict energy use, greenhouse gas emissions and costs for production of fluid milk using alternative processing methods

USDA-ARS?s Scientific Manuscript database

Computer simulation is a useful tool for benchmarking the electrical and fuel energy consumption and water use in a fluid milk plant. In this study, a computer simulation model of the fluid milk process based on high temperature short time (HTST) pasteurization was extended to include models for pr...

Experimental flutter boundaries with unsteady pressure distributions for the NACA 0012 Benchmark Model

NASA Technical Reports Server (NTRS)

Rivera, Jose A., Jr.; Dansberry, Bryan E.; Farmer, Moses G.; Eckstrom, Clinton V.; Seidel, David A.; Bennett, Robert M.

1991-01-01

The Structural Dynamics Div. at NASA-Langley has started a wind tunnel activity referred to as the Benchmark Models Program. The objective is to acquire test data that will be useful for developing and evaluating aeroelastic type Computational Fluid Dynamics codes currently in use or under development. The progress is described which was achieved in testing the first model in the Benchmark Models Program. Experimental flutter boundaries are presented for a rigid semispan model (NACA 0012 airfoil section) mounted on a flexible mount system. Also, steady and unsteady pressure measurements taken at the flutter condition are presented. The pressure data were acquired over the entire model chord located at the 60 pct. span station.
A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kerbyson, Darren J.; Barker, Kevin J.; Vishnu, Abhinav

2014-01-01

We present here a performance analysis of three of current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores but at large-scale in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology but the first to use its Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the waymore » in which nodes are allocated in each type of system can significantly impact on achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition we also examine the differences in performance variability observed on each system and quantify the lost performance using a combination of both empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand Clusters in comparison to Blue Gene/Q.« less
WarpEngine, a Flexible Platform for Distributed Computing Implemented in the VEGA Program and Specially Targeted for Virtual Screening Studies.

PubMed

Pedretti, Alessandro; Mazzolari, Angelica; Vistoli, Giulio

2018-05-21

The manuscript describes WarpEngine, a novel platform implemented within the VEGA ZZ suite of software for performing distributed simulations both in local and wide area networks. Despite being tailored for structure-based virtual screening campaigns, WarpEngine possesses the required flexibility to carry out distributed calculations utilizing various pieces of software, which can be easily encapsulated within this platform without changing their source codes. WarpEngine takes advantages of all cheminformatics features implemented in the VEGA ZZ program as well as of its largely customizable scripting architecture thus allowing an efficient distribution of various time-demanding simulations. To offer an example of the WarpEngine potentials, the manuscript includes a set of virtual screening campaigns based on the ACE data set of the DUD-E collections using PLANTS as the docking application. Benchmarking analyses revealed a satisfactory linearity of the WarpEngine performances, the speed-up values being roughly equal to the number of utilized cores. Again, the computed scalability values emphasized that a vast majority (i.e., >90%) of the performed simulations benefit from the distributed platform presented here. WarpEngine can be freely downloaded along with the VEGA ZZ program at www.vegazz.net .
RF Wave Simulation Using the MFEM Open Source FEM Package

NASA Astrophysics Data System (ADS)

Stillerman, J.; Shiraiwa, S.; Bonoli, P. T.; Wright, J. C.; Green, D. L.; Kolev, T.

2016-10-01

A new plasma wave simulation environment based on the finite element method is presented. MFEM, a scalable open-source FEM library, is used as the basis for this capability. MFEM allows for assembling an FEM matrix of arbitrarily high order in a parallel computing environment. A 3D frequency domain RF physics layer was implemented using a python wrapper for MFEM and a cold collisional plasma model was ported. This physics layer allows for defining the plasma RF wave simulation model without user knowledge of the FEM weak-form formulation. A graphical user interface is built on πScope, a python-based scientific workbench, such that a user can build a model definition file interactively. Benchmark cases have been ported to this new environment, with results being consistent with those obtained using COMSOL multiphysics, GENRAY, and TORIC/TORLH spectral solvers. This work is a first step in bringing to bear the sophisticated computational tool suite that MFEM provides (e.g., adaptive mesh refinement, solver suite, element types) to the linear plasma-wave interaction problem, and within more complicated integrated workflows, such as coupling with core spectral solver, or incorporating additional physics such as an RF sheath potential model or kinetic effects. USDoE Awards DE-FC02-99ER54512, DE-FC02-01ER54648.
QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

PubMed Central

Gudyś, Adam; Deorowicz, Sebastian

2014-01-01

Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, the variant of MSAProbs customised for graphics processors. We selected the two most time consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on quad-core PC equipped with high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than original CPU-parallel MSAProbs. Additional tests performed on several protein families from Pfam database give overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors. PMID:24586435
Graph Theoretic and Motif Analyses of the Hippocampal Neuron Type Potential Connectome.

PubMed

Rees, Christopher L; Wheeler, Diek W; Hamilton, David J; White, Charise M; Komendantov, Alexander O; Ascoli, Giorgio A

2016-01-01

We computed the potential connectivity map of all known neuron types in the rodent hippocampal formation by supplementing scantly available synaptic data with spatial distributions of axons and dendrites from the open-access knowledge base Hippocampome.org. The network that results from this endeavor, the broadest and most complete for a mammalian cortical region at the neuron-type level to date, contains more than 3200 connections among 122 neuron types across six subregions. Analyses of these data using graph theory metrics unveil the fundamental architectural principles of the hippocampal circuit. Globally, we identify a highly specialized topology minimizing communication cost; a modular structure underscoring the prominence of the trisynaptic loop; a core set of neuron types serving as information-processing hubs as well as a distinct group of particular antihub neurons; a nested, two-tier rich club managing much of the network traffic; and an innate resilience to random perturbations. At the local level, we uncover the basic building blocks, or connectivity patterns, that combine to produce complex global functionality, and we benchmark their utilization in the circuit relative to random networks. Taken together, these results provide a comprehensive connectivity profile of the hippocampus, yielding novel insights on its functional operations at the computationally crucial level of neuron types.
: A Scalable and Transparent System for Simulating MPI Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S

2010-01-01

is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, andmore » MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.« less
Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

DOE PAGES

Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...

2015-12-21

This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results.« less
Experimental power density distribution benchmark in the TRIGA Mark II reactor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snoj, L.; Stancar, Z.; Radulovic, V.

2012-07-01

In order to improve the power calibration process and to benchmark the existing computational model of the TRIGA Mark II reactor at the Josef Stefan Inst. (JSI), a bilateral project was started as part of the agreement between the French Commissariat a l'energie atomique et aux energies alternatives (CEA) and the Ministry of higher education, science and technology of Slovenia. One of the objectives of the project was to analyze and improve the power calibration process of the JSI TRIGA reactor (procedural improvement and uncertainty reduction) by using absolutely calibrated CEA fission chambers (FCs). This is one of the fewmore » available power density distribution benchmarks for testing not only the fission rate distribution but also the absolute values of the fission rates. Our preliminary calculations indicate that the total experimental uncertainty of the measured reaction rate is sufficiently low that the experiments could be considered as benchmark experiments. (authors)« less
Time-Accurate Local Time Stepping and High-Order Time CESE Methods for Multi-Dimensional Flows Using Unstructured Meshes

NASA Technical Reports Server (NTRS)

Chang, Chau-Lyan; Venkatachari, Balaji Shankar; Cheng, Gary

2013-01-01

With the wide availability of affordable multiple-core parallel supercomputers, next generation numerical simulations of flow physics are being focused on unsteady computations for problems involving multiple time scales and multiple physics. These simulations require higher solution accuracy than most algorithms and computational fluid dynamics codes currently available. This paper focuses on the developmental effort for high-fidelity multi-dimensional, unstructured-mesh flow solvers using the space-time conservation element, solution element (CESE) framework. Two approaches have been investigated in this research in order to provide high-accuracy, cross-cutting numerical simulations for a variety of flow regimes: 1) time-accurate local time stepping and 2) highorder CESE method. The first approach utilizes consistent numerical formulations in the space-time flux integration to preserve temporal conservation across the cells with different marching time steps. Such approach relieves the stringent time step constraint associated with the smallest time step in the computational domain while preserving temporal accuracy for all the cells. For flows involving multiple scales, both numerical accuracy and efficiency can be significantly enhanced. The second approach extends the current CESE solver to higher-order accuracy. Unlike other existing explicit high-order methods for unstructured meshes, the CESE framework maintains a CFL condition of one for arbitrarily high-order formulations while retaining the same compact stencil as its second-order counterpart. For large-scale unsteady computations, this feature substantially enhances numerical efficiency. Numerical formulations and validations using benchmark problems are discussed in this paper along with realistic examples.
The concerted calculation of the BN-600 reactor for the deterministic and stochastic codes

NASA Astrophysics Data System (ADS)

Bogdanova, E. V.; Kuznetsov, A. N.

2017-01-01

The solution of the problem of increasing the safety of nuclear power plants implies the existence of complete and reliable information about the processes occurring in the core of a working reactor. Nowadays the Monte-Carlo method is the most general-purpose method used to calculate the neutron-physical characteristic of the reactor. But it is characterized by large time of calculation. Therefore, it may be useful to carry out coupled calculations with stochastic and deterministic codes. This article presents the results of research for possibility of combining stochastic and deterministic algorithms in calculation the reactor BN-600. This is only one part of the work, which was carried out in the framework of the graduation project at the NRC “Kurchatov Institute” in cooperation with S. S. Gorodkov and M. A. Kalugin. It is considering the 2-D layer of the BN-600 reactor core from the international benchmark test, published in the report IAEA-TECDOC-1623. Calculations of the reactor were performed with MCU code and then with a standard operative diffusion algorithm with constants taken from the Monte - Carlo computation. Macro cross-section, diffusion coefficients, the effective multiplication factor and the distribution of neutron flux and power were obtained in 15 energy groups. The reasonable agreement between stochastic and deterministic calculations of the BN-600 is observed.
Enhancement of DFT-calculations at petascale: Nuclear Magnetic Resonance, Hybrid Density Functional Theory and Car-Parrinello calculations

NASA Astrophysics Data System (ADS)

Varini, Nicola; Ceresoli, Davide; Martin-Samos, Layla; Girotto, Ivan; Cavazzoni, Carlo

2013-08-01

One of the most promising techniques used for studying the electronic properties of materials is based on Density Functional Theory (DFT) approach and its extensions. DFT has been widely applied in traditional solid state physics problems where periodicity and symmetry play a crucial role in reducing the computational workload. With growing compute power capability and the development of improved DFT methods, the range of potential applications is now including other scientific areas such as Chemistry and Biology. However, cross disciplinary combinations of traditional Solid-State Physics, Chemistry and Biology drastically improve the system complexity while reducing the degree of periodicity and symmetry. Large simulation cells containing of hundreds or even thousands of atoms are needed to model these kind of physical systems. The treatment of those systems still remains a computational challenge even with modern supercomputers. In this paper we describe our work to improve the scalability of Quantum ESPRESSO (Giannozzi et al., 2009 [3]) for treating very large cells and huge numbers of electrons. To this end we have introduced an extra level of parallelism, over electronic bands, in three kernels for solving computationally expensive problems: the Sternheimer equation solver (Nuclear Magnetic Resonance, package QE-GIPAW), the Fock operator builder (electronic ground-state, package PWscf) and most of the Car-Parrinello routines (Car-Parrinello dynamics, package CP). Final benchmarks show our success in computing the Nuclear Magnetic Response (NMR) chemical shift of a large biological assembly, the electronic structure of defected amorphous silica with hybrid exchange-correlation functionals and the equilibrium atomic structure of height Porphyrins anchored to a Carbon Nanotube, on many thousands of CPU cores.
Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics.

PubMed

Zheng, Mo; Li, Xiaoxia; Guo, Li

2013-04-01

Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10-50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and its one order smaller time-step than the classical MD, all of which pose significant computational challenges in simulation capability to reach spatio-temporal scales of nanometers and nanoseconds. The very recent advances of graphics processing unit (GPU) provide not only highly favorable performance for GPU enabled MD programs compared with CPU implementations but also an opportunity to manage with the computing power and memory demanding nature imposed on computer hardware by ReaxFF MD. In this paper, we present the algorithms of GMD-Reax, the first GPU enabled ReaxFF MD program with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with a NVIDIA C2050 GPU for coal pyrolysis simulation systems with atoms ranging from 1378 to 27,283. GMD-Reax achieved speedups as high as 12 times faster than Duin et al.'s FORTRAN codes in Lammps on 8 CPU cores and 6 times faster than the Lammps' C codes based on PuReMD in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations. Copyright © 2013 Elsevier Inc. All rights reserved.
Benchmarking Multilayer-HySEA model for landslide generated tsunami. HTHMP validation process.

NASA Astrophysics Data System (ADS)

Macias, J.; Escalante, C.; Castro, M. J.

2017-12-01

Landslide tsunami hazard may be dominant along significant parts of the coastline around the world, in particular in the USA, as compared to hazards from other tsunamigenic sources. This fact motivated NTHMP about the need of benchmarking models for landslide generated tsunamis, following the same methodology already used for standard tsunami models when the source is seismic. To perform the above-mentioned validation process, a set of candidate benchmarks were proposed. These benchmarks are based on a subset of available laboratory data sets for solid slide experiments and deformable slide experiments, and include both submarine and subaerial slides. A benchmark based on a historic field event (Valdez, AK, 1964) close the list of proposed benchmarks. A total of 7 benchmarks. The Multilayer-HySEA model including non-hydrostatic effects has been used to perform all the benchmarking problems dealing with laboratory experiments proposed in the workshop that was organized at Texas A&M University - Galveston, on January 9-11, 2017 by NTHMP. The aim of this presentation is to show some of the latest numerical results obtained with the Multilayer-HySEA (non-hydrostatic) model in the framework of this validation effort.Acknowledgements. This research has been partially supported by the Spanish Government Research project SIMURISK (MTM2015-70490-C02-01-R) and University of Malaga, Campus de Excelencia Internacional Andalucía Tech. The GPU computations were performed at the Unit of Numerical Methods (University of Malaga).
Development of a New 47-Group Library for the CASL Neutronics Simulators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Kang Seog; Williams, Mark L; Wiarda, Dorothea

The CASL core simulator MPACT is under development for the neutronics and thermal-hydraulics coupled simulation for the pressurized light water reactors. The key characteristics of the MPACT code include a subgroup method for resonance self-shielding, and a whole core solver with a 1D/2D synthesis method. The ORNL AMPX/SCALE code packages have been significantly improved to support various intermediate resonance self-shielding approximations such as the subgroup and embedded self-shielding methods. New 47-group AMPX and MPACT libraries based on ENDF/B-VII.0 have been generated for the CASL core simulator MPACT of which group structure comes from the HELIOS library. The new 47-group MPACTmore » library includes all nuclear data required for static and transient core simulations. This study discusses a detailed procedure to generate the 47-group AMPX and MPACT libraries and benchmark results for the VERA progression problems.« less
Optimization and benchmarking of a perturbative Metropolis Monte Carlo quantum mechanics/molecular mechanics program

NASA Astrophysics Data System (ADS)

Feldt, Jonas; Miranda, Sebastião; Pratas, Frederico; Roma, Nuno; Tomás, Pedro; Mata, Ricardo A.

2017-12-01

In this work, we present an optimized perturbative quantum mechanics/molecular mechanics (QM/MM) method for use in Metropolis Monte Carlo simulations. The model adopted is particularly tailored for the simulation of molecular systems in solution but can be readily extended to other applications, such as catalysis in enzymatic environments. The electrostatic coupling between the QM and MM systems is simplified by applying perturbation theory to estimate the energy changes caused by a movement in the MM system. This approximation, together with the effective use of GPU acceleration, leads to a negligible added computational cost for the sampling of the environment. Benchmark calculations are carried out to evaluate the impact of the approximations applied and the overall computational performance.
Optimization and benchmarking of a perturbative Metropolis Monte Carlo quantum mechanics/molecular mechanics program.

PubMed

Feldt, Jonas; Miranda, Sebastião; Pratas, Frederico; Roma, Nuno; Tomás, Pedro; Mata, Ricardo A

2017-12-28

In this work, we present an optimized perturbative quantum mechanics/molecular mechanics (QM/MM) method for use in Metropolis Monte Carlo simulations. The model adopted is particularly tailored for the simulation of molecular systems in solution but can be readily extended to other applications, such as catalysis in enzymatic environments. The electrostatic coupling between the QM and MM systems is simplified by applying perturbation theory to estimate the energy changes caused by a movement in the MM system. This approximation, together with the effective use of GPU acceleration, leads to a negligible added computational cost for the sampling of the environment. Benchmark calculations are carried out to evaluate the impact of the approximations applied and the overall computational performance.
To Lease or Not To Lease.

ERIC Educational Resources Information Center

Hamilton, William A.

1998-01-01

Thanks to previous bond issues, the Walled Lake (Michigan) Schools had a well-defined technology plan featuring staff development, student performance benchmarks, and rooms of outdated computers. After three bond issues failed, the district adopted leasing as an alternative. Their present three-year contract supplies 154 used computers and a…
Evolutionary Optimization of a Geometrically Refined Truss

NASA Technical Reports Server (NTRS)

Hull, P. V.; Tinker, M. L.; Dozier, G. V.

2007-01-01

Structural optimization is a field of research that has experienced noteworthy growth for many years. Researchers in this area have developed optimization tools to successfully design and model structures, typically minimizing mass while maintaining certain deflection and stress constraints. Numerous optimization studies have been performed to minimize mass, deflection, and stress on a benchmark cantilever truss problem. Predominantly traditional optimization theory is applied to this problem. The cross-sectional area of each member is optimized to minimize the aforementioned objectives. This Technical Publication (TP) presents a structural optimization technique that has been previously applied to compliant mechanism design. This technique demonstrates a method that combines topology optimization, geometric refinement, finite element analysis, and two forms of evolutionary computation: genetic algorithms and differential evolution to successfully optimize a benchmark structural optimization problem. A nontraditional solution to the benchmark problem is presented in this TP, specifically a geometrically refined topological solution. The design process begins with an alternate control mesh formulation, multilevel geometric smoothing operation, and an elastostatic structural analysis. The design process is wrapped in an evolutionary computing optimization toolset.
Neutron Reference Benchmark Field Specification: ACRR Free-Field Environment (ACRR-FF-CC-32-CL).

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vega, Richard Manuel; Parma, Edward J.; Griffin, Patrick J.

2015-07-01

This report was put together to support the International Atomic Energy Agency (IAEA) REAL- 2016 activity to validate the dosimetry community’s ability to use a consistent set of activation data and to derive consistent spectral characterizations. The report captures details of integral measurements taken in the Annular Core Research Reactor (ACRR) central cavity free-field reference neutron benchmark field. The field is described and an “a priori” calculated neutron spectrum is reported, based on MCNP6 calculations, and a subject matter expert (SME) based covariance matrix is given for this “a priori” spectrum. The results of 31 integral dosimetry measurements in themore » neutron field are reported.« less

Notes on numerical reliability of several statistical analysis programs

USGS Publications Warehouse

Landwehr, J.M.; Tasker, Gary D.

1999-01-01

This report presents a benchmark analysis of several statistical analysis programs currently in use in the USGS. The benchmark consists of a comparison between the values provided by a statistical analysis program for variables in the reference data set ANASTY and their known or calculated theoretical values. The ANASTY data set is an amendment of the Wilkinson NASTY data set that has been used in the statistical literature to assess the reliability (computational correctness) of calculated analytical results.
Affinity-aware checkpoint restart

DOE PAGES

Saini, Ajay; Rezaei, Arash; Mueller, Frank; ...

2014-12-08

Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may be restarted on different cores, and in case of non-uniform memory architectures (NUMA), quite common today, memory pages associated with tasks on a NUMA node may be associated with a different NUMA node after restart. Here, this work contributes a novel design technique for C/R mechanisms to preserve task-to-core mapsmore » and NUMA node specific page affinities across restarts. Experimental results with BLCR, a C/R mechanism, enhanced with affinity awareness demonstrate significant performance benefits of 37%-73% for the NAS Parallel Benchmark codes and 6-12% for NAMD with negligible overheads instead of up to nearly four times longer an execution times without affinity-aware restarts on 16 cores.« less
Affinity-aware checkpoint restart

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saini, Ajay; Rezaei, Arash; Mueller, Frank

Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may be restarted on different cores, and in case of non-uniform memory architectures (NUMA), quite common today, memory pages associated with tasks on a NUMA node may be associated with a different NUMA node after restart. Here, this work contributes a novel design technique for C/R mechanisms to preserve task-to-core mapsmore » and NUMA node specific page affinities across restarts. Experimental results with BLCR, a C/R mechanism, enhanced with affinity awareness demonstrate significant performance benefits of 37%-73% for the NAS Parallel Benchmark codes and 6-12% for NAMD with negligible overheads instead of up to nearly four times longer an execution times without affinity-aware restarts on 16 cores.« less
Fingerprinting sea-level variations in response to continental ice loss: a benchmark exercise

NASA Astrophysics Data System (ADS)

Barletta, Valentina R.; Spada, Giorgio; Riva, Riccardo E. M.; James, Thomas S.; Simon, Karen M.; van der Wal, Wouter; Martinec, Zdenek; Klemann, Volker; Olsson, Per-Anders; Hagedoorn, Jan; Stocchi, Paolo; Vermeersen, Bert

2013-04-01

Understanding the response of the Earth to the waxing and waning ice sheets is crucial in various contexts, ranging from the interpretation of modern satellite geodetic measurements to the projections of future sea level trends in response to climate change. All the processes accompanying Glacial Isostatic Adjustment (GIA) can be described solving the so-called Sea Level Equation (SLE), an integral equation that accounts for the interactions between the ice sheets, the solid Earth, and the oceans. Modern approaches to the SLE are based on various techniques that range from purely analytical formulations to fully numerical methods. Here we present the results of a benchmark exercise of independently developed codes designed to solve the SLE. The study involves predictions of current sea level changes due to present-day ice mass loss. In spite of the differences in the methods employed, the comparison shows that a significant number of GIA modellers can reproduce their sea-level computations within 2% for well defined, large-scale present-day ice mass changes. Smaller and more detailed loads need further and dedicated benchmarking and high resolution computation. This study shows how the details of the implementation and the inputs specifications are an important, and often underappreciated, aspect. Hence this represents a step toward the assessment of reliability of sea level projections obtained with benchmarked SLE codes.
I-NERI Quarterly Technical Report (April 1 to June 30, 2005)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang Oh; Prof. Hee Cheon NO; Prof. John Lee

2005-06-01

The objective of this Korean/United States/laboratory/university collaboration is to develop new advanced computational methods for safety analysis codes for very-high-temperature gas-cooled reactors (VHTGRs) and numerical and experimental validation of these computer codes. This study consists of five tasks for FY-03: (1) development of computational methods for the VHTGR, (2) theoretical modification of aforementioned computer codes for molecular diffusion (RELAP5/ATHENA) and modeling CO and CO2 equilibrium (MELCOR), (3) development of a state-of-the-art methodology for VHTGR neutronic analysis and calculation of accurate power distributions and decay heat deposition rates, (4) reactor cavity cooling system experiment, and (5) graphite oxidation experiment. Second quartermore » of Year 3: (A) Prof. NO and Kim continued Task 1. As a further plant application of GAMMA code, we conducted two analyses: IAEA GT-MHR benchmark calculation for LPCC and air ingress analysis for PMR 600MWt. The GAMMA code shows comparable peak fuel temperature trend to those of other country codes. The analysis results for air ingress show much different trend from that of previous PBR analysis: later onset of natural circulation and less significant rise in graphite temperature. (B) Prof. Park continued Task 2. We have designed new separate effect test device having same heat transfer area and different diameter and total number of U-bands of air cooling pipe. New design has smaller pressure drop in the air cooling pipe than the previous one as designed with larger diameter and less number of U-bands. With the device, additional experiments have been performed to obtain temperature distributions of the water tank, the surface and the center of cooling pipe on axis. The results will be used to optimize the design of SNU-RCCS. (C) Prof. NO continued Task 3. The experimental work of air ingress is going on without any concern: With nuclear graphite IG-110, various kinetic parameters and reaction rates for the C/CO2 reaction were measured. Then, the rates of C/CO2 reaction were compared to the ones of C/O2 reaction. The rate equation for C/CO2 has been developed. (D) INL added models to RELAP5/ATHENA to cacilate the chemical reactions in a VHTR during an air ingress accident. Limited testing of the models indicate that they are calculating a correct special distribution in gas compositions. (E) INL benchmarked NACOK natural circulation data. (F) Professor Lee et al at the University of Michigan (UM) Task 5. The funding was received from the DOE Richland Office at the end of May and the subcontract paperwork was delivered to the UM on the sixth of June. The objective of this task is to develop a state of the art neutronics model for determining power distributions and decay heat deposition rates in a VHTGR core. Our effort during the reporting period covered reactor physics analysis of coated particles and coupled nuclear-thermal-hydraulic (TH) calculations, together with initial calculations for decay heat deposition rates in the core.« less
Evaluation of the Pool Critical Assembly Benchmark with Explicitly-Modeled Geometry using MCNP6

DOE PAGES

Kulesza, Joel A.; Martz, Roger Lee

2017-03-01

Despite being one of the most widely used benchmarks for qualifying light water reactor (LWR) radiation transport methods and data, no benchmark calculation of the Oak Ridge National Laboratory (ORNL) Pool Critical Assembly (PCA) pressure vessel wall benchmark facility (PVWBF) using MCNP6 with explicitly modeled core geometry exists. As such, this paper provides results for such an analysis. First, a criticality calculation is used to construct the fixed source term. Next, ADVANTG-generated variance reduction parameters are used within the final MCNP6 fixed source calculations. These calculations provide unadjusted dosimetry results using three sets of dosimetry reaction cross sections of varyingmore » ages (those packaged with MCNP6, from the IRDF-2002 multi-group library, and from the ACE-formatted IRDFF v1.05 library). These results are then compared to two different sets of measured reaction rates. The comparison agrees in an overall sense within 2% and on a specific reaction- and dosimetry location-basis within 5%. Except for the neptunium dosimetry, the individual foil raw calculation-to-experiment comparisons usually agree within 10% but is typically greater than unity. Finally, in the course of developing these calculations, geometry that has previously not been completely specified is provided herein for the convenience of future analysts.« less
Higher-Order Adaptive Finite-Element Methods for Kohn-Sham Density Functional Theory

DTIC Science & Technology

2012-07-03

systems studied, we observe diminishing returns in computational savings beyond the sixth-order for accuracies commensurate with chemi- cal accuracy...calculations. Further, we demonstrate the capability of the proposed approach to compute the electronic structure of materials systems contain- ing a...benchmark systems studied, we observe diminishing returns in computational savings beyond the sixth-order for accuracies commensurate with chemical accuracy
Classical and modern control strategies for the deployment, reconfiguration, and station-keeping of the National Aeronautics and Space Administration (NASA) Benchmark Tetrahedron Constellation

NASA Astrophysics Data System (ADS)

Capo-Lugo, Pedro A.

Formation flying consists of multiple spacecraft orbiting in a required configuration about a planet or through Space. The National Aeronautics and Space Administration (NASA) Benchmark Tetrahedron Constellation is one of the proposed constellations to be launched in the year 2009 and provides the motivation for this investigation. The problem that will be researched here consists of three stages. The first stage contains the deployment of the satellites; the second stage is the reconfiguration process to transfer the satellites through different specific sizes of the NASA benchmark problem; and, the third stage is the station-keeping procedure for the tetrahedron constellation. Every stage contains different control schemes and transfer procedures to obtain/maintain the proposed tetrahedron constellation. In the first stage, the deployment procedure will depend on a combination of two techniques in which impulsive maneuvers and a digital controller are used to deploy the satellites and to maintain the tetrahedron constellation at the following apogee point. The second stage that corresponds to the reconfiguration procedure shows a different control scheme in which the intelligent control systems are implemented to perform this procedure. In this research work, intelligent systems will eliminate the use of complex mathematical models and will reduce the computational time to perform different maneuvers. Finally, the station-keeping process, which is the third stage of this research problem, will be implemented with a two-level hierarchical control scheme to maintain the separation distance constraints of the NASA Benchmark Tetrahedron Constellation. For this station-keeping procedure, the system of equations defining the dynamics of a pair of satellites is transformed to take in account the perturbation due to the oblateness of the Earth and the disturbances due to solar pressure. The control procedures used in this research will be transformed from a continuous control system to a digital control system which will simplify the implementation into the computer onboard the satellite. In addition, this research will show an introductory chapter on attitude dynamics that can be used to maintain the orientation of the satellites, and an adaptive intelligent control scheme will be proposed to maintain the desired orientation of the spacecraft. In conclusion, a solution for the dynamics of the NASA Benchmark Tetrahedron Constellation will be presented in this research work. The main contribution of this work is the use of discrete control schemes, impulsive maneuvers, and intelligent control schemes that can be used to reduce the computational time in which these control schemes can be easily implemented in the computer onboard the satellite. These contributions are explained through the deployment, reconfiguration, and station-keeping process of the proposed NASA Benchmark Tetrahedron Constellation.
Assessment of competency in endoscopy: establishing and validating generalizable competency benchmarks for colonoscopy.

PubMed

Sedlack, Robert E; Coyle, Walter J

2016-03-01

The Mayo Colonoscopy Skills Assessment Tool (MCSAT) has previously been used to describe learning curves and competency benchmarks for colonoscopy; however, these data were limited to a single training center. The newer Assessment of Competency in Endoscopy (ACE) tool is a refinement of the MCSAT tool put forth by the Training Committee of the American Society for Gastrointestinal Endoscopy, intended to include additional important quality metrics. The goal of this study is to validate the changes made by updating this tool and establish more generalizable and reliable learning curves and competency benchmarks for colonoscopy by examining a larger national cohort of trainees. In a prospective, multicenter trial, gastroenterology fellows at all stages of training had their core cognitive and motor skills in colonoscopy assessed by staff. Evaluations occurred at set intervals of every 50 procedures throughout the 2013 to 2014 academic year. Skills were graded by using the ACE tool, which uses a 4-point grading scale defining the continuum from novice to competent. Average learning curves for each skill were established at each interval in training and competency benchmarks for each skill were established using the contrasting groups method. Ninety-three gastroenterology fellows at 10 U.S. academic institutions had 1061 colonoscopies assessed by using the ACE tool. Average scores of 3.5 were found to be inclusive of all minimal competency thresholds identified for each core skill. Cecal intubation times of less than 15 minutes and independent cecal intubation rates of 90% were also identified as additional competency thresholds during analysis. The average fellow achieved all cognitive and motor skill endpoints by 250 procedures, with >90% surpassing these thresholds by 300 procedures. Nationally generalizable learning curves for colonoscopy skills in gastroenterology fellows are described. Average ACE scores of 3.5, cecal intubation rates of 90%, and intubation times less than 15 minutes are recommended as minimal competency criteria. On average, it takes 250 procedures to achieve competence in colonoscopy. The thresholds found in this multicenter cohort by using the ACE tool are nearly identical to the previously established MCSAT benchmarks and are consistent with recent gastroenterology training recommendations but far higher than current training requirements in other specialties. Copyright © 2016 American Society for Gastrointestinal Endoscopy. Published by Elsevier Inc. All rights reserved.
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Social Studies on the Outside Looking In: Redeeming the Neglected Curriculum

ERIC Educational Resources Information Center

Hermeling, Andrew Dyrli

2013-01-01

Many social studies teachers are nervous about the coming of Common Core State Standards. With so much emphasis placed on literacy, social studies teachers fear they will see content slashed to leave time for meeting English's non-fiction standards. Already reeling from a lack of attention from the benchmarks put in place by No Child Left Behind,…
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad

2013-07-09

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad

2013-02-12

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
FDA Benchmark Medical Device Flow Models for CFD Validation.

PubMed

Malinauskas, Richard A; Hariharan, Prasanna; Day, Steven W; Herbertson, Luke H; Buesen, Martin; Steinseifer, Ulrich; Aycock, Kenneth I; Good, Bryan C; Deutsch, Steven; Manning, Keefe B; Craven, Brent A

Computational fluid dynamics (CFD) is increasingly being used to develop blood-contacting medical devices. However, the lack of standardized methods for validating CFD simulations and blood damage predictions limits its use in the safety evaluation of devices. Through a U.S. Food and Drug Administration (FDA) initiative, two benchmark models of typical device flow geometries (nozzle and centrifugal blood pump) were tested in multiple laboratories to provide experimental velocities, pressures, and hemolysis data to support CFD validation. In addition, computational simulations were performed by more than 20 independent groups to assess current CFD techniques. The primary goal of this article is to summarize the FDA initiative and to report recent findings from the benchmark blood pump model study. Discrepancies between CFD predicted velocities and those measured using particle image velocimetry most often occurred in regions of flow separation (e.g., downstream of the nozzle throat, and in the pump exit diffuser). For the six pump test conditions, 57% of the CFD predictions of pressure head were within one standard deviation of the mean measured values. Notably, only 37% of all CFD submissions contained hemolysis predictions. This project aided in the development of an FDA Guidance Document on factors to consider when reporting computational studies in medical device regulatory submissions. There is an accompanying podcast available for this article. Please visit the journal's Web site (www.asaiojournal.com) to listen.
ZPR-6 assembly 7 high {sup 240}Pu core experiments : a fast reactor core with mixed (Pu,U)-oxide fuel and a centeral high{sup 240}Pu zone.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lell, R. M.; Morman, J. A.; Schaefer, R.W.

ZPR-6 Assembly 7 (ZPR-6/7) encompasses a series of experiments performed at the ZPR-6 facility at Argonne National Laboratory in 1970 and 1971 as part of the Demonstration Reactor Benchmark Program (Reference 1). Assembly 7 simulated a large sodium-cooled LMFBR with mixed oxide fuel, depleted uranium radial and axial blankets, and a core H/D near unity. ZPR-6/7 was designed to test fast reactor physics data and methods, so configurations in the Assembly 7 program were as simple as possible in terms of geometry and composition. ZPR-6/7 had a very uniform core assembled from small plates of depleted uranium, sodium, iron oxide,more » U{sub 3}O{sub 8} and Pu-U-Mo alloy loaded into stainless steel drawers. The steel drawers were placed in square stainless steel tubes in the two halves of a split table machine. ZPR-6/7 had a simple, symmetric core unit cell whose neutronic characteristics were dominated by plutonium and {sup 238}U. The core was surrounded by thick radial and axial regions of depleted uranium to simulate radial and axial blankets and to isolate the core from the surrounding room. The ZPR-6/7 program encompassed 139 separate core loadings which include the initial approach to critical and all subsequent core loading changes required to perform specific experiments and measurements. In this context a loading refers to a particular configuration of fueled drawers, radial blanket drawers and experimental equipment (if present) in the matrix of steel tubes. Two principal core configurations were established. The uniform core (Loadings 1-84) had a relatively uniform core composition. The high {sup 240}Pu core (Loadings 85-139) was a variant on the uniform core. The plutonium in the Pu-U-Mo fuel plates in the uniform core contains 11% {sup 240}Pu. In the high {sup 240}Pu core, all Pu-U-Mo plates in the inner core region (central 61 matrix locations per half of the split table machine) were replaced by Pu-U-Mo plates containing 27% {sup 240}Pu in the plutonium component to construct a central core zone with a composition closer to that in an LMFBR core with high burnup. The high {sup 240}Pu configuration was constructed for two reasons. First, the composition of the high {sup 240}Pu zone more closely matched the composition of LMFBR cores anticipated in design work in 1970. Second, comparison of measurements in the ZPR-6/7 uniform core with corresponding measurements in the high {sup 240}Pu zone provided an assessment of some of the effects of long-term {sup 240}Pu buildup in LMFBR cores. The uniform core version of ZPR-6/7 is evaluated in ZPR-LMFR-EXP-001. This document only addresses measurements in the high {sup 240}Pu core version of ZPR-6/7. Many types of measurements were performed as part of the ZPR-6/7 program. Measurements of criticality, sodium void worth, control rod worth and reaction rate distributions in the high {sup 240}Pu core configuration are evaluated here. For each category of measurements, the uncertainties are evaluated, and benchmark model data are provided.« less
Evaluating the Information Power Grid using the NAS Grid Benchmarks

NASA Technical Reports Server (NTRS)

VanderWijngaartm Rob F.; Frumkin, Michael A.

2004-01-01

The NAS Grid Benchmarks (NGB) are a collection of synthetic distributed applications designed to rate the performance and functionality of computational grids. We compare several implementations of the NGB to determine programmability and efficiency of NASA's Information Power Grid (IPG), whose services are mostly based on the Globus Toolkit. We report on the overheads involved in porting existing NGB reference implementations to the IPG. No changes were made to the component tasks of the NGB can still be improved.
Computer System Resource Requirements of Novice Programming Students.

ERIC Educational Resources Information Center

Nutt, Gary J.

The characteristics of jobs that constitute the mix for lower division FORTRAN classes in a university were investigated. Samples of these programs were also benchmarked on a larger central site computer and two minicomputer systems. It was concluded that a carefully chosen minicomputer system could offer service at least the equivalent of the…
Benchmarking fully analytic DFT force fields for vibrational spectroscopy: A study on halogenated compounds

NASA Astrophysics Data System (ADS)

Pietropolli Charmet, Andrea; Cornaton, Yann

2018-05-01

This work presents an investigation of the theoretical predictions yielded by anharmonic force fields having the cubic and quartic force constants are computed analytically by means of density functional theory (DFT) using the recursive scheme developed by M. Ringholm et al. (J. Comput. Chem. 35 (2014) 622). Different functionals (namely B3LYP, PBE, PBE0 and PW86x) and basis sets were used for calculating the anharmonic vibrational spectra of two halomethanes. The benchmark analysis carried out demonstrates the reliability and overall good performances offered by hybrid approaches, where the harmonic data obtained at the coupled cluster with single and double excitations level of theory augmented by a perturbational estimate of the effects of connected triple excitations, CCSD(T), are combined with the fully analytic higher order force constants yielded by DFT functionals. These methods lead to reliable and computationally affordable calculations of anharmonic vibrational spectra with an accuracy comparable to that yielded by hybrid force fields having the anharmonic force fields computed at second order Møller-Plesset perturbation theory (MP2) level of theory using numerical differentiation but without the corresponding potential issues related to computational costs and numerical errors.
Development and Implementation of CFD-Informed Models for the Advanced Subchannel Code CTF

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blyth, Taylor S.; Avramova, Maria

The research described in this PhD thesis contributes to the development of efficient methods for utilization of high-fidelity models and codes to inform low-fidelity models and codes in the area of nuclear reactor core thermal-hydraulics. The objective is to increase the accuracy of predictions of quantities of interests using high-fidelity CFD models while preserving the efficiency of low-fidelity subchannel core calculations. An original methodology named Physics- based Approach for High-to-Low Model Information has been further developed and tested. The overall physical phenomena and corresponding localized effects, which are introduced by the presence of spacer grids in light water reactor (LWR)more » cores, are dissected in corresponding four building basic processes, and corresponding models are informed using high-fidelity CFD codes. These models are a spacer grid-directed cross-flow model, a grid-enhanced turbulent mixing model, a heat transfer enhancement model, and a spacer grid pressure loss model. The localized CFD-models are developed and tested using the CFD code STAR-CCM+, and the corresponding global model development and testing in sub-channel formulation is performed in the thermal- hydraulic subchannel code CTF. The improved CTF simulations utilize data-files derived from CFD STAR-CCM+ simulation results covering the spacer grid design desired for inclusion in the CTF calculation. The current implementation of these models is examined and possibilities for improvement and further development are suggested. The validation experimental database is extended by including the OECD/NRC PSBT benchmark data. The outcome is an enhanced accuracy of CTF predictions while preserving the computational efficiency of a low-fidelity subchannel code.« less
Development and Implementation of CFD-Informed Models for the Advanced Subchannel Code CTF

NASA Astrophysics Data System (ADS)

Blyth, Taylor S.

The research described in this PhD thesis contributes to the development of efficient methods for utilization of high-fidelity models and codes to inform low-fidelity models and codes in the area of nuclear reactor core thermal-hydraulics. The objective is to increase the accuracy of predictions of quantities of interests using high-fidelity CFD models while preserving the efficiency of low-fidelity subchannel core calculations. An original methodology named Physics-based Approach for High-to-Low Model Information has been further developed and tested. The overall physical phenomena and corresponding localized effects, which are introduced by the presence of spacer grids in light water reactor (LWR) cores, are dissected in corresponding four building basic processes, and corresponding models are informed using high-fidelity CFD codes. These models are a spacer grid-directed cross-flow model, a grid-enhanced turbulent mixing model, a heat transfer enhancement model, and a spacer grid pressure loss model. The localized CFD-models are developed and tested using the CFD code STAR-CCM+, and the corresponding global model development and testing in sub-channel formulation is performed in the thermal-hydraulic subchannel code CTF. The improved CTF simulations utilize data-files derived from CFD STAR-CCM+ simulation results covering the spacer grid design desired for inclusion in the CTF calculation. The current implementation of these models is examined and possibilities for improvement and further development are suggested. The validation experimental database is extended by including the OECD/NRC PSBT benchmark data. The outcome is an enhanced accuracy of CTF predictions while preserving the computational efficiency of a low-fidelity subchannel code.

Ultrafast light matter interaction in CdSe/ZnS core-shell quantum dots

NASA Astrophysics Data System (ADS)

Yadav, Rajesh Kumar; Sharma, Rituraj; Mondal, Anirban; Adarsh, K. V.

2018-04-01

Core-shell quantum dot are imperative for carrier (electron and holes) confinement in core/shell, which provides a stage to explore the linear and nonlinear optical phenomena at the nanoscalelimit. Here we present a comprehensive study of ultrafast excitation dynamics and nonlinear optical absorption of CdSe/ZnS core shell quantum dot with the help of ultrafast spectroscopy. Pump-probe and time-resolved measurements revealed the drop of trapping at CdSe surface due to the presence of the ZnS shell, which makes more efficient photoluminescence. We have carried out femtosecond transient absorption studies of the CdSe/ZnS core-shell quantum dot by irradiation with 400 nm laser light, monitoring the transients in the visible region. The optical nonlinearity of the core-shell quantum dot studied by using the Z-scan technique with 120 fs pulses at the wavelengths of 800 nm. The value of two photon absorption coefficients (β) of core-shell QDs extracted as80cm/GW, and it shows excellent benchmark for the optical limiting onset of 2.5GW/cm2 with the low limiting differential transmittance of 0.10, that is an order of magnitude better than graphene based materials.
Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

NASA Astrophysics Data System (ADS)

Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

2014-07-01

As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.
Towards the development of a consensual chronostratigraphy for Arctic Ocean sedimentary records

NASA Astrophysics Data System (ADS)

Hillaire-Marcel, Claude; de Vernal, Anne; Polyak, Leonid; Stein, Rüdiger; Maccali, Jenny; Jacobel, Allison; Cuny, Kristan

2017-04-01

Deciphering Arctic paleoceanograpy and paleoclimate, and linking it to global marine and atmospheric records is much needed for comprehending the Earth's climate history. However, this task is hampered by multiple problems with dating Arctic Ocean sedimentary records related notably to low and highly variable sedimentation rates, scarce and discontinuous biogenic proxies due to low productivity and/or poor preservation, and difficulties correlating regional records to global stacks (e.g., paleomagnetic). Despite recent advances in developing an Arctic Ocean sedimentary stratigraphy, and attempts at setting radiometric benchmark ages of respectively 300 and 150 ka, based on the final decay of 230Th and 231Pa excesses (Thxs, Paxs) (Not et al., 2008), consensual age models are still missing, preventing reliable integration of Arctic records in a global paleoclimatic scheme. Here, we intend to illustrate these issues by comparing consistent Thxs-Paxs chronostratigraphic records from the Mendeleev-Alpha and Lomonosov ridges with the currently used age model based on climatostratigraphic interpretation of sedimentary records (e.g., Polyak et al., 2009; Stein et al., 2010). Data used were collected from the 2005 HOTRAX core MC-11 (northern Mendeleev Ridge) and the 2014 Polarstern core PS87-30 (Lomonosov Ridge). Total collapse depths of Thxs and Paxs are observed by a factor of 3 deeper in core PS87-30 vs core MC-11, indicating average sedimentation rates 3 times higher at the Lomonosov Ridge site. Litho-biostratigraphic markers, such as foraminiferal peaks and manganese-enriched layers, show a similar pattern, with their occurrence 3 times deeper in core PS87-30 than in core MC-11. These very consistent downcore features highlight a gaping difference between the benchmark ages assigned to the total decay of Paxs and Thxs and the current age model based on climatostratigraphic approach involving significantly higher sedimentation rates. This discrepancy begs for its in-depth investigation that would potentially result in a development of the consensual chronostratigraphy for Quaternary Arctic Ocean sediments, critical for integrating the Arctic into global paleoclimatic history.
Relevance of East African Drill Cores to Human Evolution: the Case of the Olorgesailie Drilling Project

NASA Astrophysics Data System (ADS)

Potts, R.

2016-12-01

Drill cores reaching the local basement of the East African Rift were obtained in 2012 south of the Olorgesailie Basin, Kenya, 20 km from excavations that document key benchmarks in the origin of Homo sapiens. Sediments totaling 216 m were obtained from two drilling locations representing the past 1 million years. The cores were acquired to build a detailed environmental record spatially associated with the transition from Acheulean to Middle Stone Age technology and extensive turnover in mammalian species. The project seeks precise tests of how climate dynamics and tectonic events were linked with these transitions. Core lithology (A.K. Behrensmeyer), geochronology (A. Deino), diatoms (R.B. Owen), phytoliths (R. Kinyanjui), geochemistry (N. Rabideaux, D. Deocampo), among other indicators, show evidence of strong environmental variability in agreement with predicted high-eccentricity modulation of climate during the evolutionary transitions. Increase in hominin mobility, elaboration of symbolic behavior, and concurrent turnover in mammalian species indicating heightened adaptability to unpredictable ecosystems, point to a direct link between the evolutionary transitions and the landscape dynamics reflected in the Olorgesailie drill cores. For paleoanthropologists and Earth scientists, any link between evolutionary transitions and environmental dynamics requires robust evolutionary datasets pertinent to how selection, extinction, population divergence, and other evolutionary processes were impacted by the dynamics uncovered in drill core studies. Fossil and archeological data offer a rich source of data and of robust environment-evolution explanations that must be integrated into efforts by Earth scientists who seek to examine high-resolution climate records of human evolution. Paleoanthropological examples will illustrate the opportunities that exist for connecting evolutionary benchmarks to the data obtained from drilled African muds. Project members: R. Potts, A.K. Behrensmeyer, E. Beverly, K. Brady, J. Bright, E. Brown, J. Clark, A. Cohen, A. Deino, P. deMenocal, D. Deocampo, R. Dommain, J.T. Faith, J. King, R. Kinyanjui, N. Levin, J. Moerman, V. Muiruri, A. Noren, R.B. Owen, N. Rabideaux, R. Renaut, S. Rucina, J. Russell, J. Scott, M. Stockhecke, K. Uno
CPE--A New Perspective: The Impact of the Technology Revolution. Proceedings of the Computer Performance Evaluation Users Group Meeting (19th, San Francisco, California, October 25-28, 1983). Final Report. Reports on Computer Science and Technology.

ERIC Educational Resources Information Center

Mobray, Deborah, Ed.

Papers on local area networks (LANs), modelling techniques, software improvement, capacity planning, software engineering, microcomputers and end user computing, cost accounting and chargeback, configuration and performance management, and benchmarking presented at this conference include: (1) "Theoretical Performance Analysis of Virtual…
The International Conference on Vector and Parallel Computing (2nd)

DTIC Science & Technology

1989-01-17

Computation of the SVD of Bidiagonal Matrices" ...................................... 11 " Lattice QCD -As a Large Scale Scientific Computation...vectorizcd for the IBM 3090 Vector Facility. In addition, elapsed times " Lattice QCD -As a Large Scale Scientific have been reduced by using 3090...benchmarked Lattice QCD on a large number ofcompu- come from the wavefront solver routine. This was exten- ters: CrayX-MP and Cray 2 (vector
High-Order Methods for Computational Physics

DTIC Science & Technology

1999-03-01

computation is running in 278 Ronald D. Henderson parallel. Instead we use the concept of a voxel database (VDB) of geometric positions in the mesh [85...processor 0 Fig. 4.19. Connectivity and communications axe established by building a voxel database (VDB) of positions. A VDB maps each position to a...studies such as the highly accurate stability computations considered help expand the database for this benchmark problem. The two-dimensional linear
Performance Benchmark for a Prismatic Flow Solver

DTIC Science & Technology

2007-03-26

Gauss- Seidel (LU-SGS) implicit method is used for time integration to reduce the computational time. A one-equation turbulence model by Goldberg and...numerical flux computations. The Lower-Upper-Symmetric Gauss- Seidel (LU-SGS) implicit method [1] is used for time integration to reduce the...Sharov, D. and Nakahashi, K., “Reordering of Hybrid Unstructured Grids for Lower-Upper Symmetric Gauss- Seidel Computations,” AIAA Journal, Vol. 36
USAR Recruiting Success Factors.

DTIC Science & Technology

1987-12-01

scores, were used to predict production scores for each recruiter. Benchmark Achievement Scores (BAS) were computed by dividing total production bv...performance compnred to thil avigle would b ,’asl,_cr t.o compute . SAS correlated high y witi IIAS r .96 . so tihe two scores were practically equivalent...asked to make a sales pitch to a prospective enlistee about the benefits of Army life. Presen- tations were scored by computing the ratio of the
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Turnkey CAD/CAM selection and evaluation

NASA Technical Reports Server (NTRS)

Moody, T.

1980-01-01

The methodology to be followed in evaluating and selecting a computer system for manufacturing applications is discussed. Main frames and minicomputers are considered. Benchmark evaluations, demonstrations, and contract negotiations are discussed.
Neutron Reference Benchmark Field Specification: ACRR 44 Inch Lead-Boron (LB44) Bucket Environment (ACRR-LB44-CC-32-CL).

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vega, Richard Manuel; Parma, Edward J.; Griffin, Patrick J.

2015-07-01

This report was put together to support the International Atomic Energy Agency (IAEA) REAL- 2016 activity to validate the dosimetry community’s ability to use a consistent set of activation data and to derive consistent spectral characterizations. The report captures details of integral measurements taken in the Annular Core Research Reactor (ACRR) central cavity with the 44 inch Lead-Boron (LB44) bucket, reference neutron benchmark field. The field is described and an “a priori” calculated neutron spectrum is reported, based on MCNP6 calculations, and a subject matter expert (SME) based covariance matrix is given for this “a priori” spectrum. The results ofmore » 31 integral dosimetry measurements in the neutron field are reported.« less
Neutron Reference Benchmark Field Specifications: ACRR Polyethylene-Lead-Graphite (PLG) Bucket Environment (ACRR-PLG-CC-32-CL).

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vega, Richard Manuel; Parm, Edward J.; Griffin, Patrick J.

2015-07-01

This report was put together to support the International Atomic Energy Agency (IAEA) REAL- 2016 activity to validate the dosimetry community’s ability to use a consistent set of activation data and to derive consistent spectral characterizations. The report captures details of integral measurements taken in the Annular Core Research Reactor (ACRR) central cavity with the Polyethylene-Lead-Graphite (PLG) bucket, reference neutron benchmark field. The field is described and an “a priori” calculated neutron spectrum is reported, based on MCNP6 calculations, and a subject matter expert (SME) based covariance matrix is given for this “a priori” spectrum. The results of 37 integralmore » dosimetry measurements in the neutron field are reported.« less
Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)

NASA Technical Reports Server (NTRS)

Ahmad, Nash'at; Proctor, Fred

2011-01-01

The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases which have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these banchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.
A benchmark for subduction zone modeling

NASA Astrophysics Data System (ADS)

van Keken, P.; King, S.; Peacock, S.

2003-04-01

Our understanding of subduction zones hinges critically on the ability to discern its thermal structure and dynamics. Computational modeling has become an essential complementary approach to observational and experimental studies. The accurate modeling of subduction zones is challenging due to the unique geometry, complicated rheological description and influence of fluid and melt formation. The complicated physics causes problems for the accurate numerical solution of the governing equations. As a consequence it is essential for the subduction zone community to be able to evaluate the ability and limitations of various modeling approaches. The participants of a workshop on the modeling of subduction zones, held at the University of Michigan at Ann Arbor, MI, USA in 2002, formulated a number of case studies to be developed into a benchmark similar to previous mantle convection benchmarks (Blankenbach et al., 1989; Busse et al., 1991; Van Keken et al., 1997). Our initial benchmark focuses on the dynamics of the mantle wedge and investigates three different rheologies: constant viscosity, diffusion creep, and dislocation creep. In addition we investigate the ability of codes to accurate model dynamic pressure and advection dominated flows. Proceedings of the workshop and the formulation of the benchmark are available at www.geo.lsa.umich.edu/~keken/subduction02.html We strongly encourage interested research groups to participate in this benchmark. At Nice 2003 we will provide an update and first set of benchmark results. Interested researchers are encouraged to contact one of the authors for further details.
Performance of Landslide-HySEA tsunami model for NTHMP benchmarking validation process

NASA Astrophysics Data System (ADS)

Macias, Jorge

2017-04-01

In its FY2009 Strategic Plan, the NTHMP required that all numerical tsunami inundation models be verified as accurate and consistent through a model benchmarking process. This was completed in 2011, but only for seismic tsunami sources and in a limited manner for idealized solid underwater landslides. Recent work by various NTHMP states, however, has shown that landslide tsunami hazard may be dominant along significant parts of the US coastline, as compared to hazards from other tsunamigenic sources. To perform the above-mentioned validation process, a set of candidate benchmarks were proposed. These benchmarks are based on a subset of available laboratory date sets for solid slide experiments and deformable slide experiments, and include both submarine and subaerial slides. A benchmark based on a historic field event (Valdez, AK, 1964) close the list of proposed benchmarks. The Landslide-HySEA model has participated in the workshop that was organized at Texas A&M University - Galveston, on January 9-11, 2017. The aim of this presentation is to show some of the numerical results obtained for Landslide-HySEA in the framework of this benchmarking validation/verification effort. Acknowledgements. This research has been partially supported by the Junta de Andalucía research project TESELA (P11-RNM7069), the Spanish Government Research project SIMURISK (MTM2015-70490-C02-01-R) and Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech. The GPU computations were performed at the Unit of Numerical Methods (University of Malaga).
Transfluxor circuit amplifies sensing current for computer memories

NASA Technical Reports Server (NTRS)

Milligan, G. C.

1964-01-01

To transfer data from the magnetic memory core to an independent core, a reliable sensing amplifier has been developed. Later the data in the independent core is transferred to the arithmetical section of the computer.
Numerical Investigations of the Benchmark Supercritical Wing in Transonic Flow

NASA Technical Reports Server (NTRS)

Chwalowski, Pawel; Heeg, Jennifer; Biedron, Robert T.

2017-01-01

This paper builds on the computational aeroelastic results published previously and generated in support of the second Aeroelastic Prediction Workshop for the NASA Benchmark Supercritical Wing (BSCW) configuration. The computational results are obtained using FUN3D, an unstructured grid Reynolds-Averaged Navier-Stokes solver developed at the NASA Langley Research Center. The analysis results show the effects of the temporal and spatial resolution, the coupling scheme between the flow and the structural solvers, and the initial excitation conditions on the numerical flutter onset. Depending on the free stream condition and the angle of attack, the above parameters do affect the flutter onset. Two conditions are analyzed: Mach 0.74 with angle of attack 0 and Mach 0.85 with angle of attack 5. The results are presented in the form of the damping values computed from the wing pitch angle response as a function of the dynamic pressure or in the form of dynamic pressure as a function of the Mach number.
Space station operating system study

NASA Technical Reports Server (NTRS)

Horn, Albert E.; Harwell, Morris C.

1988-01-01

The current phase of the Space Station Operating System study is based on the analysis, evaluation, and comparison of the operating systems implemented on the computer systems and workstations in the software development laboratory. Primary emphasis has been placed on the DEC MicroVMS operating system as implemented on the MicroVax II computer, with comparative analysis of the SUN UNIX system on the SUN 3/260 workstation computer, and to a limited extent, the IBM PC/AT microcomputer running PC-DOS. Some benchmark development and testing was also done for the Motorola MC68010 (VM03 system) before the system was taken from the laboratory. These systems were studied with the objective of determining their capability to support Space Station software development requirements, specifically for multi-tasking and real-time applications. The methodology utilized consisted of development, execution, and analysis of benchmark programs and test software, and the experimentation and analysis of specific features of the system or compilers in the study.
Development and Applications of Benchmark Examples for Static Delamination Propagation Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2013-01-01

The development and application of benchmark examples for the assessment of quasistatic delamination propagation capabilities was demonstrated for ANSYS (TradeMark) and Abaqus/Standard (TradeMark). The examples selected were based on finite element models of Double Cantilever Beam (DCB) and Mixed-Mode Bending (MMB) specimens. First, quasi-static benchmark results were created based on an approach developed previously. Second, the delamination was allowed to propagate under quasi-static loading from its initial location using the automated procedure implemented in ANSYS (TradeMark) and Abaqus/Standard (TradeMark). Input control parameters were varied to study the effect on the computed delamination propagation. Overall, the benchmarking procedure proved valuable by highlighting the issues associated with choosing the appropriate input parameters for the VCCT implementations in ANSYS® and Abaqus/Standard®. However, further assessment for mixed-mode delamination fatigue onset and growth is required. Additionally studies should include the assessment of the propagation capabilities in more complex specimens and on a structural level.

The Suite for Embedded Applications and Kernels

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-05-10

Many applications of high performance embedded computing are limited by performance or power bottlenecks. We havedesigned SEAK, a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions to these bottlenecks? and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) andgoal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user blackbox evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informativemore » for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
Properties of model-averaged BMDLs: a study of model averaging in dichotomous response risk estimation.

PubMed

Wheeler, Matthew W; Bailer, A John

2007-06-01

Model averaging (MA) has been proposed as a method of accounting for model uncertainty in benchmark dose (BMD) estimation. The technique has been used to average BMD dose estimates derived from dichotomous dose-response experiments, microbial dose-response experiments, as well as observational epidemiological studies. While MA is a promising tool for the risk assessor, a previous study suggested that the simple strategy of averaging individual models' BMD lower limits did not yield interval estimators that met nominal coverage levels in certain situations, and this performance was very sensitive to the underlying model space chosen. We present a different, more computationally intensive, approach in which the BMD is estimated using the average dose-response model and the corresponding benchmark dose lower bound (BMDL) is computed by bootstrapping. This method is illustrated with TiO(2) dose-response rat lung cancer data, and then systematically studied through an extensive Monte Carlo simulation. The results of this study suggest that the MA-BMD, estimated using this technique, performs better, in terms of bias and coverage, than the previous MA methodology. Further, the MA-BMDL achieves nominal coverage in most cases, and is superior to picking the "best fitting model" when estimating the benchmark dose. Although these results show utility of MA for benchmark dose risk estimation, they continue to highlight the importance of choosing an adequate model space as well as proper model fit diagnostics.
Student Progress to Graduation in New York City High Schools: A Metric Designed by New Visions for Public Schools. Part I: Core Components

ERIC Educational Resources Information Center

Fairchild, Susan; Gunton, Brad; Donohue, Beverly; Berry, Carolyn; Genn, Ruth; Knevals, Jessica

2011-01-01

Students who achieve critical academic benchmarks such as high attendance rates, continuous levels of credit accumulation, and high grades have a greater likelihood of success throughout high school and beyond. However, keeping students on track toward meeting graduation requirements and quickly identifying students who are at risk of falling off…
Using Localized Survey Items to Augment Standardized Benchmarking Measures: A LibQUAL+[TM] Study

ERIC Educational Resources Information Center

Thompson, Bruce; Cook, Colleen; Kyrillidou, Martha

2006-01-01

The LibQUAL+[TM] protocol solicits open-ended comments from users with regard to library service quality, gathers data on 22 core items, and, at the option of individual libraries, also garners ratings on five items drawn from a pool of more than 100 choices selected by libraries. In this article, the relationship of scores on these locally…
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurzak, Jakub; Luszczek, Pitior; Faverge, Mathieu

2012-03-01

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

PubMed

Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

2014-01-01

A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

PubMed Central

Yazar, Seyhan; Gooden, George E. C.; Mackey, David A.; Hewitt, Alex W.

2014-01-01

A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E.coli and 53.5% (95% CI: 34.4–72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298
Benchmarking multimedia performance

NASA Astrophysics Data System (ADS)

Zandi, Ahmad; Sudharsanan, Subramania I.

1998-03-01

With the introduction of faster processors and special instruction sets tailored to multimedia, a number of exciting applications are now feasible on the desktops. Among these is the DVD playback consisting, among other things, of MPEG-2 video and Dolby digital audio or MPEG-2 audio. Other multimedia applications such as video conferencing and speech recognition are also becoming popular on computer systems. In view of this tremendous interest in multimedia, a group of major computer companies have formed, Multimedia Benchmarks Committee as part of Standard Performance Evaluation Corp. to address the performance issues of multimedia applications. The approach is multi-tiered with three tiers of fidelity from minimal to full compliant. In each case the fidelity of the bitstream reconstruction as well as quality of the video or audio output are measured and the system is classified accordingly. At the next step the performance of the system is measured. In many multimedia applications such as the DVD playback the application needs to be run at a specific rate. In this case the measurement of the excess processing power, makes all the difference. All these make a system level, application based, multimedia benchmark very challenging. Several ideas and methodologies for each aspect of the problems will be presented and analyzed.
Pynamic: the Python Dynamic Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, G L; Ahn, D H; de Supinksi, B R

2007-07-10

Python is widely used in scientific computing to facilitate application development and to support features such as computational steering. Making full use of some of Python's popular features, which improve programmer productivity, leads to applications that access extremely high numbers of dynamically linked libraries (DLLs). As a result, some important Python-based applications severely stress a system's dynamic linking and loading capabilities and also cause significant difficulties for most development environment tools, such as debuggers. Furthermore, using the Python paradigm for large scale MPI-based applications can create significant file IO and further stress tools and operating systems. In this paper, wemore » present Pynamic, the first benchmark program to support configurable emulation of a wide-range of the DLL usage of Python-based applications for large scale systems. Pynamic has already accurately reproduced system software and tool issues encountered by important large Python-based scientific applications on our supercomputers. Pynamic provided insight for our system software and tool vendors, and our application developers, into the impact of several design decisions. As we describe the Pynamic benchmark, we will highlight some of the issues discovered in our large scale system software and tools using Pynamic.« less
Benchmark radar targets for the validation of computational electromagnetics programs

NASA Technical Reports Server (NTRS)

Woo, Alex C.; Wang, Helen T. G.; Schuh, Michael J.; Sanders, Michael L.

1993-01-01

Results are presented of a set of computational electromagnetics validation measurements referring to three-dimensional perfectly conducting smooth targets, performed for the Electromagnetic Code Consortium. Plots are presented for both the low- and high-frequency measurements of the NASA almond, an ogive, a double ogive, a cone-sphere, and a cone-sphere with a gap.
Benchmark of Atucha-2 PHWR RELAP5-3D control rod model by Monte Carlo MCNP5 core calculation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pecchia, M.; D'Auria, F.; Mazzantini, O.

2012-07-01

Atucha-2 is a Siemens-designed PHWR reactor under construction in the Republic of Argentina. Its geometrical complexity and peculiarities require the adoption of advanced Monte Carlo codes for performing realistic neutronic simulations. Therefore core models of Atucha-2 PHWR were developed using MCNP5. In this work a methodology was set up to collect the flux in the hexagonal mesh by which the Atucha-2 core is represented. The scope of this activity is to evaluate the effect of obliquely inserted control rod on neutron flux in order to validate the RELAP5-3D{sup C}/NESTLE three dimensional neutron kinetic coupled thermal-hydraulic model, applied by GRNSPG/UNIPI formore » performing selected transients of Chapter 15 FSAR of Atucha-2. (authors)« less
A theoretical and experimental benchmark study of core-excited states in nitrogen

DOE PAGES

Myhre, Rolf H.; Wolf, Thomas J. A.; Cheng, Lan; ...

2018-02-14

The high resolution near edge X-ray absorption fine structure spectrum of nitrogen displays the vibrational structure of the core-excited states. This makes nitrogen well suited for assessing the accuracy of different electronic structure methods for core excitations. We report high resolution experimental measurements performed at the SOLEIL synchrotron facility. These are compared with theoretical spectra calculated using coupled cluster theory and algebraic diagrammatic construction theory. The coupled cluster singles and doubles with perturbative triples model known as CC3 is shown to accurately reproduce the experimental excitation energies as well as the spacing of the vibrational transitions. In conclusion, the computationalmore » results are also shown to be systematically improved within the coupled cluster hierarchy, with the coupled cluster singles, doubles, triples, and quadruples method faithfully reproducing the experimental vibrational structure.« less
A theoretical and experimental benchmark study of core-excited states in nitrogen

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myhre, Rolf H.; Wolf, Thomas J. A.; Cheng, Lan

The high resolution near edge X-ray absorption fine structure spectrum of nitrogen displays the vibrational structure of the core-excited states. This makes nitrogen well suited for assessing the accuracy of different electronic structure methods for core excitations. We report high resolution experimental measurements performed at the SOLEIL synchrotron facility. These are compared with theoretical spectra calculated using coupled cluster theory and algebraic diagrammatic construction theory. The coupled cluster singles and doubles with perturbative triples model known as CC3 is shown to accurately reproduce the experimental excitation energies as well as the spacing of the vibrational transitions. In conclusion, the computationalmore » results are also shown to be systematically improved within the coupled cluster hierarchy, with the coupled cluster singles, doubles, triples, and quadruples method faithfully reproducing the experimental vibrational structure.« less
SWPS3 - fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2.

PubMed

Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp; Dessimoz, Christophe

2008-10-29

We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and x86 architectures. The paper describes swps3 and compares its performances with several other implementations. Our benchmarking results show that swps3 is currently the fastest implementation of a vectorized Smith-Waterman on the Cell/BE, outperforming the only other known implementation by a factor of at least 4: on a Playstation 3, it achieves up to 8.0 billion cell-updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as BLAST, a heuristic method. The Cell/BE can be a powerful platform to align biological sequences. Besides, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences.
SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2

PubMed Central

Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp; Dessimoz, Christophe

2008-01-01

Background We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and ×86 architectures. The paper describes swps3 and compares its performances with several other implementations. Findings Our benchmarking results show that swps3 is currently the fastest implementation of a vectorized Smith-Waterman on the Cell/BE, outperforming the only other known implementation by a factor of at least 4: on a Playstation 3, it achieves up to 8.0 billion cell-updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as BLAST, a heuristic method. Conclusion The Cell/BE can be a powerful platform to align biological sequences. Besides, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences. PMID:18959793
OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

NASA Astrophysics Data System (ADS)

Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

2015-02-01

We developed a practical method to accelerate execution of Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler in Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. GNU utilities of make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and check identicalness of parallel and serial simulations. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof, and later modified using Open Multiple Processing (OpenMP) library in an 8-core shared memory system. In addition, a C wrapping function was used to rapidly set large arrays to zero by cross compiling with the original SWAT FORTRAN package. A universal speedup ratio of 2.3 was achieved using input data sets of a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibrations will significantly improve the performance of SWAT optimization.
GeNeDA: An Open-Source Workflow for Design Automation of Gene Regulatory Networks Inspired from Microelectronics.

PubMed

Madec, Morgan; Pecheux, François; Gendrault, Yves; Rosati, Elise; Lallement, Christophe; Haiech, Jacques

2016-10-01

The topic of this article is the development of an open-source automated design framework for synthetic biology, specifically for the design of artificial gene regulatory networks based on a digital approach. In opposition to other tools, GeNeDA is an open-source online software based on existing tools used in microelectronics that have proven their efficiency over the last 30 years. The complete framework is composed of a computation core directly adapted from an Electronic Design Automation tool, input and output interfaces, a library of elementary parts that can be achieved with gene regulatory networks, and an interface with an electrical circuit simulator. Each of these modules is an extension of microelectronics tools and concepts: ODIN II, ABC, the Verilog language, SPICE simulator, and SystemC-AMS. GeNeDA is first validated on a benchmark of several combinatorial circuits. The results highlight the importance of the part library. Then, this framework is used for the design of a sequential circuit including a biological state machine.
COMPUTERIZED TRAINING OF CRYOSURGERY – A SYSTEM APPROACH

PubMed Central

Keelan, Robert; Yamakawa, Soji; Shimada, Kenji; Rabin, Yoed

2014-01-01

The objective of the current study is to provide the foundation for a computerized training platform for cryosurgery. Consistent with clinical practice, the training process targets the correlation of the frozen region contour with the target region shape, using medical imaging and accepted criteria for clinical success. The current study focuses on system design considerations, including a bioheat transfer model, simulation techniques, optimal cryoprobe layout strategy, and a simulation core framework. Two fundamentally different approaches were considered for the development of a cryosurgery simulator, based on a finite-elements (FE) commercial code (ANSYS) and a proprietary finite-difference (FD) code. Results of this study demonstrate that the FE simulator is superior in terms of geometric modeling, while the FD simulator is superior in terms of runtime. Benchmarking results further indicate that the FD simulator is superior in terms of usage of memory resources, pre-processing, parallel processing, and post-processing. It is envisioned that future integration of a human-interface module and clinical data into the proposed computer framework will make computerized training of cryosurgery a practical reality. PMID:23995400
Filtering Gene Ontology semantic similarity for identifying protein complexes in large protein interaction networks.

PubMed

Wang, Jian; Xie, Dong; Lin, Hongfei; Yang, Zhihao; Zhang, Yijia

2012-06-21

Many biological processes recognize in particular the importance of protein complexes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, high false-positive rate of PPIs leads to challenging identification. A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics. The method detects protein complexes from large scale PPI networks by filtering GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective to identify attachment proteins of complexes.
Predicting MHC-II binding affinity using multiple instance regression

PubMed Central

EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

2011-01-01

Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding pathogenesis and immune response. The task of predicting MHC-II binding peptides is complicated by the significant variability in their length. Most existing computational methods for predicting MHC-II binding peptides focus on identifying a nine amino acids core region in each binding peptide. We formulate the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. We present results of experiments using several benchmark datasets that show that MHCMIR is competitive with the state-of-the-art methods for predicting MHC-II binding peptides. An online web server that implements the MHCMIR method for MHC-II binding affinity prediction is freely accessible at http://ailab.cs.iastate.edu/mhcmir. PMID:20855923

GELATIO: a general framework for modular digital analysis of high-purity Ge detector signals

NASA Astrophysics Data System (ADS)

Agostini, M.; Pandola, L.; Zavarise, P.; Volynets, O.

2011-08-01

GELATIO is a new software framework for advanced data analysis and digital signal processing developed for the GERDA neutrinoless double beta decay experiment. The framework is tailored to handle the full analysis flow of signals recorded by high purity Ge detectors and photo-multipliers from the veto counters. It is designed to support a multi-channel modular and flexible analysis, widely customizable by the user either via human-readable initialization files or via a graphical interface. The framework organizes the data into a multi-level structure, from the raw data up to the condensed analysis parameters, and includes tools and utilities to handle the data stream between the different levels. GELATIO is implemented in C++. It relies upon ROOT and its extension TAM, which provides compatibility with PROOF, enabling the software to run in parallel on clusters of computers or many-core machines. It was tested on different platforms and benchmarked in several GERDA-related applications. A stable version is presently available for the GERDA Collaboration and it is used to provide the reference analysis of the experiment data.
Optimizing legacy molecular dynamics software with directive-based offload

NASA Astrophysics Data System (ADS)

Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.

2015-10-01

Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
Towards a Computational Framework for Modeling the Impact of Aortic Coarctations Upon Left Ventricular Load

PubMed Central

Karabelas, Elias; Gsell, Matthias A. F.; Augustin, Christoph M.; Marx, Laura; Neic, Aurel; Prassl, Anton J.; Goubergrits, Leonid; Kuehne, Titus; Plank, Gernot

2018-01-01

Computational fluid dynamics (CFD) models of blood flow in the left ventricle (LV) and aorta are important tools for analyzing the mechanistic links between myocardial deformation and flow patterns. Typically, the use of image-based kinematic CFD models prevails in applications such as predicting the acute response to interventions which alter LV afterload conditions. However, such models are limited in their ability to analyze any impacts upon LV load or key biomarkers known to be implicated in driving remodeling processes as LV function is not accounted for in a mechanistic sense. This study addresses these limitations by reporting on progress made toward a novel electro-mechano-fluidic (EMF) model that represents the entire physics of LV electromechanics (EM) based on first principles. A biophysically detailed finite element (FE) model of LV EM was coupled with a FE-based CFD solver for moving domains using an arbitrary Eulerian-Lagrangian (ALE) formulation. Two clinical cases of patients suffering from aortic coarctations (CoA) were built and parameterized based on clinical data under pre-treatment conditions. For one patient case simulations under post-treatment conditions after geometric repair of CoA by a virtual stenting procedure were compared against pre-treatment results. Numerical stability of the approach was demonstrated by analyzing mesh quality and solver performance under the significantly large deformations of the LV blood pool. Further, computational tractability and compatibility with clinical time scales were investigated by performing strong scaling benchmarks up to 1536 compute cores. The overall cost of the entire workflow for building, fitting and executing EMF simulations was comparable to those reported for image-based kinematic models, suggesting that EMF models show potential of evolving into a viable clinical research tool. PMID:29892227
Performance benchmark of LHCb code on state-of-the-art x86 architectures

NASA Astrophysics Data System (ADS)

Campora Perez, D. H.; Neufeld, N.; Schwemmer, R.

2015-12-01

For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. For the evaluation of the best performing solution, we have developed a method to convert our high level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late forking, NUMA balancing and optimal number of threads, i.e. it automatically optimises box-level performance. We have run this procedure on a wide range of Haswell-E CPUs and numerous other architectures from both Intel and AMD, including also the latest Intel micro-blade servers. We present results in terms of performance, power consumption, overheads and relative cost.
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

NASA Technical Reports Server (NTRS)

Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)

1997-01-01

We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of ethernet hubs and switches. An MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS parallel benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Stephen Johnson; Mehdi Salehi; Karl Eisert

This report describes the progress of our research during the first 30 months (10/01/2004 to 03/31/2007) of the original three-year project cycle. The project was terminated early due to DOE budget cuts. This was a joint project between the Tertiary Oil Recovery Project (TORP) at the University of Kansas and the Idaho National Laboratory (INL). The objective was to evaluate the use of low-cost biosurfactants produced from agriculture process waste streams to improve oil recovery in fractured carbonate reservoirs through wettability mediation. Biosurfactant for this project was produced using Bacillus subtilis 21332 and purified potato starch as the growth medium.more » The INL team produced the biosurfactant and characterized it as surfactin. INL supplied surfactin as required for the tests at KU as well as providing other microbiological services. Interfacial tension (IFT) between Soltrol 130 and both potential benchmark chemical surfactants and crude surfactin was measured over a range of concentrations. The performance of the crude surfactin preparation in reducing IFT was greater than any of the synthetic compounds throughout the concentration range studied but at low concentrations, sodium laureth sulfate (SLS) was closest to the surfactin, and was used as the benchmark in subsequent studies. Core characterization was carried out using both traditional flooding techniques to find porosity and permeability; and NMR/MRI to image cores and identify pore architecture and degree of heterogeneity. A cleaning regime was identified and developed to remove organic materials from cores and crushed carbonate rock. This allowed cores to be fully characterized and returned to a reproducible wettability state when coupled with a crude-oil aging regime. Rapid wettability assessments for crushed matrix material were developed, and used to inform slower Amott wettability tests. Initial static absorption experiments exposed limitations in the use of HPLC and TOC to determine surfactant concentrations. To reliably quantify both benchmark surfactants and surfactin, a surfactant ion-selective electrode was used as an indicator in the potentiometric titration of the anionic surfactants with Hyamine 1622. The wettability change mediated by dilute solutions of a commercial preparation of SLS (STEOL CS-330) and surfactin was assessed using two-phase separation, and water flotation techniques; and surfactant loss due to retention and adsorption on the rock was determined. Qualitative tests indicated that on a molar basis, surfactin is more effective than STEOL CS-330 in altering wettability of crushed Lansing-Kansas City carbonates from oil-wet to water-wet state. Adsorption isotherms of STEOL CS-330 and surfactin on crushed Lansing-Kansas City outcrop and reservoir material showed that surfactin has higher specific adsorption on these oomoldic carbonates. Amott wettability studies confirmed that cleaned cores are mixed-wet, and that the aging procedure renders them oil-wet. Tests of aged cores with no initial water saturation resulted in very little spontaneous oil production, suggesting that water-wet pathways into the matrix are required for wettability change to occur. Further investigation of spontaneous imbibition and forced imbibition of water and surfactant solutions into LKC cores under a variety of conditions--cleaned vs. crude oil-aged; oil saturated vs. initial water saturation; flooded with surfactant vs. not flooded--indicated that in water-wet or intermediate wet cores, sodium laureth sulfate is more effective at enhancing spontaneous imbibition through wettability change. However, in more oil-wet systems, surfactin at the same concentration performs significantly better.« less
istar: a web platform for large-scale protein-ligand docking.

PubMed

Li, Hongjian; Leung, Kwong-Sak; Ballester, Pedro J; Wong, Man-Hon

2014-01-01

Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly-accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacked on other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times. When evaluated on the PDBbind v2012 core set, our istar platform combining with RF-Score manages to reproduce Pearson's correlation coefficient and Spearman's correlation coefficient of as high as 0.855 and 0.859 respectively between the experimental binding affinity and the predicted binding affinity of the docked conformation. istar is freely available at http://istar.cse.cuhk.edu.hk/idock.
Matt: local flexibility aids protein multiple structure alignment.

PubMed

Menke, Matthew; Berger, Bonnie; Cowen, Lenore

2008-01-01

Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these "bent" alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of alpha-helices and beta-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.
Hypersonic Experimental and Computational Capability, Improvement and Validation. Volume 2

NASA Technical Reports Server (NTRS)

Muylaert, Jean (Editor); Kumar, Ajay (Editor); Dujarric, Christian (Editor)

1998-01-01

The results of the phase 2 effort conducted under AGARD Working Group 18 on Hypersonic Experimental and Computational Capability, Improvement and Validation are presented in this report. The first volume, published in May 1996, mainly focused on the design methodology, plans and some initial results of experiments that had been conducted to serve as validation benchmarks. The current volume presents the detailed experimental and computational data base developed during this effort.
Interactive collision detection for deformable models using streaming AABBs.

PubMed

Zhang, Xinyu; Kim, Young J

2007-01-01

We present an interactive and accurate collision detection algorithm for deformable, polygonal objects based on the streaming computational model. Our algorithm can detect all possible pairwise primitive-level intersections between two severely deforming models at highly interactive rates. In our streaming computational model, we consider a set of axis aligned bounding boxes (AABBs) that bound each of the given deformable objects as an input stream and perform massively-parallel pairwise, overlapping tests onto the incoming streams. As a result, we are able to prevent performance stalls in the streaming pipeline that can be caused by expensive indexing mechanism required by bounding volume hierarchy-based streaming algorithms. At runtime, as the underlying models deform over time, we employ a novel, streaming algorithm to update the geometric changes in the AABB streams. Moreover, in order to get only the computed result (i.e., collision results between AABBs) without reading back the entire output streams, we propose a streaming en/decoding strategy that can be performed in a hierarchical fashion. After determining overlapped AABBs, we perform a primitive-level (e.g., triangle) intersection checking on a serial computational model such as CPUs. We implemented the entire pipeline of our algorithm using off-the-shelf graphics processors (GPUs), such as nVIDIA GeForce 7800 GTX, for streaming computations, and Intel Dual Core 3.4G processors for serial computations. We benchmarked our algorithm with different models of varying complexities, ranging from 15K up to 50K triangles, under various deformation motions, and the timings were obtained as 30 approximately 100 FPS depending on the complexity of models and their relative configurations. Finally, we made comparisons with a well-known GPU-based collision detection algorithm, CULLIDE [4] and observed about three times performance improvement over the earlier approach. We also made comparisons with a SW-based AABB culling algorithm [2] and observed about two times improvement.
Analysis of 2D Torus and Hub Topologies of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

NASA Technical Reports Server (NTRS)

Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)

1997-01-01

A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point to point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point to point communication on an unloaded network, the hub and 1 hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub based network for 16 processor systems. Finally, the SOAPBOX benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub based network.
Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

2004-01-01

In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which effect the performance of multi-level parallel applications.
Benchmark solutions for the galactic heavy-ion transport equations with energy and spatial coupling

NASA Technical Reports Server (NTRS)

Ganapol, Barry D.; Townsend, Lawrence W.; Lamkin, Stanley L.; Wilson, John W.

1991-01-01

Nontrivial benchmark solutions are developed for the galactic heavy ion transport equations in the straightahead approximation with energy and spatial coupling. Analytical representations of the ion fluxes are obtained for a variety of sources with the assumption that the nuclear interaction parameters are energy independent. The method utilizes an analytical LaPlace transform inversion to yield a closed form representation that is computationally efficient. The flux profiles are then used to predict ion dose profiles, which are important for shield design studies.
Diagnostic reference levels of paediatric computed tomography examinations performed at a dedicated Australian paediatric hospital.

PubMed

Bibbo, Giovanni; Brown, Scott; Linke, Rebecca

2016-08-01

Diagnostic Reference Levels (DRL) of procedures involving ionizing radiation are important tools to optimizing radiation doses delivered to patients and in identifying cases where the levels of doses are unusually high. This is particularly important for paediatric patients undergoing computed tomography (CT) examinations as these examinations are associated with relatively high-dose. Paediatric CT studies, performed at our institution from January 2010 to March 2014, have been retrospectively analysed to determine the 75th and 95th percentiles of both the volume computed tomography dose index (CTDIvol ) and dose-length product (DLP) for the most commonly performed studies to: establish local diagnostic reference levels for paediatric computed tomography examinations performed at our institution, benchmark our DRL with national and international published paediatric values, and determine the compliance of CT radiographer with established protocols. The derived local 75th percentile DRL have been found to be acceptable when compared with those published by the Australian National Radiation Dose Register and two national children's hospitals, and at the international level with the National Reference Doses for the UK. The 95th percentiles of CTDIvol for the various CT examinations have been found to be acceptable values for the CT scanner Dose-Check Notification. Benchmarking CT radiographers shows that they follow the set protocols for the various examinations without significant variations in the machine setting factors. The derivation of DRL has given us the tool to evaluate and improve the performance of our CT service by improved compliance and a reduction in radiation dose to our paediatric patients. We have also been able to benchmark our performance with similar national and international institutions. © 2016 The Royal Australian and New Zealand College of Radiologists.
A Combined Experimental and Computational Study on Selected Physical Properties of Aminosilicones

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perry, RJ; Genovese, SE; Farnum, RL

2014-01-29

A number of physical properties of aminosilicones have been determined experimentally and predicted computationally. It was found that COSMO-RS predicted the densities of the materials under study to within about 4% of the experimentally determined values. Vapor pressure measurements were performed, and all of the aminosilicones of interest were found to be significantly less volatile than the benchmark MEA material. COSMO-RS was reasonably accurate for predicting the vapor pressures for aminosilicones that were thermally stable. The heat capacities of all aminosilicones tested were between 2.0 and 2.3 J/(g.degrees C); again substantially lower than a benchmark 30% aqueous MEA solution. Surfacemore » energies for the aminosilicones were found to be 23.3-28.3 dyne/cm and were accurately predicted using the parachor method.« less
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Determination of binding affinity upon mutation for type I dockerin-cohesin complexes from Clostridium thermocellum and Clostridium cellulolyticum using deep sequencing.

PubMed

Kowalsky, Caitlin A; Whitehead, Timothy A

2016-12-01

The comprehensive sequence determinants of binding affinity for type I cohesin toward dockerin from Clostridium thermocellum and Clostridium cellulolyticum was evaluated using deep mutational scanning coupled to yeast surface display. We measured the relative binding affinity to dockerin for 2970 and 2778 single point mutants of C. thermocellum and C. cellulolyticum, respectively, representing over 96% of all possible single point mutants. The interface ΔΔG for each variant was reconstructed from sequencing counts and compared with the three independent experimental methods. This reconstruction results in a narrow dynamic range of -0.8-0.5 kcal/mol. The computational software packages FoldX and Rosetta were used to predict mutations that disrupt binding by more than 0.4 kcal/mol. The area under the curve of receiver operator curves was 0.82 for FoldX and 0.77 for Rosetta, showing reasonable agreements between predictions and experimental results. Destabilizing mutations to core and rim positions were predicted with higher accuracy than support positions. This benchmark dataset may be useful for developing new computational prediction tools for the prediction of the mutational effect on binding affinities for protein-protein interactions. Experimental considerations to improve precision and range of the reconstruction method are discussed. Proteins 2016; 84:1914-1928. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
The molecular basis of conformational instability of the ecdysone receptor DNA binding domain studied by in silico and in vitro experiments.

PubMed

Szamborska-Gbur, Agnieszka; Rymarczyk, Grzegorz; Orłowski, Marek; Kuzynowski, Tomasz; Jakób, Michał; Dziedzic-Letka, Agnieszka; Górecki, Andrzej; Dobryszycki, Piotr; Ożyhar, Andrzej

2014-01-01

The heterodimer of the ecdysone receptor (EcR) and ultraspiracle (Usp), members of the nuclear receptors superfamily, regulates gene expression associated with molting and metamorphosis in insects. The DNA binding domains (DBDs) of the Usp and EcR play an important role in their DNA-dependent heterodimerization. Analysis of the crystal structure of the UspDBD/EcRDBD heterocomplex from Drosophila melanogaster on the hsp27 gene response element, suggested an appreciable similarity between both DBDs. However, the chemical denaturation experiments showed a categorically lower stability for the EcRDBD in contrast to the UspDBD. The aim of our study was an elucidation of the molecular basis of this intriguing instability. Toward this end, we mapped the EcRDBD amino acid sequence positions which have an impact on the stability of the EcRDBD. The computational protein design and in vitro analyses of the EcRDBD mutants indicate that non-conserved residues within the α-helix 2, forming the EcRDBD hydrophobic core, represent a specific structural element that contributes to instability. In particular, the L58 appears to be a key residue which differentiates the hydrophobic cores of UspDBD and EcRDBD and is the main reason for the low stability of the EcRDBD. Our results might serve as a benchmark for further studies of the intricate nature of the EcR molecule.
Five- and six-electron harmonium atoms: Highly accurate electronic properties and their application to benchmarking of approximate 1-matrix functionals

NASA Astrophysics Data System (ADS)

Cioslowski, Jerzy; Strasburger, Krzysztof

2018-04-01

Electronic properties of several states of the five- and six-electron harmonium atoms are obtained from large-scale calculations employing explicitly correlated basis functions. The high accuracy of the computed energies (including their components), natural spinorbitals, and their occupation numbers makes them suitable for testing, calibration, and benchmarking of approximate formalisms of quantum chemistry and solid state physics. In the case of the five-electron species, the availability of the new data for a wide range of the confinement strengths ω allows for confirmation and generalization of the previously reached conclusions concerning the performance of the presently known approximations for the electron-electron repulsion energy in terms of the 1-matrix that are at heart of the density matrix functional theory (DMFT). On the other hand, the properties of the three low-lying states of the six-electron harmonium atom, computed at ω = 500 and ω = 1000, uncover deficiencies of the 1-matrix functionals not revealed by previous studies. In general, the previously published assessment of the present implementations of DMFT being of poor accuracy is found to hold. Extending the present work to harmonically confined systems with even more electrons is most likely counterproductive as the steep increase in computational cost required to maintain sufficient accuracy of the calculated properties is not expected to be matched by the benefits of additional information gathered from the resulting benchmarks.
Benchmarking of Neutron Flux Parameters at the USGS TRIGA Reactor in Lakewood, Colorado

NASA Astrophysics Data System (ADS)

Alzaabi, Osama E.

The USGS TRIGA Reactor (GSTR) located at the Denver Federal Center in Lakewood Colorado provides opportunities to Colorado School of Mines students to do experimental research in the field of neutron activation analysis. The scope of this thesis is to obtain precise knowledge of neutron flux parameters at the GSTR. The Colorado School of Mines Nuclear Physics group intends to develop several research projects at the GSTR, which requires the precise knowledge of neutron fluxes and energy distributions in several irradiation locations. The fuel burn-up of the new GSTR fuel configuration and the thermal neutron flux of the core were recalculated since the GSTR core configuration had been changed with the addition of two new fuel elements. Therefore, a MCNP software package was used to incorporate the burn up of reactor fuel and to determine the neutron flux at different irradiation locations and at flux monitoring bores. These simulation results were compared with neutron activation analysis results using activated diluted gold wires. A well calibrated and stable germanium detector setup as well as fourteen samplers were designed and built to achieve accuracy in the measurement of the neutron flux. Furthermore, the flux monitoring bores of the GSTR core were used for the first time to measure neutron flux experimentally and to compare to MCNP simulation. In addition, International Atomic Energy Agency (IAEA) standard materials were used along with USGS national standard materials in a previously well calibrated irradiation location to benchmark simulation, germanium detector calibration and sample measurements to international standards.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.