DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, J; Dossa, D; Gokhale, M
Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool, iotrace, developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared software-only performance to GPU-accelerated performance. In addition to the work reported here, a text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification; it will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the Fusion-io 40 GB parallel NAND Flash disk array. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512 MB); 80 GB Hard Drive (Seagate SATA II Barracuda). The Fusion board presently operates at 4X in a PCIe slot. The image resampling benchmark was run on a dual-Xeon workstation with an NVIDIA graphics card (see Chapter 5 for the full specification). An XtremeData Opteron+FPGA system was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that spent greater than 50% of its time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling and language classification benchmarks showed order of magnitude speedups over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit in boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid-state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.
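The report does not detail how iotrace is implemented, but the kind of dynamic per-process I/O profile it captures can be sketched from the standard Linux /proc counters. The following is an illustrative sample only, not the report's tool; the sampling interval and counter names are taken from /proc/<pid>/io.

    # Illustrative only: a minimal per-process I/O sampler using the Linux
    # /proc/<pid>/io counters. This is not the report's iotrace tool; it merely
    # shows the kind of dynamic I/O profile (bytes read/written over time)
    # such a tool captures.
    import time

    def sample_io(pid, interval=1.0, samples=10):
        """Yield (timestamp, read_bytes, write_bytes) for a running process."""
        path = "/proc/%d/io" % pid
        for _ in range(samples):
            counters = {}
            with open(path) as f:
                for line in f:
                    key, value = line.split(":")
                    counters[key] = int(value)
            yield time.time(), counters["read_bytes"], counters["write_bytes"]
            time.sleep(interval)

    # Example: profile process 1234 once per second for ten seconds.
    # for t, rd, wr in sample_io(1234):
    #     print(t, rd, wr)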
Federated data storage system prototype for LHC experiments and data intensive science
NASA Astrophysics Data System (ADS)
Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Ryabinkin, E.; Zarochentsev, A.
2017-10-01
The rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) has prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and university clusters scattered over a large area aim to unite their resources for future productive work while also supporting large physics collaborations. In our project we address the fundamental problem of designing a computing architecture to integrate distributed storage resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. Studies include the development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one national cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. This project intends to implement a federated distributed storage for all kinds of operations, such as read/write/transfer, with access via WAN from Grid centres, university clusters, supercomputers, and academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests, including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns, and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and to changes in computing style, for instance how a bioinformatics program running on a supercomputer can read and write data from the federated storage.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yilk, Todd
2018-02-17
The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity, with separate partitions built around the Haswell and Knights Landing CPU architectures, for capability computing, and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL, and all were qualified at the highest level of the Green500 benchmark. This paper discusses supercomputing at LANL and the Green500 benchmark, and offers notes on our experience meeting the Green500's reporting requirements.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;
2006-01-01
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.
TOP500 Supercomputers for November 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2003-11-16
22nd Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 22nd edition of the TOP500 list of the world's fastest supercomputers was released today (November 16, 2003). The Earth Simulator supercomputer retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s ('teraflops', or trillions of calculations per second). It was built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan.
Integrating the Apache Big Data Stack with HPC for Big Data
NASA Astrophysics Data System (ADS)
Fox, G. C.; Qiu, J.; Jha, S.
2014-12-01
There is perhaps a broad consensus as to the important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data intensive computing, even though commercial clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC with ABDS, the Apache big data software stack that is widely used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL - the Scalable Parallel Interoperable Data Analytics Library - built on system and data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.
Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yang; Gunasekaran, Raghul; Ma, Xiaosong
2014-01-01
Competing workloads on a shared storage system cause I/O resource contention and application performance vagaries. This problem is already evident in today's HPC storage systems and is likely to become acute at exascale. We need more interaction between application I/O requirements and system software tools to help alleviate the I/O bottleneck, moving towards I/O-aware job scheduling. However, this requires rich techniques to capture application I/O characteristics, which remain elusive in production systems. Traditionally, I/O characteristics have been obtained using client-side tracing tools, with drawbacks such as non-trivial instrumentation/development costs, large trace traffic, and inconsistent adoption. We present a novel approach, I/O Signature Identifier (IOSI), to characterize the I/O behavior of data-intensive applications. IOSI extracts signatures from noisy, zero-overhead server-side I/O throughput logs that are already collected on today's supercomputers, without interfering with the compiling/execution of applications. We evaluated IOSI using the Spider storage system at Oak Ridge National Laboratory, the S3D turbulence application (running on 18,000 Titan nodes), and benchmark-based pseudo-applications. Through our experiments we confirmed that IOSI effectively extracts an application's I/O signature despite significant server-side noise. Compared to client-side tracing tools, IOSI is transparent, interface-agnostic, and incurs no overhead. Compared to alternative data alignment techniques (e.g., dynamic time warping), it offers higher signature accuracy and shorter processing time.
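The abstract does not spell out the extraction algorithm, so the following is only a toy illustration of the kind of burst structure a server-side throughput log exposes; the thresholding scheme and sample numbers are invented and are not the IOSI method, which aligns logs from multiple runs of the same application.

    # A toy sketch (not the IOSI algorithm): pull candidate I/O bursts out of a
    # noisy server-side throughput trace by simple thresholding against an
    # estimated noise floor.
    import numpy as np

    def find_bursts(throughput, threshold_factor=3.0):
        """Return (start, end) index pairs where throughput exceeds a noise floor."""
        trace = np.asarray(throughput, dtype=float)
        floor = np.median(trace) * threshold_factor   # crude noise floor estimate
        active = trace > floor
        bursts, start = [], None
        for i, flag in enumerate(active):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                bursts.append((start, i))
                start = None
        if start is not None:
            bursts.append((start, len(trace)))
        return bursts

    # Example: a flat noisy background with two write bursts.
    trace = [5, 6, 4, 80, 95, 90, 5, 4, 70, 75, 6]
    print(find_bursts(trace))   # -> [(3, 6), (8, 10)]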
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
The NAS kernel benchmark program
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barton, J. T.
1985-01-01
A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Y.; Gunasekaran, Raghul; Ma, Xiaosong
2016-01-01
Inter-application I/O contention and performance interference have been recognized as severe problems. In this work, we demonstrate, through measurement from Titan (the world's No. 3 supercomputer), that high I/O variance co-exists with the fact that individual storage units remain under-utilized for the majority of the time. This motivates us to propose AID, a system that performs automatic application I/O characterization and I/O-aware job scheduling. AID analyzes existing I/O traffic and batch job history logs, without any prior knowledge of applications or user/developer involvement. It identifies the small set of I/O-intensive candidates among all applications running on a supercomputer and subsequently mines their I/O patterns, using more detailed per-I/O-node traffic logs. Based on such auto-extracted information, AID provides online I/O-aware scheduling recommendations to steer I/O-intensive applications away from heavy ongoing I/O activities. We evaluate AID on Titan, using both real applications (with extracted I/O patterns validated by contacting users) and our own pseudo-applications. Our results confirm that AID is able to (1) identify I/O-intensive applications and their detailed I/O characteristics, and (2) significantly reduce these applications' I/O performance degradation/variance by jointly evaluating outstanding applications' I/O patterns and the real-time system I/O load.
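To show the flavor of an I/O-aware launch recommendation of the kind described, here is a deliberately minimal, hypothetical helper; the load metric, threshold, and function name are invented and do not come from AID.

    # Hypothetical sketch of an I/O-aware launch recommendation, in the spirit
    # of AID but not its actual implementation. The threshold and load metric
    # are invented for illustration.
    def recommend_launch(job_is_io_intensive, current_io_load_gbps, busy_threshold_gbps=20.0):
        """Return True if it looks safe to start the job now, False to suggest delaying it."""
        if not job_is_io_intensive:
            return True                      # compute-bound jobs are unaffected
        return current_io_load_gbps < busy_threshold_gbps

    # Example: hold back an I/O-heavy application while aggregate I/O traffic is high.
    print(recommend_launch(True, 35.0))   # -> False (recommend delaying)
    print(recommend_launch(True, 5.0))    # -> True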
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.
Data-intensive computing on numerically-insensitive supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Fasel, Patricia K; Habib, Salman
2010-12-03
With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.
LASL benchmark performance 1978. [CDC STAR-100, 6600, 7600, Cyber 73, and CRAY-1
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKnight, A.L.
1979-08-01
This report presents the results of running several benchmark programs on a CDC STAR-100, a Cray Research CRAY-1, a CDC 6600, a CDC 7600, and a CDC Cyber 73. The benchmark effort included CRAY-1's at several installations running different operating systems and compilers. This benchmark is part of an ongoing program at Los Alamos Scientific Laboratory to collect performance data and monitor the development trend of supercomputers. 3 tables.
Towards the Interoperability of Web, Database, and Mass Storage Technologies for Petabyte Archives
NASA Technical Reports Server (NTRS)
Moore, Reagan; Marciano, Richard; Wan, Michael; Sherwin, Tom; Frost, Richard
1996-01-01
At the San Diego Supercomputer Center, a massive data analysis system (MDAS) is being developed to support data-intensive applications that manipulate terabyte-sized data sets. The objective is to support scientific application access to data whether it is located at a Web site, stored as an object in a database, and/or stored in an archival storage system. We are developing a suite of demonstration programs which illustrate how Web, database (DBMS), and archival storage (mass storage) technologies can be integrated. An application presentation interface is being designed that integrates data access to all of these sources. We have developed a data movement interface between the Illustra object-relational database and the NSL UniTree archival storage system running in a production mode at the San Diego Supercomputer Center. With this interface, an Illustra client can transparently access data on UniTree under the control of the Illustra DBMS server. The current implementation is based on the creation of a new DBMS storage manager class, and a set of library functions that allow the manipulation and migration of data stored as Illustra 'large objects'. We have extended this interface to allow a Web client application to control data movement between its local disk, the Web server, the DBMS Illustra server, and the UniTree mass storage environment. This paper describes some of the current approaches successfully integrating these technologies. This framework is measured against a representative sample of environmental data extracted from the San Diego Bay Environmental Data Repository. Practical lessons are drawn and critical research areas are highlighted.
A mass storage system for supercomputers based on Unix
NASA Technical Reports Server (NTRS)
Richards, J.; Kummell, T.; Zarlengo, D. G.
1988-01-01
The authors present the design, implementation, and utilization of a large mass storage subsystem (MSS) for the Numerical Aerodynamic Simulation (NAS) program. The MSS supports a large networked, multivendor Unix-based supercomputing facility. The MSS at Ames Research Center provides all processors on the NAS processing network, from workstations to supercomputers, the ability to store large amounts of data in a highly accessible, long-term repository. The MSS uses Unix System V and is capable of storing hundreds of thousands of files ranging from a few bytes to 2 Gb in size.
A Layered Solution for Supercomputing Storage
Grider, Gary
2018-06-13
To solve the supercomputing challenge of memory keeping up with processing speed, a team at Los Alamos National Laboratory developed two innovative memory management and storage technologies. Burst buffers peel off data onto flash memory to support the checkpoint/restart paradigm of large simulations. MarFS adds a thin software layer enabling a new tier for campaign storage—based on inexpensive, failure-prone disk drives—between disk drives and tape archives.
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Characterizing output bottlenecks in a supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Bing; Chase, Jeffrey; Dillow, David A
2012-01-01
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
Supercomputer applications in molecular modeling.
Gund, T M
1988-01-01
An overview of the functions performed by molecular modeling is given. Molecular modeling techniques benefiting from supercomputing are described, namely, conformational search, deriving bioactive conformations, pharmacophoric pattern searching, receptor mapping, and electrostatic properties. The use of supercomputers for problems that are computationally intensive, such as protein structure prediction, protein dynamics and reactivity, protein conformations, and energetics of binding, is also examined. The current status of supercomputing and supercomputer resources are discussed.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
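For context, a minimal parallel matrix-vector multiply can be sketched with mpi4py as below. This naive 1-D row-block version communicates O(n) per multiply; the paper's hypercube algorithm uses a 2-D decomposition to cut communication to O(n/√p + log(p)), which this sketch does not implement. The dimensions and data here are placeholders.

    # Illustrative 1-D row-block matrix-vector multiply with mpi4py (not the
    # paper's hypercube algorithm). Run with: mpiexec -n <p> python matvec.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    p, rank = comm.Get_size(), comm.Get_rank()
    n = 1024                            # global dimension, assumed divisible by p
    rows = n // p
    A_local = np.random.rand(rows, n)   # this rank's block of matrix rows
    x_local = np.random.rand(rows)      # this rank's block of the vector

    x = np.empty(n)
    comm.Allgather(x_local, x)          # every rank needs the full vector: O(n) traffic
    y_local = A_local @ x               # local partial result: this rank's rows of y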
TOP500 Supercomputers for June 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2003-06-23
21st Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 21st edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2003). The Earth Simulator supercomputer built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (teraflops or trillions of calculations per second), retains the number one position. The number 2 position is held by the re-measured ASCI Q system at Los Alamos National Laboratory. With 13.88 Tflop/s, it is the second system ever to exceed the 10 Tflop/s mark. ASCI Q was built by Hewlett-Packard and is based on the AlphaServer SC computer system.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a fast quad-core 1.2 GHz ARMv8 64-bit processor, 1 GB of RAM, and a 32 GB microSD card for local storage. The cluster therefore has a total of 128 GB of RAM distributed over the individual nodes, a flash capacity of 4 TB, and 512 processor cores, while benefiting from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage communications between nodes. These features render our platform the most powerful RPi supercomputer to date and make it suitable for educational applications in high-performance computing (HPC) and the handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively parallelized, scalable code. We present benchmarking results for the computational performance across various numbers of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and feasible learning platform for challenging engineering and scientific problems.
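A minimal inter-node communication micro-benchmark of the kind one might run on such an MPI cluster is sketched below with mpi4py; it is not the authors' benchmark code, and the message size and repetition count are arbitrary choices.

    # Minimal MPI ping-pong timing sketch (latency/bandwidth between two ranks).
    # Run with: mpiexec -n 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    nbytes = 1 << 20                      # 1 MiB message
    buf = np.zeros(nbytes, dtype=np.uint8)
    reps = 100

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1); comm.Recv(buf, source=1)
        elif rank == 1:
            comm.Recv(buf, source=0); comm.Send(buf, dest=0)
    t1 = MPI.Wtime()

    if rank == 0:
        rtt = (t1 - t0) / reps
        print("avg round trip: %.3f ms, bandwidth: %.1f MB/s" % (rtt * 1e3, 2 * nbytes / rtt / 1e6))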
NASA Astrophysics Data System (ADS)
Watari, S.; Morikawa, Y.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Kato, H.; Shimojo, S.; Murata, K. T.
2010-12-01
In the Solar-Terrestrial Physics (STP) field, the spatio-temporal resolution of computer simulations keeps increasing because of the tremendous advancement of supercomputers. A more advanced technology is Grid computing, which integrates distributed computational resources to provide scalable computing capacity. In simulation research it is effective for researchers to design their own physical models, perform calculations on a supercomputer, and analyze and visualize the results using familiar methods. A supercomputer, however, is usually far from the analysis and visualization environment. In general, researchers analyze and visualize on workstations (WS) managed at hand, because installing and operating software on a WS is easy; this makes it necessary to copy data from the supercomputer to the WS manually, and the time required to transfer data over a long-delay network in practice hinders high-accuracy simulation work. In terms of usefulness, it is therefore important to integrate a supercomputer and an analysis and visualization environment seamlessly, using methods familiar to the researcher. NICT has been developing a cloud computing environment (the NICT Space Weather Cloud). In the NICT Space Weather Cloud, disk servers are located near its supercomputer and near the WSs used for data analysis and visualization, and they are connected to JGN2plus, a high-speed network for research and development. A distributed virtual high-capacity storage is also constructed with Grid Datafarm (Gfarm v2). Huge data sets output from the supercomputer are transferred to the virtual storage through JGN2plus, so a researcher can concentrate on the research using familiar methods, without regard to the distance between the supercomputer and the analysis and visualization environment. At present, a total of 16 disk servers are set up at NICT headquarters (Koganei, Tokyo), the JGN2plus NOC (Otemachi, Tokyo), the Okinawa Subtropical Environment Remote-Sensing Center, and the Cybermedia Center, Osaka University. They are connected over JGN2plus and constitute a 1 PB (physical size) virtual storage with Gfarm v2. These disk servers are connected with the supercomputers of NICT and Osaka University, and a system has been built that automatically transfers data output from the supercomputers to the virtual storage. The measured transfer rate is about 50 GB/hour, which is estimated to be reasonable for a representative simulation and analysis workflow for reconstruction of the coronal magnetic field. This research serves as an experiment with the system, and verification of its practicality is proceeding at the same time. Herein we introduce an overview of the space weather cloud system we have developed so far and demonstrate several scientific results obtained with it. We also introduce several web applications offered as a service of the space weather cloud, named "e-SpaceWeather" (e-SW). e-SW provides a variety of space weather online services from many aspects.
A performance comparison of the Cray-2 and the Cray X-MP
NASA Technical Reports Server (NTRS)
Schmickley, Ronald; Bailey, David H.
1986-01-01
A suite of thirteen large Fortran benchmark codes was run on Cray-2 and Cray X-MP supercomputers. These codes were a mix of compute-intensive scientific application programs (mostly Computational Fluid Dynamics) and some special vectorized computation exercise programs. For the general class of programs tested on the Cray-2, most of which were not specially tuned for speed, the floating point operation rates varied, under a variety of system load configurations, from 40 percent up to 125 percent of X-MP performance rates. It is concluded that the Cray-2, in the original system configuration studied (without memory pseudo-banking), will run untuned Fortran code, on average, at about 70 percent of X-MP speeds.
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
High Performance Computing at NASA
NASA Technical Reports Server (NTRS)
Bailey, David H.; Cooper, D. M. (Technical Monitor)
1994-01-01
The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Timothy J.
2016-03-01
While benchmarking software is useful for testing the performance limits and stability of Argonne National Laboratory’s new Theta supercomputer, there is no substitute for running real applications to explore the system’s potential. The Argonne Leadership Computing Facility’s Theta Early Science Program, modeled after its highly successful code migration program for the Mira supercomputer, has one primary aim: to deliver science on day one. Here is a closer look at the type of science problems that will be getting early access to Theta, a next-generation machine being rolled out this year.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
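The data-flow-graph idea behind NGB can be illustrated with a toy executor: each node runs its (NPB-like) task once all of its predecessors have produced output. The particular graph, class labels, and task stub below are invented for illustration and are not the actual NGB problem definitions.

    # Toy sketch of a benchmark expressed as a data flow graph, NGB-style.
    graph = {            # node -> list of nodes it depends on (invented example)
        "BT.A": [],
        "MG.A": [],
        "SP.A": ["BT.A", "MG.A"],
        "LU.A": ["SP.A"],
    }

    def run_task(name, inputs):
        """Stand-in for launching one NPB-like task with initialization data."""
        return "result-of-%s(%s)" % (name, ",".join(sorted(inputs)) or "none")

    def execute(graph):
        results, remaining = {}, dict(graph)
        while remaining:
            ready = [n for n, deps in remaining.items() if all(d in results for d in deps)]
            for node in ready:                       # topological-order execution
                results[node] = run_task(node, [results[d] for d in graph[node]])
                del remaining[node]
        return results

    print(execute(graph))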
Benchmarking and tuning the MILC code on clusters and supercomputers
NASA Astrophysics Data System (ADS)
Gottlieb, Steven
2002-03-01
Recently, we have benchmarked and tuned the MILC code on a number of architectures including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as PIV, IBM SP and Alpha.
Present Status and Extensions of the Monte Carlo Performance Benchmark
NASA Astrophysics Data System (ADS)
Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.
2014-06-01
The NEA Monte Carlo Performance benchmark started in 2011, aiming to monitor over the years the ability to perform a full-size Monte Carlo reactor core calculation with detailed power production for each fuel pin with axial distribution. This paper gives an overview of the results contributed thus far. It shows that reaching a statistical accuracy of 1% for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common types of compute nodes. However, on true supercomputers the speedup of parallel calculations continues to increase up to very large numbers of processor cores. More experience is needed with calculations on true supercomputers using large numbers of processors in order to predict whether the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark are well suited for further investigations of full-core Monte Carlo calculations, and a need is felt for testing issues other than computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems: for evaluating fission source convergence in a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities, and for studying the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition are discussed.
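A back-of-envelope check of the ~100 billion histories figure follows, assuming only the standard 1/√N Monte Carlo convergence; the zone count used is an assumption for illustration and is not stated in the abstract.

    # Rough check: ~1e4 scores in a zone give ~1 % relative error (1/sqrt(N)),
    # and scores are assumed to spread roughly evenly over a few million
    # pin/axial tally zones (an assumed count, for illustration only).
    target_rel_error = 0.01                          # 1 % statistical accuracy per zone
    zones = 6.0e6                                    # assumed number of small fuel tally zones
    scores_per_zone = 1.0 / target_rel_error**2      # ~1e4 scores needed per zone
    total_histories = scores_per_zone * zones        # roughly one score per history per zone
    print("%.1e histories" % total_histories)        # -> ~6e10, the order of the 1e11 quoted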
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
20th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 20th edition of the TOP500 list of the world's fastest supercomputers was released today (November 15, 2002). The Earth Simulator supercomputer, installed earlier this year at the Earth Simulator Center in Yokohama, Japan, retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s (trillions of calculations per second). The No. 2 and No. 3 positions are held by two new, identical ASCI Q systems at Los Alamos National Laboratory (7.73 Tflop/s each). These systems were built by Hewlett-Packard and are based on the AlphaServer SC computer system.
Federated data storage and management infrastructure
NASA Astrophysics Data System (ADS)
Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.
2016-10-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs by orders of magnitude, which will require new approaches to data storage organization and data handling. In our project we address the fundamental problem of designing an architecture to integrate distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian/CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on supercomputing platforms, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment-wide within national academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.
NASA Astrophysics Data System (ADS)
Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.
2015-12-01
The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run2). The need for simulation, data processing and analysis would overwhelm the expected capacity of the grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge, the integration of opportunistic resources into the LHC computing model is highly important. The Tier-1 facility at the Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and will process, simulate and store up to 10% of the total data obtained from the ALICE, ATLAS and LHCb experiments. In addition, the Kurchatov Institute has supercomputers with a peak performance of 0.12 PFLOPS. Delegating even a fraction of these supercomputing resources to LHC computing will notably increase the total capacity. In 2014, development of a portal combining the Tier-1 and a supercomputer at the Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences such as biology, with genome sequencing analysis, and astrophysics, with cosmic ray analysis and antimatter and dark matter searches.
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network-attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Long-Term file activity patterns in a UNIX workstation environment
NASA Technical Reports Server (NTRS)
Gibson, Timothy J.; Miller, Ethan L.
1998-01-01
As mass storage technology becomes more affordable for sites smaller than supercomputer centers, understanding their file access patterns becomes crucial for developing systems to store rarely used data on tertiary storage devices such as tapes and optical disks. This paper presents a new way to collect and analyze file system statistics for UNIX-based file systems. The collection system runs in user-space and requires no modification of the operating system kernel. The statistics package provides details about file system operations at the file level: creations, deletions, modifications, etc. The paper analyzes four months of file system activity on a university file system. The results confirm previously published results gathered from supercomputer file systems, but differ in several important areas. Files in this study were considerably smaller than those at supercomputer centers, and they were accessed less frequently. Additionally, the long-term creation rate on workstation file systems is sufficiently low so that all data more than a day old could be cheaply saved on a mass storage device, allowing the integration of time travel into every file system.
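The paper's collector runs in user space without kernel changes; one simple user-space way to infer file-level creations, deletions, and modifications is to diff periodic snapshots of a directory tree, as sketched below. This is only an illustration of the kind of events recorded, not the authors' statistics package.

    # Toy user-space sketch: infer file creations, deletions, and modifications
    # by diffing two snapshots of a directory tree taken some time apart.
    import os

    def snapshot(root):
        """Map each file path under root to its (size, mtime)."""
        state = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                    state[path] = (st.st_size, st.st_mtime)
                except OSError:
                    pass            # file vanished between listing and stat
        return state

    def diff(old, new):
        created = [p for p in new if p not in old]
        deleted = [p for p in old if p not in new]
        modified = [p for p in new if p in old and new[p] != old[p]]
        return created, deleted, modified

    # Example: before = snapshot("/home/user"); ...; after = snapshot("/home/user")
    # print(diff(before, after))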
Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schulz, Roland; Lindner, Benjamin; Petridis, Loukas
2009-01-01
A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three million- and five million-atom biological systems scale well up to ∼30k cores, producing ∼30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
NAS-current status and future plans
NASA Technical Reports Server (NTRS)
Bailey, F. R.
1987-01-01
The Numerical Aerodynamic Simulation (NAS) has met its first major milestone, the NAS Processing System Network (NPSN) Initial Operating Configuration (IOC). The program has met its goal of providing a national supercomputer facility capable of greatly enhancing the Nation's research and development efforts. Furthermore, the program is fulfilling its pathfinder role by defining and implementing a paradigm for supercomputing system environments. The IOC is only the beginning, and the NAS Program will aggressively continue to develop and implement emerging supercomputer, communications, storage, and software technologies to strengthen computations as a critical element in supporting the Nation's leadership role in aeronautics.
Scaling of data communications for an advanced supercomputer network
NASA Technical Reports Server (NTRS)
Levin, E.; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of NASA's Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations and by remote communication to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. The implications of a projected 20-fold increase in processing power on the data communications requirements are described.
Rethinking key–value store for parallel I/O optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He
2015-01-26
Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.
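The metadata-heavy case the paper targets can be illustrated with a toy contrast, sketched below: many tiny records written as one file each (a create/open/close per record) versus the same records batched into a single key-value store. This is not the authors' system; Python's built-in dbm module merely stands in for a real KV store.

    # Toy contrast: file-per-record writes versus batching into one KV store.
    import dbm, os, tempfile

    records = {("key-%06d" % i): os.urandom(128) for i in range(10000)}

    # File-per-record: every write costs a create + open + close (metadata ops).
    file_dir = tempfile.mkdtemp()
    for key, payload in records.items():
        with open(os.path.join(file_dir, key), "wb") as f:
            f.write(payload)

    # Key-value store: the same records land in one store with simple put() calls.
    kv_path = os.path.join(tempfile.mkdtemp(), "store")
    with dbm.open(kv_path, "c") as db:
        for key, payload in records.items():
            db[key] = payload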
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
A vectorized Lanczos eigensolver for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1990-01-01
The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
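For readers unfamiliar with the method, a minimal dense-matrix Lanczos recurrence is sketched below; the matrix-vector product is the step whose vectorization such optimizations target. This is a generic textbook sketch without reorthogonalization, not the authors' implementation.

```python
import numpy as np

def lanczos_tridiag(A, k, seed=0):
    """k steps of the Lanczos recurrence on a symmetric matrix A.

    Returns (alpha, beta): diagonal and off-diagonal of the k x k tridiagonal
    matrix whose extreme eigenvalues approximate those of A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    for j in range(k):
        w = A @ q                     # matrix-vector product: the vectorizable kernel
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * q_prev
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q_prev, q = q, w / beta[j]
    return alpha, beta

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 500))
A = (A + A.T) / 2                     # symmetric test matrix
alpha, beta = lanczos_tridiag(A, 60)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print("largest Ritz value :", np.linalg.eigvalsh(T)[-1])
print("largest eigenvalue :", np.linalg.eigvalsh(A)[-1])
```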
PDS: A Performance Database Server
Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...
1994-01-01
The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.
Template Interfaces for Agile Parallel Data-Intensive Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramakrishnan, Lavanya; Gunter, Daniel; Pastorello, Gilberto Z.
Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations are increasingly generating large enough data sets that it is no longer practical to download them to a desktop to operate on them. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analysis of this data requires an ability to run on these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres is addressing the challenge of enabling collaborative analysis of DOE Science data through a new concept of reusable "templates" that enable scientists to easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.
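The template idea can be sketched with standard-library tools as below; the names sequence and parallel and the toy tasks are placeholders and do not reflect the actual Tigres API.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder template functions illustrating the composition idea only;
# these names are not the Tigres library's API.
def sequence(tasks, data):
    """Run tasks one after another, feeding each output to the next."""
    for task in tasks:
        data = task(data)
    return data

def parallel(task, inputs, workers=4):
    """Apply the same task to many independent inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))

def extract(name):            # stand-in analysis step
    return len(name)

def combine(values):          # stand-in reduction step
    return sum(values)

chunks = [f"chunk_{i}" for i in range(8)]       # hypothetical input names
partials = parallel(extract, chunks)
print(sequence([combine], partials))
```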
NASA Astrophysics Data System (ADS)
Fukazawa, K.; Walker, R. J.; Kimura, T.; Tsuchiya, F.; Murakami, G.; Kita, H.; Tao, C.; Murata, K. T.
2016-12-01
Planetary magnetospheres are very large, while phenomena within them occur on meso and micro scales, ranging from tens of planetary radii down to kilometers. To understand the dynamics of these multi-scale systems, numerical simulations have been performed on supercomputer systems. We have long studied the magnetospheres of Earth, Jupiter, and Saturn using three-dimensional magnetohydrodynamic (MHD) simulations; however, we have not captured phenomena near the limits of the MHD approximation. In particular, we have not studied meso-scale phenomena that can be addressed by using MHD. Recently we performed an MHD simulation of Earth's magnetosphere on the K computer, the first 10-PFlops supercomputer, and obtained multi-scale flow vorticity for both northward and southward IMF. Furthermore, we have access to supercomputer systems with Xeon, SPARC64, and vector-type CPUs, and can compare simulation results between the different systems. Finally, we have compared the results of our parameter survey of the magnetosphere with observations from the HISAKI spacecraft. We have encountered a number of difficulties in effectively using the latest supercomputer systems. First, the size of simulation output increases greatly: a simulation group now produces over 1 PB of output, and storing and analyzing this much data is difficult. The traditional way to analyze simulation results is to move the results to the investigator's home computer, which takes over three months on an end-to-end 10 Gbps network; in reality, problems at some nodes, such as firewalls, can increase the transfer time to over one year. Another issue is post-processing: it is hard to handle even a few TB of simulation output because of the memory limitations of a post-processing computer. To overcome these issues, we have developed and introduced parallel network storage, a highly efficient network protocol, and CUI-based visualization tools. In this study, we will show the latest simulation results obtained with the petascale supercomputer and discuss problems arising from the use of these supercomputer systems.
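A back-of-the-envelope check of the quoted transfer times, assuming roughly 1 PB of output; the effective throughputs other than the nominal 10 Gbps are assumptions chosen to bracket the reported three-month and one-year figures.

```python
# One petabyte of simulation output over an end-to-end link, at a few rates.
bits = 1e15 * 8                           # 1 PB in bits
for eff_gbps in (10.0, 1.0, 0.25):        # nominal rate, plus two assumed effective rates
    days = bits / (eff_gbps * 1e9) / 86_400
    print(f"{eff_gbps:5.2f} Gbps effective -> {days:6.1f} days")
```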
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2000-01-01
The Affordable High Performance Computing (AHPC) project demonstrated that high-performance computing based on a distributed network of computer workstations is a cost-effective alternative to vector supercomputers for running CPU- and memory-intensive design and analysis tools. The AHPC project created an integrated system called a Network Supercomputer. By connecting computer workstations through a network and utilizing the workstations when they are idle, the resulting distributed-workstation environment has the same performance and reliability levels as the Cray C90 vector supercomputer at less than 25 percent of the C90 cost. In fact, the cost comparison between a Cray C90 supercomputer and Sun workstations showed that the number of distributed networked workstations equivalent to a C90 costs approximately 8 percent of the C90.
Predicting Cost/Performance Trade-Offs for Whitney: A Commodity Computing Cluster
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Nitzberg, Bill; VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)
1997-01-01
Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. We develop closed form expressions detailing the number and size of messages sent by each benchmark. Coupling these with measured single processor performance, network latency, and network bandwidth, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.
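The flavor of such a model can be sketched as a compute term plus latency- and bandwidth-dominated communication terms; the functional form and all numbers below are illustrative assumptions, not the paper's fitted expressions.

```python
def predicted_time(flops, flop_rate, n_msgs, total_msg_bytes, latency_s, bandwidth_bps):
    """Compute time plus a per-message latency term and a bandwidth term for the
    total bytes moved (illustrative form only)."""
    return flops / flop_rate + n_msgs * latency_s + total_msg_bytes / bandwidth_bps

# Assumed numbers for a 200 MHz Pentium Pro class node on 100baseT Ethernet:
# ~50 MFLOP/s sustained, ~100 microsecond latency, ~10 MB/s effective bandwidth.
print(round(predicted_time(2e9, 5e7, 1_000, 5e7, 1e-4, 1e7), 1), "seconds")
```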
Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
NASA Astrophysics Data System (ADS)
Garzoglio, Gabriele
2012-12-01
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on “bare metal” nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
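The shape of that measurement, aggregate read throughput versus number of concurrent clients, can be reproduced in miniature. Here local threads stand in for the virtual-machine clients and the result is dominated by the page cache, so it only illustrates the methodology, not the storage systems studied.

```python
import os, tempfile, threading, time

def read_range(path, offset, size, block=1 << 20):
    """Sequentially read `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = size
        while remaining > 0:
            chunk = f.read(min(block, remaining))
            if not chunk:
                break
            remaining -= len(chunk)

def aggregate_read_mbps(path, n_clients):
    share = os.path.getsize(path) // n_clients
    threads = [threading.Thread(target=read_range, args=(path, i * share, share))
               for i in range(n_clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return share * n_clients / (time.perf_counter() - start) / 1e6  # MB/s

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(128 * 1024 * 1024))          # 128 MiB scratch file
    path = f.name
for n in (1, 2, 4, 8):
    print(f"{n} clients: {aggregate_read_mbps(path, n):8.0f} MB/s")
os.remove(path)
```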
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler-directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
NAS technical summaries: Numerical aerodynamic simulation program, March 1991 - February 1992
NASA Technical Reports Server (NTRS)
1992-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefiting other supercomputer centers in Government and industry. This report contains selected scientific results from the 1991-92 NAS Operational Year, March 4, 1991 to March 3, 1992, which is the fifth year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP. The Cray-2, the first generation supercomputer, has four processors, 256 megawords of central memory, and a total sustained speed of 250 million floating point operations per second. The Cray Y-MP, the second generation supercomputer, has eight processors and a total sustained speed of one billion floating point operations per second. Additional memory was installed this year, doubling capacity from 128 to 256 megawords of solid-state storage-device memory. Because of its higher performance, the Cray Y-MP delivered approximately 77 percent of the total number of supercomputer hours used during this year.
Unstructured Adaptive Meshes: Bad for Your Memory?
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob
2003-01-01
This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
A CPU benchmark for protein crystallographic refinement.
Bourne, P E; Hendrickson, W A
1990-01-01
The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ is reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run time when coding for a specific hardware architecture is considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
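The dual-level idea, an outer distribution of work units over process groups and an inner parallelization within each group, can be sketched with MPI communicator splitting. This is illustrative only and is not the RI-MP2 code; the group count and the stand-in computation are arbitrary.

```python
# Run with e.g. mpirun -n 8 python dual_level.py (mpi4py required).
from mpi4py import MPI

world = MPI.COMM_WORLD
N_GROUPS = 4                                   # assumed group count
group_id = world.rank % N_GROUPS
inner = world.Split(color=group_id, key=world.rank)   # inner-level communicator

work_units = list(range(32))                   # e.g. integral batch indices
for unit in work_units[group_id::N_GROUPS]:    # outer level: round-robin over groups
    # Inner level: each rank of the group handles a slice of the unit, then the
    # partial results are combined within the group.
    partial = sum(range(inner.rank, 1000, inner.size))   # stand-in computation
    total = inner.allreduce(partial, op=MPI.SUM)
    if inner.rank == 0:
        print(f"group {group_id} finished unit {unit}: {total}")
```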
A Look at the Impact of High-End Computing Technologies on NASA Missions
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Dunbar, Jill; Hardman, John; Bailey, F. Ron; Wheeler, Lorien; Rogers, Stuart
2012-01-01
From its bold start nearly 30 years ago and continuing today, the NASA Advanced Supercomputing (NAS) facility at Ames Research Center has enabled remarkable breakthroughs in the space agency's science and engineering missions. Throughout this time, NAS experts have influenced the state-of-the-art in high-performance computing (HPC) and related technologies such as scientific visualization, system benchmarking, batch scheduling, and grid environments. We highlight the pioneering achievements and innovations originating from and made possible by NAS resources and know-how, from early supercomputing environment design and software development, to long-term simulation and analyses critical to designing safe Space Shuttle operations and associated spinoff technologies, to the highly successful Kepler Mission's discovery of new planets now capturing the world's imagination.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; Nikam, Akshay; Lai, Pai-Wei
Tensor contractions represent the most compute-intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce a novel approach that avoids data redistribution in contracting symmetric tensors while also avoiding redundant storage and maintaining load balance. We present experimental results on two parallel supercomputers for several symmetric contractions that appear in the CCSD quantum chemistry method. We also present a novel approach to tensor redistribution that can take advantage of parallel hyperplanes when the initial distribution has replicated dimensions, and use collective broadcast when the final distribution has replicated dimensions, making the algorithm very efficient.
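A small NumPy example of the kind of symmetric contraction involved, a CCSD-like particle-particle ladder term, is given below. It shows the permutational symmetry that such algorithms exploit to avoid redundant storage; it says nothing about the paper's distribution scheme, and the tensor sizes are toy values.

```python
import numpy as np

no, nv = 6, 10                                  # occupied / virtual sizes (toy)
rng = np.random.default_rng(0)

# Doubles amplitudes with the permutational symmetry t[a,b,i,j] = t[b,a,j,i].
t = rng.standard_normal((nv, nv, no, no))
t = 0.5 * (t + t.transpose(1, 0, 3, 2))

# Two-electron integrals with w[a,b,c,d] = w[b,a,d,c].
w = rng.standard_normal((nv, nv, nv, nv))
w = 0.5 * (w + w.transpose(1, 0, 3, 2))

# A CCSD-like particle-particle ladder contraction.
r = np.einsum("abcd,cdij->abij", w, t)

# The inputs' symmetry is inherited by the result, so only one member of each
# (a,b,i,j)/(b,a,j,i) pair needs to be stored or communicated.
assert np.allclose(r, r.transpose(1, 0, 3, 2))
print("contraction done, result shape", r.shape)
```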
Measurements over distributed high performance computing and storage systems
NASA Technical Reports Server (NTRS)
Williams, Elizabeth; Myers, Tom
1993-01-01
A strawman proposal is given for a framework for presenting a common set of metrics for supercomputers, workstations, file servers, mass storage systems, and the networks that interconnect them. Production control and database systems are also included. Though other applications and third-party software systems are not addressed, it is important to measure them as well.
Trajectory NG: portable, compressed, general molecular dynamics trajectories.
Spångberg, Daniel; Larsson, Daniel S D; van der Spoel, David
2011-10-01
We present general algorithms for the compression of molecular dynamics trajectories. The standard ways to store MD trajectories as text or as raw binary floating point numbers result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, as well as benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3-1:35 depending on the frequency of storage of frames and the system studied.
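The overall scheme, quantize, difference in time, then entropy-code, can be sketched as follows, using bz2 as a stand-in block-sorting plus Huffman coder; this is not the authors' trajectory format, and the synthetic trajectory is invented for illustration.

```python
import bz2
import numpy as np

def compress_frames(frames, precision=1e-3):
    """Quantize to a fixed precision, difference successive frames, and compress
    with bz2 (a block-sorting + Huffman coder). Illustrative only."""
    q = np.round(np.asarray(frames) / precision).astype(np.int32)
    deltas = np.diff(q, axis=0, prepend=np.zeros_like(q[:1]))  # frame 0 stored absolute
    return bz2.compress(deltas.tobytes())

def decompress_frames(blob, shape, precision=1e-3):
    deltas = np.frombuffer(bz2.decompress(blob), dtype=np.int32).reshape(shape)
    return np.cumsum(deltas, axis=0) * precision

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(scale=0.02, size=(500, 300)), axis=0)  # 500 frames, 100 atoms x 3
blob = compress_frames(traj)
print("ratio vs float32:", round(traj.astype(np.float32).nbytes / len(blob), 1))
assert np.allclose(decompress_frames(blob, traj.shape), traj, atol=1e-3)
```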
Investigation of storage options for scientific computing on Grid and Cloud facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on bare metal nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
NASA Technical Reports Server (NTRS)
Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick;
2001-01-01
A series of NASA presentations for the Supercomputing 2001 conference is summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) DeBakey Heart Assist Device; (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.
Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming
2017-06-16
Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the use of the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and the MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
NASA Technical Reports Server (NTRS)
Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John
2016-01-01
We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5-2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with the Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell, including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks, including a subset of HPCC and HPCG, and four full-scale scientific and engineering applications. We also present a model to predict the performance of HPCG and Cart3D within 5%, and Overflow within 10% accuracy.
Extreme I/O on HPC for HEP using the Burst Buffer at NERSC
NASA Astrophysics Data System (ADS)
Bhimji, Wahid; Bard, Debbie; Burleigh, Kaylan; Daley, Chris; Farrell, Steve; Fasel, Markus; Friesen, Brian; Gerhardt, Lisa; Liu, Jialin; Nugent, Peter; Paul, Dave; Porter, Jeff; Tsulaia, Vakho
2017-10-01
In recent years there has been increasing use of HPC facilities for HEP experiments. This has initially focussed on less I/O intensive workloads such as generator-level or detector simulation. We now demonstrate the efficient running of I/O-heavy analysis workloads on HPC facilities at NERSC, for the ATLAS and ALICE LHC collaborations as well as astronomical image analysis for DESI and BOSS. To do this we exploit a new 900 TB NVRAM-based storage system recently installed at NERSC, termed a Burst Buffer. This is a novel approach to HPC storage that builds on-demand filesystems on all-SSD hardware that is placed on the high-speed network of the new Cori supercomputer. We describe the hardware and software involved in this system, and give an overview of its capabilities, before focusing in detail on how the ATLAS, ALICE and astronomical workflows were adapted to work on this system. We describe these modifications and the resulting performance results, including comparisons to other filesystems. We demonstrate that we can meet the challenging I/O requirements of HEP experiments and scale to many thousands of cores accessing a single shared storage system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Klimentov, A
2016-01-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe, and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the MIRA supercomputer at the Argonne Leadership Computing Facility (ALCF), the supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and for local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation was tested with a variety of Monte Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for the ATLAS experiment since September 2015. We will present our current accomplishments with running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities' infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
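The light-weight MPI wrapper idea can be sketched as a single MPI job whose ranks each run independent single-threaded payloads on one core. This is illustrative only, not the PanDA pilot code, and the payload commands are hypothetical.

```python
# Run with e.g. mpirun -n 16 python wrapper.py (mpi4py required).
import subprocess
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
payloads = None
if comm.rank == 0:
    # Hypothetical single-threaded payloads; a real pilot would stage real job specs.
    payloads = [[sys.executable, "-c", f"print('simulated event block {i}')"]
                for i in range(64)]
payloads = comm.bcast(payloads, root=0)

for i in range(comm.rank, len(payloads), comm.size):   # round-robin over ranks
    subprocess.run(payloads[i], check=True)
comm.Barrier()
```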
Benchmarking Memory Performance with the Data Cube Operator
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabanov, Leonid V.
2004-01-01
Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark the capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control the data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance of a number of computer architectures and of a small computational grid are presented.
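A toy version of the view computation is sketched below: all 2^d group-by views of a small set of d-tuples are built, each aggregated from its smallest already-computed parent. The input data and sizes are made up for illustration.

```python
from collections import Counter
from itertools import combinations

d = 4
# Toy Arithmetic-Data-Set-like input: d-tuples of small integers, measure = count.
rows = [(i % 3, (i * 7) % 5, (i * 11) % 2, (i * 13) % 7) for i in range(10_000)]

views = {tuple(range(d)): Counter(rows)}            # the full d-dimensional view
for k in range(d - 1, -1, -1):                      # build coarser views level by level
    for dims in combinations(range(d), k):
        parents = [p for p in views if len(p) == k + 1 and set(dims) <= set(p)]
        parent = min(parents, key=lambda p: len(views[p]))     # "smallest parent"
        agg = Counter()
        for key, count in views[parent].items():
            agg[tuple(key[parent.index(i)] for i in dims)] += count
        views[dims] = agg

print(len(views), "views computed; 2^d =", 2 ** d)
```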
Technical Report: Installed Cost Benchmarks and Deployment Barriers for Residential Solar Photovoltaics with Energy Storage, Q1 2016
Researchers from NREL published a report that provides detailed component and system-level installed cost breakdowns for residential solar photovoltaic systems with energy storage, along with the barriers to their deployment.
NASA Technical Reports Server (NTRS)
Salmon, Ellen
1996-01-01
The data storage and retrieval demands of space and Earth sciences researchers have made the NASA Center for Computational Sciences (NCCS) Mass Data Storage and Delivery System (MDSDS) one of the world's most active Convex UniTree systems. Science researchers formed the NCCS's Computer Environments and Research Requirements Committee (CERRC) to relate their projected supercomputing and mass storage requirements through the year 2000. Using the CERRC guidelines and observations of current usage, some detailed projections of requirements for MDSDS network bandwidth and mass storage capacity and performance are presented.
Mass Storage System Upgrades at the NASA Center for Computational Sciences
NASA Technical Reports Server (NTRS)
Tarshish, Adina; Salmon, Ellen; Macie, Medora; Saletta, Marty
2000-01-01
The NASA Center for Computational Sciences (NCCS) provides supercomputing and mass storage services to over 1200 Earth and space scientists. During the past two years, the mass storage system at the NCCS went through a great many changes, both major and minor. Tape drives, silo control software, and the mass storage software itself were upgraded, and the mass storage platform was upgraded twice. Some of these upgrades were aimed at achieving year-2000 compliance, while others were simply upgrades to newer and better technologies. In this paper we will describe these upgrades.
Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muller, U.A.; Baumle, B.; Kohler, P.
1992-10-01
Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
A History of High-Performance Computing
NASA Technical Reports Server (NTRS)
2006-01-01
Faster than most speedy computers. More powerful than its NASA data-processing predecessors. Able to leap large, mission-related computational problems in a single bound. Clearly, it's neither a bird nor a plane, nor does it need to don a red cape, because it's super in its own way. It's Columbia, NASA's newest supercomputer and one of the world's most powerful production/processing units. Named Columbia to honor the STS-107 Space Shuttle Columbia crewmembers, the new supercomputer is making it possible for NASA to achieve breakthroughs in science and engineering, fulfilling the Agency's missions, and, ultimately, the Vision for Space Exploration. Shortly after being built in 2004, Columbia achieved a benchmark rating of 51.9 teraflop/s on 10,240 processors, making it the world's fastest operational computer at the time of completion. Putting this speed into perspective, 20 years ago, the most powerful computer at NASA's Ames Research Center, home of the NASA Advanced Supercomputing Division (NAS), ran at a speed of about 1 gigaflop (one billion calculations per second). The Columbia supercomputer is 50,000 times faster than this computer and offers a tenfold increase in capacity over the prior system housed at Ames. What's more, Columbia is considered the world's largest Linux-based, shared-memory system. The system is offering immeasurable benefits to society and is the zenith of years of NASA/private industry collaboration that has spawned new generations of commercial, high-speed computing systems.
Toward Scalable Benchmarks for Mass Storage Systems
NASA Technical Reports Server (NTRS)
Miller, Ethan L.
1996-01-01
This paper presents guidelines for the design of a mass storage system benchmark suite, along with preliminary suggestions for programs to be included. The benchmarks will measure both peak and sustained performance of the system as well as predicting both short- and long-term behavior. These benchmarks should be both portable and scalable so they may be used on storage systems from tens of gigabytes to petabytes or more. By developing a standard set of benchmarks that reflect real user workload, we hope to encourage system designers and users to publish performance figures that can be compared with those of other systems. This will allow users to choose the system that best meets their needs and give designers a tool with which they can measure the performance effects of improvements to their systems.
Hurricane Intensity Forecasts with a Global Mesoscale Model on the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Shen, Bo-Wen; Tao, Wei-Kuo; Atlas, Robert
2006-01-01
It is known that General Circulation Models (GCMs) have insufficient resolution to accurately simulate hurricane near-eye structure and intensity. The increasing capabilities of high-end computers (e.g., the NASA Columbia Supercomputer) have changed this. In 2004, the finite-volume General Circulation Model at a 1/4 degree resolution, doubling the resolution used by most operational NWP centers at that time, was implemented and run to obtain promising landfall predictions for major hurricanes (e.g., Charley, Frances, Ivan, and Jeanne). In 2005, we successfully implemented the 1/8 degree version and demonstrated its performance on intensity forecasts with hurricane Katrina (2005). It is found that the 1/8 degree model is capable of simulating the radius of maximum wind and the near-eye wind structure, thereby providing promising intensity forecasts. In this study, we will further evaluate the model's performance on intensity forecasts of hurricanes Ivan, Jeanne, and Karl in 2004. Suggestions for further model development will be made at the end.
NASA Astrophysics Data System (ADS)
Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.
2016-10-01
The LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than the grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility. The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and for local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation was tested with a variety of Monte Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities' infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and also in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance-per-dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
Optimizing Scientist Time through In Situ Visualization and Analysis.
Patchett, John; Ahrens, James
2018-01-01
In situ processing produces reduced-size persistent representations of a simulation's state while the simulation is running. The need for in situ visualization and data analysis is usually described in terms of supercomputer size and performance in relation to available storage size.
NASA Astrophysics Data System (ADS)
Tripathi, Vijay S.; Yeh, G. T.
1993-06-01
Sophisticated and highly computation-intensive models of transport of reactive contaminants in groundwater have been developed in recent years. Application of such models to real-world contaminant transport problems, e.g., simulation of groundwater transport of 10-15 chemically reactive elements (e.g., toxic metals) and relevant complexes and minerals in two and three dimensions over a distance of several hundred meters, requires high-performance computers including supercomputers. Although not widely recognized as such, the computational complexity and demand of these models compare with well-known computation-intensive applications including weather forecasting and quantum chemical calculations. A survey of the performance of a variety of available hardware, as measured by the run times for a reactive transport model HYDROGEOCHEM, showed that while supercomputers provide the fastest execution times for such problems, relatively low-cost reduced instruction set computer (RISC) based scalar computers provide the best performance-to-price ratio. Because supercomputers like the Cray X-MP are inherently multiuser resources, often the RISC computers also provide much better turnaround times. Furthermore, RISC-based workstations provide the best platforms for "visualization" of groundwater flow and contaminant plumes. The most notable result, however, is that current workstations costing less than $10,000 provide performance within a factor of 5 of a Cray X-MP.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Pope, Adrian; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukić, Zarija; Sehrish, Saba; Liao, Wei-keng
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Pope, Adrian; Finkel, Hal
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the ‘Dark Universe’, dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC’s design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Personal supercomputing by using transputer and Intel 80860 in plasma engineering
NASA Astrophysics Data System (ADS)
Ido, S.; Aoki, K.; Ishine, M.; Kubota, M.
1992-09-01
Transputer (T800) and 64-bit RISC Intel 80860 (i860) processors added to a personal computer can be used as accelerators. When 32-bit T800s in a parallel system or 64-bit i860s are used, scientific calculations are carried out several tens of times as fast as on commonly used 32-bit personal computers or UNIX workstations. Benchmark tests and examples of physical simulations using T800s and the i860 are reported.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
NAS Parallel Benchmark Results 11-96. 1.0
NASA Technical Reports Server (NTRS)
Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks.
Architecture and method for a burst buffer using flash technology
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung
2016-03-15
A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.
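The strided shared-file layout and the sequential migration step can be illustrated with ordinary file I/O; the block size, process count, and file names below are arbitrary, and plain Python loops stand in for the MPI processes and the asynchronous mover.

```python
import os

BLOCK = 1 << 20                     # 1 MiB blocks (arbitrary)
NPROCS = 4                          # stands in for MPI processes
BLOCKS_PER_PROC = 8

def write_checkpoint(path, rank):
    """Each 'process' writes its blocks at strided offsets in the shared file."""
    with open(path, "r+b") as f:
        for b in range(BLOCKS_PER_PROC):
            f.seek((b * NPROCS + rank) * BLOCK)
            f.write(bytes([rank]) * BLOCK)

flash_file, disk_file = "ckpt.flash", "ckpt.disk"   # hypothetical tier names
with open(flash_file, "wb") as f:
    f.truncate(NPROCS * BLOCKS_PER_PROC * BLOCK)    # pre-size the shared file
for rank in range(NPROCS):
    write_checkpoint(flash_file, rank)

# The migration step, modeled here as a simple sequential copy to the "disk" tier.
with open(flash_file, "rb") as src, open(disk_file, "wb") as dst:
    while chunk := src.read(8 * BLOCK):
        dst.write(chunk)
os.remove(flash_file)
os.remove(disk_file)
```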
NASA Technical Reports Server (NTRS)
Shen, B.-W.; Atlas, R.; Reale, O.; Lin, S.-J.; Chern, J.-D.; Chang, J.; Henze, C.
2006-01-01
Hurricane Katrina was the sixth most intense hurricane in the Atlantic. Katrina's forecast poses major challenges, the most important of which is its rapid intensification. Hurricane intensity forecast with General Circulation Models (GCMs) is difficult because of their coarse resolution. In this article, six 5-day simulations with the ultra-high resolution finite-volume GCM are conducted on the NASA Columbia supercomputer to show the effects of increased resolution on the intensity predictions of Katrina. It is found that the 0.125 degree runs give comparable tracks to the 0.25 degree, but provide better intensity forecasts, bringing the center pressure much closer to observations with differences of only plus or minus 12 hPa. In the runs initialized at 1200 UTC 25 AUG, the 0.125 degree simulates a more realistic intensification rate and better near-eye wind distributions. Moreover, the first global 0.125 degree simulation without convection parameterization (CP) produces even better intensity evolution and near-eye winds than the control run with CP.
Argonne wins four R&D 100 Awards | Argonne National Laboratory
The cited work includes a high-energy concentration-gradient cathode material for plug-in hybrid and all-electric vehicles, as well as Globus, which connects scientific facilities (such as supercomputing centers and high-energy physics experiments) and cloud storage.
Data communication requirements for the advanced NAS network
NASA Technical Reports Server (NTRS)
Levin, Eugene; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of the Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations, and by remote communications to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. In the 1987/1988 time period it is anticipated that a computer with 4 times the processing speed of a Cray 2 will be obtained and by 1990 an additional supercomputer with 16 times the speed of the Cray 2. The implications of this 20-fold increase in processing power on the data communications requirements are described. The analysis was based on models of the projected workload and system architecture. The results are presented together with the estimates of their sensitivity to assumptions inherent in the models.
Close to real life. [solving for transonic flow about lifting airfoils using supercomputers]
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Bailey, F. Ron
1988-01-01
NASA's Numerical Aerodynamic Simulation (NAS) facility for CFD modeling of highly complex aerodynamic flows employs as its basic hardware two Cray-2s, an ETA-10 Model Q, an Amdahl 5880 mainframe computer that furnishes both support processing and access to 300 Gbytes of disk storage, several minicomputers and superminicomputers, and a Thinking Machines 16,000-device 'connection machine' processor. NAS, which was the first supercomputer facility to standardize operating-system and communication software on all processors, has done important Space Shuttle aerodynamics simulations and will be critical to the configurational refinement of the National Aerospace Plane and its integrated powerplant, which will involve complex, high temperature reactive gasdynamic computations.
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
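Two bits of arithmetic behind the quoted figures, as a quick check; the baseline and scaled timings in the efficiency example are invented for illustration.

```python
sustained_pflops, fraction_of_peak = 13.94, 0.692
print("implied Sequoia peak:", round(sustained_pflops / fraction_of_peak, 1), "PFlops")

def weak_scaling_efficiency(t_baseline, t_scaled):
    """Ideal weak scaling keeps runtime constant as cores and problem size grow together."""
    return t_baseline / t_scaled

# Invented timings giving roughly the 90% efficiency reported at full scale.
print("efficiency:", round(weak_scaling_efficiency(100.0, 111.1), 2))
```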
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio
2014-07-05
A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large and fine calculation cell (2048^3 grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of the chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
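The slab-versus-volumetric limitation mentioned above is essentially a counting argument: a slab decomposition assigns at most one plane of the grid per process, while a volumetric (pencil or block) decomposition distributes the grid over two or three axes. A rough sketch of that accounting, using the grid and node counts quoted in the abstract (this is not the program's actual decomposition logic):

```python
# Rough accounting for why a volumetric decomposition relieves the slab limit
# (illustrative only; not the actual 3D-RISM/3D-FFT implementation).

def max_procs_slab(n):
    # At most one xy-plane of an n^3 grid per process.
    return n

def max_procs_pencil(n):
    # A 2-D process grid over two axes yields up to n*n pencils.
    return n * n

if __name__ == "__main__":
    n = 2048                       # grid points per axis, as in the benchmark cell
    print(max_procs_slab(n))       # 2048 processes at most
    print(max_procs_pencil(n))     # 4,194,304 pencils available
    # The reported run used 16,384 nodes x 8 cores = 131,072 cores,
    # far beyond what a slab decomposition of a 2048^3 grid could exploit.
    print(16384 * 8)
```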
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greiner, Miles
Radial hydride formation in high-burnup used fuel cladding has the potential to radically reduce its ductility and suitability for long-term storage and eventual transport. To avoid this formation, the maximum post-reactor temperature must remain sufficiently low to limit the cladding hoop stress, so that hydrogen from the existing circumferential hydrides will not dissolve and become available to re-precipitate into radial hydrides under the slow cooling conditions during drying, transfer and early dry-cask storage. The objective of this research is to develop and experimentally benchmark computational fluid dynamics simulations of heat transfer in post-pool-storage drying operations, when high-burnup fuel cladding is likely to experience its highest temperature. These benchmarked tools can play a key role in evaluating dry cask storage systems for extended storage of high-burnup fuels and post-storage transportation, including fuel retrievability. The benchmarked tools will be used to aid the design of efficient drying processes, as well as estimate variations of surface temperatures as a means of inferring helium integrity inside the canister or cask. This work will be conducted effectively because the principal investigator has experience developing these types of simulations, and has constructed a test facility that can be used to benchmark them.
High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.
2017-12-01
The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
Requirements for a network storage service
NASA Technical Reports Server (NTRS)
Kelly, Suzanne M.; Haynes, Rena A.
1991-01-01
Sandia National Laboratories provides a high performance classified computer network as a core capability in support of its mission of nuclear weapons design and engineering, physical sciences research, and energy research and development. The network, locally known as the Internal Secure Network (ISN), comprises multiple distributed local area networks (LAN's) residing in New Mexico and California. The TCP/IP protocol suite is used for inter-node communications. Scientific workstations and mid-range computers, running UNIX-based operating systems, compose most LAN's. One LAN, operated by the Sandia Corporate Computing Directorate, is a general purpose resource providing a supercomputer and a file server to the entire ISN. The current file server on the supercomputer LAN is an implementation of the Common File System (CFS). Subsequent to the design of the ISN, Sandia reviewed its mass storage requirements and chose to enter into a competitive procurement to replace the existing file server with one more adaptable to a UNIX/TCP/IP environment. The requirements study for the network was the starting point for the requirements study for the new file server. The file server is called the Network Storage Service (NSS) and its requirements are described. An application or functional description of the NSS is given. The final section adds performance, capacity, and access constraints to the requirements.
NASA Technical Reports Server (NTRS)
Nosenchuck, D. M.; Littman, M. G.
1986-01-01
The Navier-Stokes computer (NSC) has been developed for solving problems in fluid mechanics involving complex flow simulations that require more speed and capacity than provided by current and proposed Class VI supercomputers. The machine is a parallel processing supercomputer with several new architectural elements which can be programmed to address a wide range of problems meeting the following criteria: (1) the problem is numerically intensive, and (2) the code makes use of long vectors. A simulation of two-dimensional nonsteady viscous flows is presented to illustrate the architecture, programming, and some of the capabilities of the NSC.
NASA Astrophysics Data System (ADS)
Voronin, A. A.; Panchenko, V. Ya; Zheltikov, A. M.
2016-06-01
High-intensity ultrashort laser pulses propagating in gas media or in condensed matter undergo complex nonlinear spatiotemporal evolution where temporal transformations of optical field waveforms are strongly coupled to an intricate beam dynamics and ultrafast field-induced ionization processes. At the level of laser peak powers orders of magnitude above the critical power of self-focusing, the beam exhibits modulation instabilities, producing random field hot spots and breaking up into multiple noise-seeded filaments. This problem is described by a (3 + 1)-dimensional nonlinear field evolution equation, which needs to be solved jointly with the equation for ultrafast ionization of a medium. Analysis of this problem, which is equivalent to solving a billion-dimensional evolution problem, is only possible by means of supercomputer simulations augmented with coordinated big-data processing of large volumes of information acquired through theory-guiding experiments and supercomputations. Here, we review the main challenges of supercomputations and big-data processing encountered in strong-field ultrafast optical physics and discuss strategies to confront these challenges.
Understanding the Cray X1 System
NASA Technical Reports Server (NTRS)
Cheung, Samson
2004-01-01
This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
Grid Computing Environment using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Alanis, Fransisco; Mahmood, Akhtar
2003-10-01
Custom-made Beowulf clusters using PCs are currently replacing expensive supercomputers to carry out complex scientific computations. At the University of Texas - Pan American, we built an 8 Gflops Beowulf Cluster for doing HEP research using RedHat Linux 7.3 and the LAM-MPI middleware. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes that were compiled in C on the cluster using the LAM-XMPI graphics user environment. We will demonstrate a "simple" prototype grid environment, where we will submit and run parallel jobs remotely across multiple cluster nodes over the internet from the presentation room at Texas Tech University. The Sphinx Beowulf Cluster will be used for Monte Carlo grid test-bed studies for the LHC-ATLAS high energy physics experiment. Grid is a new IT concept for the next generation of the "Super Internet" for high-performance computing. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
High-Performance Computing User Facility | Computational Science | NREL
NREL's High-Performance Computing (HPC) User Facility provides access to systems including the Peregrine supercomputer and the Gyrfalcon Mass Storage System.
NASA Technical Reports Server (NTRS)
Shen, B.-W.; Atlas, R.; Reale, O.; Chern, J.-D.; Li, S.-J.; Lee, T.; Chang, J.; Henze, C.; Yeh, K.-S.
2006-01-01
It is known that General Circulation Models (GCMs) do not have sufficient resolution to accurately simulate hurricane near-eye structure and intensity. To overcome this limitation, the mesoscale-resolving finite-volume GCM (fvGCM) has been experimentally deployed on the NASA Columbia supercomputer, and its performance is evaluated choosing hurricane Katrina as an example in this study. In late August 2005 Katrina underwent two stages of rapid intensification and became the sixth most intense hurricane in the Atlantic. Six 5-day simulations of Katrina at both 0.25 deg and 0.125 deg show comparable track forecasts, but the 0.125 deg runs provide much better intensity forecasts, producing center pressure with errors of only +/- 12 hPa. The 0.125 deg runs also simulate better near-eye wind distributions and a more realistic average intensification rate. A convection parameterization (CP) is one of the major limitations in a GCM; the 0.125 deg run with CP disabled produces very encouraging results.
An analysis of file migration in a UNIX supercomputing environment
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1992-01-01
The supercomputer center at the National Center for Atmospheric Research (NCAR) migrates large numbers of files to and from its mass storage system (MSS) because there is insufficient space to store them on the Cray supercomputer's local disks. This paper presents an analysis of file migration data collected over two years. The analysis shows that requests to the MSS are periodic, with one-day and one-week periods. Read requests to the MSS account for the majority of the periodicity, as write requests are relatively constant over the course of a week. Additionally, reads show a far greater fluctuation than writes over a day and week, since reads are driven by human users while writes are machine-driven.
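The one-day and one-week periods reported here are exactly the kind of structure a simple periodogram of hourly request counts exposes. A toy sketch with synthetic counts (not the NCAR trace data):

```python
import numpy as np

def dominant_periods_hours(hourly_counts, top=3):
    """Return the strongest periods (in hours) in an hourly request-count series."""
    counts = np.asarray(hourly_counts, dtype=float)
    counts -= counts.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(counts)) ** 2
    freqs = np.fft.rfftfreq(len(counts), d=1.0)  # cycles per hour
    order = np.argsort(spectrum[1:])[::-1] + 1   # skip the zero-frequency bin
    return [1.0 / freqs[i] for i in order[:top]]

if __name__ == "__main__":
    hours = np.arange(24 * 7 * 8)                # eight weeks of hourly bins
    rng = np.random.default_rng(1)
    # Synthetic workload: a daily cycle, a weekly cycle, plus noise.
    counts = (100
              + 40 * np.sin(2 * np.pi * hours / 24)
              + 20 * np.sin(2 * np.pi * hours / (24 * 7))
              + rng.normal(0, 5, hours.size))
    print(dominant_periods_hours(counts))        # ~[24.0, 168.0, ...]
```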
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wasserman, H.J.
1996-02-01
The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP): a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architectures. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
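The throughput comparison in the opening sentences reduces to peak rate = clock frequency times floating-point operations issued per clock period. A sketch of that arithmetic; the clock frequencies below are illustrative placeholders, not figures restated from the report:

```python
# Peak floating-point rate = clock frequency x FLOPs issued per clock period.
# Clock frequencies here are illustrative placeholders, not values from the report.

def peak_mflops(clock_mhz, flops_per_cp):
    return clock_mhz * flops_per_cp

if __name__ == "__main__":
    print(peak_mflops(clock_mhz=300, flops_per_cp=2))  # 21164-style: 2 FLOPs/CP
    print(peak_mflops(clock_mhz=200, flops_per_cp=1))  # 21064-style: 1 FLOP/CP
```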
The mass storage testing laboratory at GSFC
NASA Technical Reports Server (NTRS)
Venkataraman, Ravi; Williams, Joel; Michaud, David; Gu, Heng; Kalluri, Atri; Hariharan, P. C.; Kobler, Ben; Behnke, Jeanne; Peavey, Bernard
1998-01-01
Industry-wide benchmarks exist for measuring the performance of processors (SPECmarks), and of database systems (Transaction Processing Council). Despite storage having become the dominant item in computing and IT (Information Technology) budgets, no such common benchmark is available in the mass storage field. Vendors and consultants provide services and tools for capacity planning and sizing, but these do not account for the complete set of metrics needed in today's archives. The availability of automated tape libraries, high-capacity RAID systems, and high-bandwidth interconnectivity between processor and peripherals has led to demands for services which traditional file systems cannot provide. File Storage and Management Systems (FSMS), which began to be marketed in the late 80's, have helped to some extent with large tape libraries, but their use has introduced additional parameters affecting performance. The aim of the Mass Storage Test Laboratory (MSTL) at Goddard Space Flight Center is to develop a test suite that includes not only a comprehensive check list to document a mass storage environment but also benchmark code. Benchmark code is being tested which will provide measurements for both baseline systems, i.e. applications interacting with peripherals through the operating system services, and for combinations involving an FSMS. The benchmarks are written in C, and are easily portable. They are initially being aimed at the UNIX Open Systems world. Measurements are being made using a Sun Ultra 170 Sparc with 256MB memory running Solaris 2.5.1 with the following configuration: 4mm tape stacker on SCSI 2 Fast/Wide; 4GB disk device on SCSI 2 Fast/Wide; and Sony Petaserve on Fast/Wide differential SCSI 2.
NASA Astrophysics Data System (ADS)
Leutwyler, David; Fuhrer, Oliver; Cumming, Benjamin; Lapillonne, Xavier; Gysi, Tobias; Lüthi, Daniel; Osuna, Carlos; Schär, Christoph
2014-05-01
The representation of moist convection is a major shortcoming of current global and regional climate models. State-of-the-art global models usually operate at grid spacings of 10-300 km, and therefore cannot fully resolve the relevant upscale and downscale energy cascades. Therefore parametrization of the relevant sub-grid scale processes is required. Several studies have shown that this approach entails major uncertainties for precipitation processes, which raises concerns about the model's ability to represent precipitation statistics and associated feedback processes, as well as their sensitivities to large-scale conditions. Further refining the model resolution to the kilometer scale allows representing these processes much closer to first principles and thus should yield an improved representation of the water cycle, including the drivers of extreme events. Although cloud-resolving simulations are very useful tools for climate simulations and numerical weather prediction, their high horizontal resolution and the consequently small time steps needed challenge current supercomputers to model large domains and long time scales. The recent innovations in the domain of hybrid supercomputers have led to mixed node designs with a conventional CPU and an accelerator such as a graphics processing unit (GPU). GPUs relax the necessity for cache coherency and complex memory hierarchies, but have a larger system memory-bandwidth. This is highly beneficial for low compute intensity codes such as atmospheric stencil-based models. However, to efficiently exploit these hybrid architectures, climate models need to be ported and/or redesigned. Within the framework of the Swiss High Performance High Productivity Computing initiative (HP2C), a project to port the COSMO model to hybrid architectures has recently come to an end. The product of these efforts is a version of COSMO with an improved performance on traditional x86-based clusters as well as hybrid architectures with GPUs. We present our redesign and porting approach as well as our experience and lessons learned. Furthermore, we discuss relevant performance benchmarks obtained on the new hybrid Cray XC30 system "Piz Daint" installed at the Swiss National Supercomputing Centre (CSCS), both in terms of time-to-solution as well as energy consumption. We will demonstrate a first set of short cloud-resolving climate simulations at the European scale using the GPU-enabled COSMO prototype and elaborate our future plans on how to exploit this new model capability.
New super-computing facility in RIKEN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ohta, Shigemi
1994-12-31
A new supercomputer, the Fujitsu VPP500/28, was installed in the Institute of Physical and Chemical Research (RIKEN) at the end of March, 1994. It consists of 28 processing elements (PEs) connected by a high-speed crossbar switch. The switch is a combination of GaAs and ECL circuitry with a peak bandwidth of 800 Mbyte per second. Each PE consists of a GaAs/ECL vector processor with 1.6 Gflops peak speed and 256 Mbyte SRAM local memory. In addition, there are 8 GByte of DRAM space, two 100 Gbyte RAID disks and a 10 TByte archive based on the SONY File Bank system. The author ran three major benchmarks on this machine: modified LINPACK, lattice QCD and FFT. In the modified LINPACK benchmark, a sustained speed of about 28 Gflops is achieved by removing the restriction on the size of the matrices. In the lattice QCD benchmark, a sustained speed of about 30 Gflops is achieved for inverting the staggered fermion propagation matrix on a 32^4 lattice. In the FFT benchmark, real data of 32, 128, 512, and 2048 MByte are Fourier-transformed. The sustained speed for each is respectively 21, 21, 20, and 19 Gflops. The numbers are obtained after only a few weeks of coding effort and can be improved further.
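The sustained figures can be put against the machine's nominal peak of 28 PEs at 1.6 Gflops each. A short sketch of that arithmetic, using only the numbers quoted in the abstract:

```python
# Sustained-vs-peak arithmetic for the VPP500/28 figures quoted above.

PEAK_PER_PE_GFLOPS = 1.6
NUM_PE = 28

def fraction_of_peak(sustained_gflops):
    return sustained_gflops / (PEAK_PER_PE_GFLOPS * NUM_PE)

if __name__ == "__main__":
    print(PEAK_PER_PE_GFLOPS * NUM_PE)           # 44.8 Gflops nominal peak
    for name, sustained in [("modified LINPACK", 28.0),
                            ("lattice QCD", 30.0),
                            ("FFT (32 MByte case)", 21.0)]:
        print(name, round(fraction_of_peak(sustained), 3))  # ~0.625, 0.670, 0.469
```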
Flux-Level Transit Injection Experiments with NASA Pleiades Supercomputer
NASA Astrophysics Data System (ADS)
Li, Jie; Burke, Christopher J.; Catanzarite, Joseph; Seader, Shawn; Haas, Michael R.; Batalha, Natalie; Henze, Christopher; Christiansen, Jessie; Kepler Project, NASA Advanced Supercomputing Division
2016-06-01
Flux-Level Transit Injection (FLTI) experiments are executed with NASA's Pleiades supercomputer for the Kepler Mission. The latest release (9.3, January 2016) of the Kepler Science Operations Center Pipeline is used in the FLTI experiments. Their purpose is to validate the Analytic Completeness Model (ACM), which can be computed for all Kepler target stars, thereby enabling exoplanet occurrence rate studies. Pleiades, a facility of NASA's Advanced Supercomputing Division, is one of the world's most powerful supercomputers and represents NASA's state-of-the-art technology. We discuss the details of implementing the FLTI experiments on the Pleiades supercomputer. For example, taking into account that ~16 injections are generated by one core of the Pleiades processors in an hour, the “shallow” FLTI experiment, in which ~2000 injections are required per target star, can be done for 16% of all Kepler target stars in about 200 hours. Stripping down the transit search to bare bones, i.e. only searching adjacent high/low periods at high/low pulse durations, makes the computationally intensive FLTI experiments affordable. The design of the FLTI experiments and the analysis of the resulting data are presented in “Validating an Analytic Completeness Model for Kepler Target Stars Based on Flux-level Transit Injection Experiments” by Catanzarite et al. (#2494058).Kepler was selected as the 10th mission of the Discovery Program. Funding for the Kepler Mission has been provided by the NASA Science Mission Directorate.
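The quoted throughput implies a specific scale of resources. A back-of-the-envelope check, assuming roughly 200,000 Kepler target stars in total (an assumption for illustration; the abstract does not state the total target count):

```python
# Back-of-the-envelope check of the "shallow" FLTI experiment throughput.
# ASSUMPTION: ~200,000 Kepler target stars in total (not stated in the abstract).

INJECTIONS_PER_CORE_HOUR = 16
INJECTIONS_PER_STAR = 2000
TOTAL_TARGET_STARS = 200_000
FRACTION_OF_STARS = 0.16
WALL_CLOCK_HOURS = 200

stars = TOTAL_TARGET_STARS * FRACTION_OF_STARS                    # 32,000 stars
core_hours = stars * INJECTIONS_PER_STAR / INJECTIONS_PER_CORE_HOUR
cores_needed = core_hours / WALL_CLOCK_HOURS

print(int(stars), int(core_hours), int(cores_needed))
# -> 32,000 stars, ~4,000,000 core-hours, so ~20,000 cores busy for ~200 hours
```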
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to a few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) a high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) a front-end Web interface providing the service to the end-users and (5) a data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of a logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of the various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
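Modules (2) and (3) of the mapping application boil down to cross-correlating station pairs and reducing each correlation to a logarithmic energy ratio between its two branches. A minimal numpy sketch of that reduction, on synthetic traces and with a simple energy ratio (the service's actual pre-processing and sensitivity kernels are considerably more involved):

```python
import numpy as np

def noise_correlation(trace_a, trace_b):
    """Full (two-sided) cross-correlation of two equal-length noise records."""
    return np.correlate(trace_a, trace_b, mode="full")

def log_energy_ratio(correlation):
    """Log ratio of energy in the causal vs. acausal branch of a correlation."""
    mid = len(correlation) // 2
    causal = correlation[mid + 1:]
    acausal = correlation[:mid]
    return np.log(np.sum(causal ** 2) / np.sum(acausal ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3600                                  # e.g. one hour of 1 Hz samples
    source = rng.standard_normal(n)
    # Station B records the same noise delayed by 50 samples (toy geometry).
    rec_a = source + 0.1 * rng.standard_normal(n)
    rec_b = np.roll(source, 50) + 0.1 * rng.standard_normal(n)
    print(log_energy_ratio(noise_correlation(rec_a, rec_b)))
```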
Requirements for a network storage service
NASA Technical Reports Server (NTRS)
Kelly, Suzanne M.; Haynes, Rena A.
1992-01-01
Sandia National Laboratories provides a high performance classified computer network as a core capability in support of its mission of nuclear weapons design and engineering, physical sciences research, and energy research and development. The network, locally known as the Internal Secure Network (ISN), was designed in 1989 and comprises multiple distributed local area networks (LAN's) residing in Albuquerque, New Mexico and Livermore, California. The TCP/IP protocol suite is used for inter-node communications. Scientific workstations and mid-range computers, running UNIX-based operating systems, compose most LAN's. One LAN, operated by the Sandia Corporate Computing Directorate, is a general purpose resource providing a supercomputer and a file server to the entire ISN. The current file server on the supercomputer LAN is an implementation of the Common File System (CFS) developed by Los Alamos National Laboratory. Subsequent to the design of the ISN, Sandia reviewed its mass storage requirements and chose to enter into a competitive procurement to replace the existing file server with one more adaptable to a UNIX/TCP/IP environment. The requirements study for the network was the starting point for the requirements study for the new file server. The file server is called the Network Storage Service (NSS) and its requirements are described in this paper. The next section gives an application or functional description of the NSS. The final section adds performance, capacity, and access constraints to the requirements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swaminarayan, Sriram; Germann, Timothy C; Kadau, Kai
2008-01-01
The authors present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so that there are four MPI ranks per node, each with one Opteron and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons are used to direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 MFlop/s per Watt at a price of approximately 3.69 MFlops/dollar. They demonstrate an initial target application, the jetting and ejection of material from a shocked surface.
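The benchmark kernel referred to is the standard Lennard-Jones pair potential. A minimal serial numpy sketch of that energy evaluation, purely to illustrate what the Cells were computing; this is not the SPaSM implementation, which uses cell lists and runs in parallel:

```python
import numpy as np

def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Total LJ potential energy of a particle set (serial O(N^2) reference)."""
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            if r < cutoff * sigma:
                sr6 = (sigma / r) ** 6
                energy += 4.0 * epsilon * (sr6 * sr6 - sr6)  # 4e[(s/r)^12 - (s/r)^6]
    return energy

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    pos = rng.uniform(0.0, 5.0, size=(64, 3))   # 64 particles in a 5x5x5 box
    print(lennard_jones_energy(pos))
```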
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
Virtualizing Super-Computation On-Board Uas
NASA Astrophysics Data System (ADS)
Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.
2015-04-01
Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real-time. However, depending on the volume of data and the cost of the communications, this latter option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes benchmarking for hardware selection, the software architecture and the communications-aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real-time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and consumed watts.
Test One to Test Many: A Unified Approach to Quantum Benchmarks
NASA Astrophysics Data System (ADS)
Bai, Ge; Chiribella, Giulio
2018-04-01
Quantum benchmarks are routinely used to validate the experimental demonstration of quantum information protocols. Many relevant protocols, however, involve an infinite set of input states, of which only a finite subset can be used to test the quality of the implementation. This is a problem, because the benchmark for the finitely many states used in the test can be higher than the original benchmark calculated for infinitely many states. This situation arises in the teleportation and storage of coherent states, for which the benchmark of 50% fidelity is commonly used in experiments, although finite sets of coherent states normally lead to higher benchmarks. Here, we show that the average fidelity over all coherent states can be indirectly probed with a single setup, requiring only two-mode squeezing, a 50-50 beam splitter, and homodyne detection. Our setup enables a rigorous experimental validation of quantum teleportation, storage, amplification, attenuation, and purification of noisy coherent states. More generally, we prove that every quantum benchmark can be tested by preparing a single entangled state and measuring a single observable.
Optimization of the computational load of a hypercube supercomputer onboard a mobile robot.
Barhen, J; Toomarian, N; Protopopescu, V
1987-12-01
A combinatorial optimization methodology is developed, which enables the efficient use of hypercube multiprocessors onboard mobile intelligent robots dedicated to time-critical missions. The methodology is implemented in terms of large-scale concurrent algorithms based either on fast simulated annealing, or on nonlinear asynchronous neural networks. In particular, analytic expressions are given for the effect of single-neuron perturbations on the systems' configuration energy. Compact neuromorphic data structures are used to model effects such as precedence constraints, processor idling times, and task-schedule overlaps. Results for a typical robot-dynamics benchmark are presented.
Optimization of the computational load of a hypercube supercomputer onboard a mobile robot
NASA Technical Reports Server (NTRS)
Barhen, Jacob; Toomarian, N.; Protopopescu, V.
1987-01-01
A combinatorial optimization methodology is developed, which enables the efficient use of hypercube multiprocessors onboard mobile intelligent robots dedicated to time-critical missions. The methodology is implemented in terms of large-scale concurrent algorithms based either on fast simulated annealing, or on nonlinear asynchronous neural networks. In particular, analytic expressions are given for the effect of single-neuron perturbations on the systems' configuration energy. Compact neuromorphic data structures are used to model effects such as precedence constraints, processor idling times, and task-schedule overlaps. Results for a typical robot-dynamics benchmark are presented.
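Both records cast the load-distribution problem as minimizing a configuration energy, solved either by fast simulated annealing or by neural networks. The toy sketch below uses a plain simulated-annealing loop and an invented energy (load imbalance only); it is illustrative of the general technique, not the paper's neuromorphic formulation with precedence constraints and schedule overlaps.

```python
import math
import random

def anneal_assignment(task_costs, n_procs, steps=20000, t0=5.0, seed=0):
    """Toy simulated annealing: assign tasks to processors to minimize load imbalance."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in task_costs]

    def energy(a):
        loads = [0.0] * n_procs
        for task, proc in enumerate(a):
            loads[proc] += task_costs[task]
        return max(loads) - min(loads)           # imbalance as the "configuration energy"

    e = energy(assign)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        task = rng.randrange(len(task_costs))
        old = assign[task]
        assign[task] = rng.randrange(n_procs)    # propose moving one task
        e_new = energy(assign)
        if e_new > e and rng.random() >= math.exp((e - e_new) / temp):
            assign[task] = old                   # reject the uphill move
        else:
            e = e_new                            # accept (downhill, or lucky uphill)
    return assign, e

if __name__ == "__main__":
    gen = random.Random(1)
    costs = [gen.uniform(1, 10) for _ in range(32)]
    assignment, imbalance = anneal_assignment(costs, n_procs=8)
    print(imbalance)
```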
Integration of Panda Workload Management System with supercomputers
NASA Astrophysics Data System (ADS)
De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.
2016-09-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3+ petaFLOPS, the next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of the PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accomplishments in running the PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility's infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
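The "light-weight MPI wrappers" described here amount to an MPI program whose only job is to launch one single-threaded payload per rank and collect the exit statuses. A minimal mpi4py sketch of that idea follows; the payload command is a trivial placeholder (a real wrapper would launch the experiment's payload executable), and this is not the actual PanDA pilot code.

```python
# Minimal sketch of a light-weight MPI wrapper: one single-threaded payload per rank.
# The payload command below is a placeholder, not the real ATLAS/PanDA payload.
import subprocess
import sys

from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    # Each rank runs its own independent payload process (here a trivial stand-in).
    cmd = [sys.executable, "-c", f"print('payload running on rank {rank}')"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    statuses = comm.gather(result.returncode, root=0)
    if rank == 0:
        failed = [i for i, s in enumerate(statuses) if s != 0]
        print(f"{len(statuses) - len(failed)} payloads succeeded, {len(failed)} failed")
        sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()
```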
The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chard, Kyle; Dart, Eli; Foster, Ian
Here we describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
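The Python APIs referred to are those of the Globus SDK. A heavily abridged sketch of submitting a transfer with the globus_sdk package is shown below; the client ID, endpoint UUIDs, and paths are placeholders, and the exact authentication flow and scope strings should be checked against the SDK documentation and the authors' full skeletons at the companion site.

```python
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"   # placeholder: register an app with Globus Auth

# Interactive native-app login flow to obtain a transfer token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all")
print("Log in at:", auth_client.oauth2_get_authorize_url())
code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))

SRC = "SOURCE-ENDPOINT-UUID"    # placeholder endpoint IDs
DST = "DEST-ENDPOINT-UUID"
tdata = globus_sdk.TransferData(tc, SRC, DST, label="portal transfer example")
tdata.add_item("/datasets/sample.h5", "/~/sample.h5")   # placeholder paths
task = tc.submit_transfer(tdata)
print("submitted transfer task:", task["task_id"])
```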
The Modern Research Data Portal: a design pattern for networked, data-intensive science
Chard, Kyle; Dart, Eli; Foster, Ian; ...
2018-01-15
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. Here, we capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
The Modern Research Data Portal: a design pattern for networked, data-intensive science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chard, Kyle; Dart, Eli; Foster, Ian
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. Here, we capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
Pynamic: the Python Dynamic Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, G L; Ahn, D H; de Supinksi, B R
2007-07-10
Python is widely used in scientific computing to facilitate application development and to support features such as computational steering. Making full use of some of Python's popular features, which improve programmer productivity, leads to applications that access extremely high numbers of dynamically linked libraries (DLLs). As a result, some important Python-based applications severely stress a system's dynamic linking and loading capabilities and also cause significant difficulties for most development environment tools, such as debuggers. Furthermore, using the Python paradigm for large scale MPI-based applications can create significant file I/O and further stress tools and operating systems. In this paper, we present Pynamic, the first benchmark program to support configurable emulation of a wide range of the DLL usage of Python-based applications for large scale systems. Pynamic has already accurately reproduced system software and tool issues encountered by important large Python-based scientific applications on our supercomputers. Pynamic provided insight for our system software and tool vendors, and our application developers, into the impact of several design decisions. As we describe the Pynamic benchmark, we highlight some of the issues discovered in our large scale system software and tools using Pynamic.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level, where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher-level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before, with the added flexibility of being applicable to a wider range of I/O intensive applications.
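The two-phase scheme described above (split the global communicator, let each sub-communicator's root do the disk I/O, then redistribute within the group) can be sketched with mpi4py. In the toy below the "file read" is simulated with numpy data, since SCORPIO itself layers this pattern on parallel HDF5; the group size is an arbitrary tunable.

```python
# Sketch of two-phase (aggregated) I/O: only sub-communicator roots touch the
# file system; data are then scattered to the rest of each group.
# Illustrative only -- not the SCORPIO library itself.
import numpy as np
from mpi4py import MPI

GROUP_SIZE = 8   # ranks per designated I/O process (tunable)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Phase 0: carve the global communicator into I/O groups.
color = rank // GROUP_SIZE
io_comm = comm.Split(color=color, key=rank)

# Phase 1 (disk): only the group root "reads" its block of the dataset.
if io_comm.Get_rank() == 0:
    block = np.arange(GROUP_SIZE * 4, dtype=np.float64) + 1000 * color
    chunks = np.array_split(block, io_comm.Get_size())
else:
    chunks = None

# Phase 2 (communication): redistribute the block within the group.
my_chunk = io_comm.scatter(chunks, root=0)
print(f"rank {rank} (group {color}) received {my_chunk}")
```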
Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Dimpsey, Robert Tod
1992-01-01
In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine, ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.
A Computational framework for telemedicine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.
1998-07-01
Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. High-speed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure for telemedicine. The Globus metacomputing toolkit can provide the necessary low-level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.
Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks
NASA Astrophysics Data System (ADS)
Hogan, Trish
Set to replace the aging TPC-C, the TPC Benchmark E is the next generation OLTP benchmark, which more accurately models client database usage. TPC-E addresses the shortcomings of TPC-C. It has a much more complex workload, requires the use of RAID-protected storage, generates much less I/O, and is much cheaper and easier to set up, run, and audit. After a period of overlap, it is expected that TPC-E will become the de facto OLTP benchmark.
Data Management, the Victorian era child of the 21st century
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farber, Rob
2007-03-30
Do you remember when a gigabyte disk drive was "a lot" of storage in that by-gone age of the 20th century? Still in our first decade of the 21st century, major supercomputer sites now speak of storage in terms of petabytes, 10^15 bytes, or a six-orders-of-magnitude increase in capacity over a gigabyte! Unlike our archaic "big" disk drive where all the data was in one place, HPC storage is now distributed across many machines and even across the Internet. Collaborative research engages many scientists who need to find and use each other's data, preferably in an automated fashion, which complicates an already muddled problem.
Open systems storage platforms
NASA Technical Reports Server (NTRS)
Collins, Kirby
1992-01-01
The building blocks for an open storage system include a system platform, a selection of storage devices and interfaces, system software, and storage applications. CONVEX storage systems are based on the DS Series Data Server systems. These systems are a variant of the C3200 supercomputer with expanded I/O capabilities. These systems support a variety of medium and high speed interfaces to networks and peripherals. System software is provided in the form of ConvexOS, a POSIX-compliant derivative of 4.3BSD UNIX. Storage applications include products such as UNITREE and EMASS. With the DS Series of storage systems, Convex has developed a set of products which provide open system solutions for storage management applications. The systems are highly modular, assembled from off-the-shelf components with industry standard interfaces. The C Series system architecture provides a stable base, with the performance and reliability of a general purpose platform. This combination of a proven system architecture with a variety of choices in peripherals and application software allows wide flexibility in configurations, and delivers the benefits of open systems to the mass storage world.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.
2014-12-01
OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures, to provide efficient Web-based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as the user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high-performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud-based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute-intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. With a growing user base and maturing scientific user community come new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service-based architecture for integrating community-developed software. This "pluggable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT-hosted data.
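The hot/cold tiering decision described above reduces to ranking spatial partitions of the archive by access count and placing the most-accessed fraction on the fast tier. A toy sketch of that policy follows; the tile keys, access log, and hot fraction are invented for illustration and are not OT's actual storage logic.

```python
from collections import Counter

def assign_tiers(access_log, hot_fraction=0.2):
    """Rank tiles by access count; put the most-accessed fraction on the SSD tier."""
    counts = Counter(access_log)
    ranked = [tile for tile, _ in counts.most_common()]
    n_hot = max(1, int(len(ranked) * hot_fraction))
    return {tile: ("ssd" if i < n_hot else "disk") for i, tile in enumerate(ranked)}

if __name__ == "__main__":
    # Invented access log: tile IDs keyed by (dataset, row, col).
    log = ([("SanAndreas", 3, 7)] * 120 + [("SanAndreas", 3, 8)] * 90 +
           [("Yellowstone", 1, 1)] * 5 + [("Yellowstone", 1, 2)] * 2)
    for tile, tier in assign_tiers(log).items():
        print(tile, "->", tier)
```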
Technology for national asset storage systems
NASA Technical Reports Server (NTRS)
Coyne, Robert A.; Hulen, Harry; Watson, Richard
1993-01-01
An industry-led collaborative project, called the National Storage Laboratory, was organized to investigate technology for storage systems that will be the future repositories for our national information assets. Industry participants are IBM Federal Systems Company, Ampex Recording Systems Corporation, General Atomics DISCOS Division, IBM ADSTAR, Maximum Strategy Corporation, Network Systems Corporation, and Zitel Corporation. Industry members of the collaborative project are funding their own participation. Lawrence Livermore National Laboratory through its National Energy Research Supercomputer Center (NERSC) will participate in the project as the operational site and the provider of applications. The expected result is an evaluation of a high performance storage architecture assembled from commercially available hardware and software, with some software enhancements to meet the project's goals. It is anticipated that the integrated testbed system will represent a significant advance in the technology for distributed storage systems capable of handling gigabyte class files at gigabit-per-second data rates. The National Storage Laboratory was officially launched on 27 May 1992.
INTEGRATION OF PANDA WORKLOAD MANAGEMENT SYSTEM WITH SUPERCOMPUTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Maeno, T
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, the next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accomplishments in running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility's infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operations.
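The kind of access-pattern characterization this abstract describes is easy to demonstrate on any machine. The sketch below is illustrative only (array size and patterns are arbitrary, not the paper's benchmarks): it times a unit-stride sweep against a random gather over the same data.

```python
# Illustrative micro-benchmark: streaming (unit-stride) vs. random (gather)
# access over the same array; the gap reflects memory access pattern, not FLOPs.
import time
import numpy as np

N = 10_000_000
data = np.ones(N, dtype=np.float64)
idx = np.random.permutation(N)

t0 = time.perf_counter()
s_stream = data.sum()        # unit-stride, bandwidth/cache friendly
t1 = time.perf_counter()
s_random = data[idx].sum()   # gather: latency-bound random access
t2 = time.perf_counter()

print(f"streaming: {t1 - t0:.3f} s, random gather: {t2 - t1:.3f} s")
```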
Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...
2017-04-05
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
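The Roofline bound referenced in this abstract reduces to a one-line formula: attainable performance is limited by the lesser of peak compute and arithmetic intensity times peak memory bandwidth. The sketch below uses placeholder machine numbers, not figures from the paper.

```python
# Minimal Roofline bound: min(peak compute, arithmetic intensity x bandwidth).
def roofline_gflops(arith_intensity, peak_gflops, peak_bw_gbs):
    """arith_intensity in flops/byte, peak_bw_gbs in GB/s, peak in GFLOP/s."""
    return min(peak_gflops, arith_intensity * peak_bw_gbs)

if __name__ == "__main__":
    # e.g. a stencil smoother at ~0.5 flops/byte on a hypothetical accelerator
    print(roofline_gflops(arith_intensity=0.5, peak_gflops=5000.0, peak_bw_gbs=900.0))
```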
Tools for 3D scientific visualization in computational aerodynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a supercomputer with a high speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a CRAY2, and the high speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed as well as descriptions of other hardware for digital video and film recording.
Energy consumption optimization of the total-FETI solver by changing the CPU frequency
NASA Astrophysics Data System (ADS)
Horak, David; Riha, Lubomir; Sojka, Radim; Kruzik, Jakub; Beseda, Martin; Cermak, Martin; Schuchart, Joseph
2017-07-01
The energy consumption of supercomputers is one of the critical problems for the upcoming Exascale supercomputing era. The awareness of power and energy consumption is required on both the software and hardware side. This paper deals with the energy consumption evaluation of the Finite Element Tearing and Interconnect (FETI) based solvers of linear systems, which is an established method for solving real-world engineering problems. We have evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. In this problem, we have evaluated the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For static tuning experiments, the frequency is set before execution and kept constant during the runtime. For dynamic tuning, the frequency is changed during the program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings when compared to default CPU settings (the highest clock rate). The dynamic tuning improves this further by up to 3%.
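The accounting behind static frequency tuning is simply energy = average power x runtime, compared against the default (highest) clock. The numbers in the sketch below are hypothetical, not measurements from the paper; they are chosen only to show how a lower clock can cost runtime yet still save energy.

```python
# Energy comparison for static frequency tuning (all values hypothetical).
def energy_joules(avg_power_w, runtime_s):
    return avg_power_w * runtime_s

def savings_percent(baseline_j, tuned_j):
    return 100.0 * (baseline_j - tuned_j) / baseline_j

if __name__ == "__main__":
    baseline = energy_joules(avg_power_w=300.0, runtime_s=100.0)   # default clock
    tuned    = energy_joules(avg_power_w=240.0, runtime_s=110.0)   # reduced frequency
    print(f"savings: {savings_percent(baseline, tuned):.1f} %")    # 12.0 %
```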
Impact of the Columbia Supercomputer on NASA Space and Exploration Mission
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Kwak, Dochan; Kiris, Cetin; Lawrence, Scott
2006-01-01
NASA's 10,240-processor Columbia supercomputer gained worldwide recognition in 2004 for increasing the space agency's computing capability ten-fold, and enabling U.S. scientists and engineers to perform significant, breakthrough simulations. Columbia has amply demonstrated its capability to accelerate NASA's key missions, including space operations, exploration systems, science, and aeronautics. Columbia is part of an integrated high-end computing (HEC) environment comprised of massive storage and archive systems, high-speed networking, high-fidelity modeling and simulation tools, application performance optimization, and advanced data analysis and visualization. In this paper, we illustrate the impact Columbia is having on NASA's numerous space and exploration applications, such as the development of the Crew Exploration and Launch Vehicles (CEV/CLV), effects of long-duration human presence in space, and damage assessment and repair recommendations for remaining shuttle flights. We conclude by discussing HEC challenges that must be overcome to solve space-related science problems in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ardani, Kristen; O'Shaughnessy, Eric; Fu, Ran
2016-12-01
In this report, we fill a gap in the existing knowledge about PV-plus-storage system costs and value by providing detailed component- and system-level installed cost benchmarks for residential systems. We also examine other barriers to increased deployment of PV-plus-storage systems in the residential sector. The results are meant to help technology manufacturers, installers, and other stakeholders identify cost-reduction opportunities and inform decision makers about regulatory, policy, and market characteristics that impede solar plus storage deployment. In addition, our periodic cost benchmarks will document progress in cost reductions over time. To analyze costs for PV-plus-storage systems deployed in the first quarter of 2016, we adapt the National Renewable Energy Laboratory's component- and system-level cost-modeling methods for standalone PV. In general, we attempt to model best-in-class installation techniques and business operations from an installed-cost perspective. In addition to our original analysis, model development, and review of published literature, we derive inputs for our model and validate our draft results via interviews with industry and subject-matter experts. One challenge to analyzing the costs of PV-plus-storage systems is choosing an appropriate cost metric. Unlike standalone PV, energy storage lacks universally accepted cost metrics, such as dollars per watt of installed capacity and lifetime levelized cost of energy. We explain the difficulty of arriving at a standard approach for reporting storage costs and then provide the rationale for using the total installed costs of a standard PV-plus-storage system as our primary metric, rather than using a system-size-normalized metric.
Semantics-based distributed I/O with the ParaMEDIC framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaji, P.; Feng, W.; Lin, H.
2008-01-01
Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.
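The trade the abstract describes, extra computation at the storage site in exchange for much less data on the wire, can be shown with a toy example. This is not the ParaMEDIC API; it is a hypothetical sketch that assumes the raw input is also available at the storage site, so only compact metadata (here, match indices) needs to travel.

```python
# Toy sketch of semantics-based I/O reduction: ship compact metadata instead of
# bulky output, and regenerate the output at the storage site. Illustrative only.
def compute_site(database, query):
    hits = [i for i, rec in enumerate(database) if query in rec]
    full_output = [database[i] for i in hits]   # what would normally be shipped
    metadata = hits                             # far smaller, application-specific
    return full_output, metadata

def storage_site(database, metadata):
    """Regenerate the full output locally from the metadata (assumes the
    database is replicated or otherwise accessible at the storage site)."""
    return [database[i] for i in metadata]

if __name__ == "__main__":
    db = [f"record-{i}-{'match' if i % 97 == 0 else 'miss'}" for i in range(10_000)]
    full, meta = compute_site(db, "match")
    assert storage_site(db, meta) == full
    print(f"shipped {len(meta)} integers instead of "
          f"{sum(len(r) for r in full)} characters of output")
```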
Flow visualization of CFD using graphics workstations
NASA Technical Reports Server (NTRS)
Lasinski, Thomas; Buning, Pieter; Choi, Diana; Rogers, Stuart; Bancroft, Gordon
1987-01-01
High performance graphics workstations are used to visualize the fluid flow dynamics obtained from supercomputer solutions of computational fluid dynamic programs. The visualizations can be done independently on the workstation or while the workstation is connected to the supercomputer in a distributed computing mode. In the distributed mode, the supercomputer interactively performs the computationally intensive graphics rendering tasks while the workstation performs the viewing tasks. A major advantage of the workstations is that the viewers can interactively change their viewing position while watching the dynamics of the flow fields. An overview of the computer hardware and software required to create these displays is presented. For complex scenes the workstation cannot create the displays fast enough for good motion analysis. For these cases, the animation sequences are recorded on video tape or 16 mm film a frame at a time and played back at the desired speed. The additional software and hardware required to create these video tapes or 16 mm movies are also described. Photographs illustrating current visualization techniques are discussed. Examples of the use of the workstations for flow visualization through animation are available on video tape.
Two-dimensional nonsteady viscous flow simulation on the Navier-Stokes computer miniNode
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M.; Littman, Michael G.; Flannery, William
1986-01-01
The needs of large-scale scientific computation are outpacing the growth in performance of mainframe supercomputers. In particular, problems in fluid mechanics involving complex flow simulations require far more speed and capacity than that provided by current and proposed Class VI supercomputers. To address this concern, the Navier-Stokes Computer (NSC) was developed. The NSC is a parallel-processing machine, comprised of individual Nodes, each comparable in performance to current supercomputers. The global architecture is that of a hypercube, and a 128-Node NSC has been designed. New architectural features, such as a reconfigurable many-function ALU pipeline and a multifunction memory-ALU switch, have provided the capability to efficiently implement a wide range of algorithms. Efficient algorithms typically involve numerically intensive tasks, which often include conditional operations. These operations may be efficiently implemented on the NSC without, in general, sacrificing vector-processing speed. To illustrate the architecture, programming, and several of the capabilities of the NSC, the simulation of two-dimensional, nonsteady viscous flows on a prototype Node, called the miniNode, is presented.
National Storage Laboratory: a collaborative research project
NASA Astrophysics Data System (ADS)
Coyne, Robert A.; Hulen, Harry; Watson, Richard W.
1993-01-01
The grand challenges of science and industry that are driving computing and communications have created corresponding challenges in information storage and retrieval. An industry-led collaborative project has been organized to investigate technology for storage systems that will be the future repositories of national information assets. Industry participants are IBM Federal Systems Company, Ampex Recording Systems Corporation, General Atomics DISCOS Division, IBM ADSTAR, Maximum Strategy Corporation, Network Systems Corporation, and Zitel Corporation. Industry members of the collaborative project are funding their own participation. Lawrence Livermore National Laboratory through its National Energy Research Supercomputer Center (NERSC) will participate in the project as the operational site and provider of applications. The expected result is the creation of a National Storage Laboratory to serve as a prototype and demonstration facility. It is expected that this prototype will represent a significant advance in the technology for distributed storage systems capable of handling gigabyte-class files at gigabit-per-second data rates. Specifically, the collaboration expects to make significant advances in hardware, software, and systems technology in four areas of need, (1) network-attached high performance storage; (2) multiple, dynamic, distributed storage hierarchies; (3) layered access to storage system services; and (4) storage system management.
NASA Astrophysics Data System (ADS)
Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.
2010-12-01
The main methodologies of Solar-Terrestrial Physics (STP) so far are theoretical, experimental and observational, and computer simulation approaches. Recently, "informatics" is expected to become a new (fourth) approach to STP studies. Informatics is a methodology to analyze large-scale data (observation data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". The OneSpaceNet is a cloud-computing environment specialized for science work, which connects many researchers with a high-speed network (JGN: Japan Gigabit Network). The JGN is a wide-area backbone network operated by NICT; it provides a 10G network and many access points (AP) over Japan. The OneSpaceNet also provides rich computer resources for research studies, such as supercomputers, large-scale data storage, licensed applications, visualization devices (like a tiled display wall: TDW), database/DBMS, cluster computers (4-8 nodes) for data processing, and communication devices. A notable advantage of the science cloud is that a user needs to prepare only a terminal (a low-cost PC). Once the PC is connected to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices, such as a video-conference system, streaming and reflector servers, and media players, users on the OneSpaceNet can carry out research communications as if they belonged to a single laboratory: they are members of a virtual laboratory. The specification of the computer resources on the OneSpaceNet is as follows. The data storage we have developed so far is almost 1PB in size. The number of data files managed on the cloud storage is growing and now exceeds 40,000,000. Notably, the disks forming the large-scale storage are distributed across 5 data centers over Japan, yet the storage system performs as one disk. There are three supercomputers allocated on the cloud: one in Tokyo, one in Osaka, and the other in Nagoya. A user's simulation job data on any of the supercomputers are saved on the cloud data storage (same directory); it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display; its pixel (resolution) size is as large as 18000x4300. This size is enough to preview or analyze large-scale computer simulation data. It also allows many researchers to view multiple images (e.g., 100 pictures) together on one screen. In our talk we also present a brief report of initial results using the OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud: (i) ultra-high time resolution visualization of Global MHD simulations on the large-scale storage and parallel processing system on the cloud, (ii) a database of real-time Global MHD simulations and statistical analyses of the data, and (iii) a 3D Web service of Global MHD simulations.
A multidimensional finite element method for CFD
NASA Technical Reports Server (NTRS)
Pepper, Darrell W.; Humphrey, Joseph W.
1991-01-01
A finite element method is used to solve the equations of motion for 2- and 3-D fluid flow. The time-dependent equations are solved explicitly using quadrilateral (2-D) and hexahedral (3-D) elements, mass lumping, and reduced integration. A Petrov-Galerkin technique is applied to the advection terms. The method requires a minimum of computational storage, executes quickly, and is scalable for execution on computer systems ranging from PCs to supercomputers.
Portable Map-Reduce Utility for MIT SuperCloud Environment
2015-09-17
The big data architecture, which is designed to address these challenges, is made of computing resources, a scheduler, a central storage file system, databases, analytics software, and web interfaces [1]. These components are common to many big data and supercomputing systems.
Supercomputer simulations of structure formation in the Universe
NASA Astrophysics Data System (ADS)
Ishiyama, Tomoaki
2017-06-01
We describe the implementation and performance results of our massively parallel MPI/OpenMP hybrid TreePM code for large-scale cosmological N-body simulations. For domain decomposition, a recursive multi-section algorithm is used and the sizes of domains are automatically set so that the total calculation time is the same for all processes. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. For a two-trillion-particle benchmark simulation, the average performance on the full system of the K computer (82,944 nodes; 663,552 cores in total) is 5.8 Pflops, which corresponds to 55% of the peak speed.
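A recursive multi-section decomposition of the kind named in this abstract can be sketched simply. The sketch below is an assumption-laden simplification (it balances particle counts rather than measured calculation time, and splits only into halves): it repeatedly bisects the largest domain along its longest axis.

```python
# Simplified recursive domain decomposition: split particle sets along the
# longest axis into equal-count halves until the requested domain count is met.
import numpy as np

def decompose(points, n_domains):
    """points: (N, 3) array of positions. Returns a list of index arrays."""
    domains = [np.arange(len(points))]
    while len(domains) < n_domains:
        domains.sort(key=len, reverse=True)
        d = domains.pop(0)                                   # largest domain
        axis = np.argmax(points[d].max(axis=0) - points[d].min(axis=0))
        order = d[np.argsort(points[d, axis])]
        half = len(order) // 2                               # equal count ~ equal work
        domains += [order[:half], order[half:]]
    return domains

if __name__ == "__main__":
    pts = np.random.rand(100_000, 3)
    print([len(d) for d in decompose(pts, 8)])               # roughly equal sizes
```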
Machine characterization based on an abstract high-level language machine
NASA Technical Reports Server (NTRS)
Saavedra-Barrera, Rafael H.; Smith, Alan Jay; Miya, Eugene
1989-01-01
Measurements are presented for a large number of machines ranging from small workstations to supercomputers. The authors combine these measurements into groups of parameters which relate to specific aspects of the machine implementation, and use these groups to provide overall machine characterizations. The authors also define the concept of pershapes, which represent the level of performance of a machine for different types of computation. A metric based on pershapes is introduced that provides a quantitative way of measuring how similar two machines are in terms of their performance distributions. The metric is related to the extent to which pairs of machines have varying relative performance levels depending on which benchmark is used.
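The pershape-based similarity idea can be illustrated with a stand-in metric; the exact definition used by the authors is not reproduced here. The sketch below assumes a simple proxy: the spread of the log performance ratio across benchmarks, which is zero when two machines differ only by a constant factor.

```python
# Assumed stand-in for a pershape-style similarity measure (illustrative only).
import numpy as np

def similarity_distance(perf_a, perf_b):
    """perf_a, perf_b: performance (e.g. MFLOPS) on the same benchmark set.
    Returns the standard deviation of the log performance ratio."""
    ratio = np.log(np.asarray(perf_a, float) / np.asarray(perf_b, float))
    return float(np.std(ratio))

if __name__ == "__main__":
    workstation = [12.0, 3.0, 8.0, 1.5]
    vector_super = [300.0, 20.0, 450.0, 6.0]   # relative strengths differ by benchmark
    print(similarity_distance(workstation, vector_super))
```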
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hazi, A U
2007-02-06
Setting performance goals is part of the business plan for almost every company. The same is true in the world of supercomputers. Ten years ago, the Department of Energy (DOE) launched the Accelerated Strategic Computing Initiative (ASCI) to help ensure the safety and reliability of the nation's nuclear weapons stockpile without nuclear testing. ASCI, which is now called the Advanced Simulation and Computing (ASC) Program and is managed by DOE's National Nuclear Security Administration (NNSA), set an initial 10-year goal to obtain computers that could process up to 100 trillion floating-point operations per second (teraflops). Many computer experts thought the goal was overly ambitious, but the program's results have proved them wrong. Last November, a Livermore-IBM team received the 2005 Gordon Bell Prize for achieving more than 100 teraflops while modeling the pressure-induced solidification of molten metal. The prestigious prize, which is named for a founding father of supercomputing, is awarded each year at the Supercomputing Conference to innovators who advance high-performance computing. Recipients for the 2005 prize included six Livermore scientists--physicists Fred Streitz, James Glosli, and Mehul Patel and computer scientists Bor Chan, Robert Yates, and Bronis de Supinski--as well as IBM researchers James Sexton and John Gunnels. This team produced the first atomic-scale model of metal solidification from the liquid phase with results that were independent of system size. The record-setting calculation used Livermore's domain decomposition molecular-dynamics (ddcMD) code running on BlueGene/L, a supercomputer developed by IBM in partnership with the ASC Program. BlueGene/L reached 280.6 teraflops on the Linpack benchmark, the industry standard used to measure computing speed. As a result, it ranks first on the list of Top500 Supercomputer Sites released in November 2005. To evaluate the performance of nuclear weapons systems, scientists must understand how materials behave under extreme conditions. Because experiments at high pressures and temperatures are often difficult or impossible to conduct, scientists rely on computer models that have been validated with obtainable data. Of particular interest to weapons scientists is the solidification of metals. "To predict the performance of aging nuclear weapons, we need detailed information on a material's phase transitions," says Streitz, who leads the Livermore-IBM team. For example, scientists want to know what happens to a metal as it changes from molten liquid to a solid and how that transition affects the material's characteristics, such as its strength.
Resilience landscapes for Congo basin rainforests vs. climate and management impacts
NASA Astrophysics Data System (ADS)
Pietsch, Stephan Alexander; Gautam, Sishir; Elias Bednar, Johannes; Stanzl, Patrick; Mosnier, Aline; Obersteiner, Michael
2015-04-01
Past climate change caused severe disturbances of the Central African rainforest belt, with forest fragmentation and re-expansion due to drier and wetter climate conditions. Besides climate, human induced forest degradation affected biodiversity, structure and carbon storage of Congo basin rainforests. Information on climatically stable, mature rainforest, unaffected by human induced disturbances, provides means of assessing the impact of forest degradation and may serve as benchmarks of carbon carrying capacity over regions with similar site and climate conditions. BioGeoChemical (BGC) ecosystem models explicitly consider the impacts of site and climate conditions and may assess benchmark levels over regions devoid of undisturbed conditions. We will present a BGC-model validation for the Western Congolian Lowland Rainforest (WCLRF) using field data from a recently confirmed forest refuge, show model-data comparisons for disturbed and undisturbed forests under different site and climate conditions as well as for sites with repeated assessment of biodiversity and standing biomass during recovery from intensive exploitation. We will present climatic thresholds for WCLRF stability, and construct resilience landscapes for current day conditions vs. climate and management impacts.
Hysteresis in the Central African Rainforest
NASA Astrophysics Data System (ADS)
Pietsch, Stephan Alexander; Elias Bednar, Johannes; Gautam, Sishir; Petritsch, Richard; Schier, Franziska; Stanzl, Patrick
2014-05-01
Past climate change caused severe disturbances of the Central African rainforest belt, with forest fragmentation and re-expansion due to drier and wetter climate conditions. Besides climate, human induced forest degradation affected biodiversity, structure and carbon storage of Congo basin rainforests. Information on climatically stable, mature rainforest, unaffected by human induced disturbances, provides means of assessing the impact of forest degradation and may serve as benchmarks of carbon carrying capacity over regions with similar site and climate conditions. BioGeoChemical (BGC) ecosystem models explicitly consider the impacts of site and climate conditions and may assess benchmark levels over regions devoid of undisturbed conditions. We will present a BGC-model validation for the Western Congolian Lowland Rainforest (WCLRF) using field data from a recently confirmed forest refuge, show model-data comparisons for disturbed and undisturbed forests under different site and climate conditions as well as for sites with repeated assessment of biodiversity and standing biomass during recovery from intensive exploitation. We will present climatic thresholds for WCLRF stability, analyse the relationship between resilience, standing C-stocks and change in climate and finally provide evidence of hysteresis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murphy, Richard C.
2009-09-01
This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follows: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.
: A Scalable and Transparent System for Simulating MPI Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2010-01-01
is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of a purely discrete-event style of execution and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.
SNS programming environment user's guide
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.; Humes, D. Creig; Cronin, Catherine K.; Bowen, John T.; Drozdowski, Joseph M.; Utley, Judith A.; Flynn, Theresa M.; Austin, Brenda A.
1992-01-01
The computing environment is briefly described for the Supercomputing Network Subsystem (SNS) of the Central Scientific Computing Complex of NASA Langley. The major SNS computers are a CRAY-2, a CRAY Y-MP, a CONVEX C-210, and a CONVEX C-220. The software is described that is common to all of these computers, including: the UNIX operating system, computer graphics, networking utilities, mass storage, and mathematical libraries. Also described is file management, validation, SNS configuration, documentation, and customer services.
Multi-Level Bitmap Indexes for Flash Memory Storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Madduri, Kamesh; Canon, Shane
2010-07-23
Due to their low access latency, high read speed, and power-efficient operation, flash memory storage devices are rapidly emerging as an attractive alternative to traditional magnetic storage devices. However, tests show that the most efficient indexing methods are not able to take advantage of the flash memory storage devices. In this paper, we present a set of multi-level bitmap indexes that can effectively take advantage of flash storage devices. These indexing methods use coarsely binned indexes to answer queries approximately, and then use finely binned indexes to refine the answers. Our new methods read significantly lower volumes of data at the expense of an increased disk access count, thus taking full advantage of the improved read speed and low access latency of flash devices. To demonstrate the advantage of these new indexes, we measure their performance on a number of storage systems using a standard data warehousing benchmark called the Set Query Benchmark. We observe that multi-level strategies on flash drives are up to 3 times faster than traditional indexing strategies on magnetic disk drives.
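The coarse-then-fine refinement strategy can be demonstrated with a much-simplified structure. The sketch below is not the paper's bitmap encoding; it is a hypothetical two-level binned index that stores row-id lists per bin and only checks raw values inside the one partially covered fine bin.

```python
# Simplified two-level binned index: whole coarse bins qualify cheaply, fine
# bins refine the boundary, and only the last partial fine bin needs a
# candidate check. Values are assumed to lie within the coarse edge range.
import bisect

class TwoLevelIndex:
    def __init__(self, values, coarse_edges, fine_per_coarse=10):
        self.values = values
        self.coarse_edges = coarse_edges
        self.fine_edges = []
        for lo, hi in zip(coarse_edges[:-1], coarse_edges[1:]):
            step = (hi - lo) / fine_per_coarse
            self.fine_edges.append([lo + i * step for i in range(fine_per_coarse + 1)])
        self.coarse_rows = [[] for _ in range(len(coarse_edges) - 1)]
        self.fine_rows = [[[] for _ in range(fine_per_coarse)] for _ in self.coarse_rows]
        for rid, v in enumerate(values):
            c = min(bisect.bisect_right(coarse_edges, v) - 1, len(self.coarse_rows) - 1)
            f = min(bisect.bisect_right(self.fine_edges[c], v) - 1, fine_per_coarse - 1)
            self.coarse_rows[c].append(rid)
            self.fine_rows[c][f].append(rid)

    def query_less_than(self, q):
        """Return row ids with value < q."""
        hits = []
        c_edge = min(bisect.bisect_right(self.coarse_edges, q) - 1,
                     len(self.coarse_rows) - 1)
        for c in range(c_edge):
            hits += self.coarse_rows[c]                      # whole coarse bins qualify
        f_edge = min(bisect.bisect_right(self.fine_edges[c_edge], q) - 1,
                     len(self.fine_rows[c_edge]) - 1)
        for f in range(f_edge):
            hits += self.fine_rows[c_edge][f]                # whole fine bins qualify
        hits += [r for r in self.fine_rows[c_edge][f_edge]   # candidate check
                 if self.values[r] < q]
        return hits

if __name__ == "__main__":
    vals = [x * 0.1 for x in range(1000)]
    idx = TwoLevelIndex(vals, coarse_edges=[0, 25, 50, 75, 100.0001])
    assert sorted(idx.query_less_than(37.25)) == list(range(373))
    print("ok")
```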
Experiences From NASA/Langley's DMSS Project
NASA Technical Reports Server (NTRS)
1996-01-01
There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at the NASA Langley Research Center (LaRC) has placed such a system into production use. This paper will present the experiences, both good and bad, we have had with this system since putting it into production usage. The system is comprised of: 1) National Storage Laboratory (NSL)/UniTree 2.1, 2) IBM 9570 HIPPI attached disk arrays (both RAID 3 and RAID 5), 3) IBM RS6000 server, 4) HIPPI/IPI3 third party transfers between the disk array systems and the supercomputer clients, a CRAY Y-MP and a CRAY 2, 5) a "warm spare" file server, 6) transition software to convert from CRAY's Data Migration Facility (DMF) based system to DMSS, 7) an NSC PS32 HIPPI switch, and 8) a STK 4490 robotic library accessed from the IBM RS6000 block mux interface. This paper will cover: the performance of the DMSS in the following areas: file transfer rates, migration and recall, and file manipulation (listing, deleting, etc.); the appropriateness of a workstation class of file server for NSL/UniTree with LaRC's present storage requirements in mind; the role of the third party transfers between the supercomputers and the DMSS disk array systems in DMSS; a detailed comparison (both in performance and functionality) between the DMF and DMSS systems; LaRC's enhancements to the NSL/UniTree system administration environment; the mechanism for DMSS to provide file server redundancy; the statistics on the availability of DMSS; and the design and experiences with the locally developed transparent transition software which allowed us to make over 1.5 million DMF files available to NSL/UniTree with minimal system outage.
Network issues for large mass storage requirements
NASA Technical Reports Server (NTRS)
Perdue, James
1992-01-01
File Servers and Supercomputing environments need high performance networks to balance the I/O requirements seen in today's demanding computing scenarios. UltraNet is one solution which permits both high aggregate transfer rates and high task-to-task transfer rates as demonstrated in actual tests. UltraNet provides this capability as both a Server-to-Server and Server-to-Client access network, giving the supercomputing center the following advantages: highest performance Transport Level connections (to 40 MBytes/sec effective rates); matches the throughput of the emerging high performance disk technologies, such as RAID, parallel head transfer devices and software striping; supports standard network and file system applications using a SOCKETS-based application program interface such as FTP, rcp, rdump, etc.; supports access to the Network File System (NFS) and LARGE aggregate bandwidth for large NFS usage; provides access to a distributed, hierarchical data server capability using the DISCOS UniTree product; supports file server solutions available from multiple vendors, including Cray, Convex, Alliant, FPS, IBM, and others.
Developing Benchmarks for Solar Radio Bursts
NASA Astrophysics Data System (ADS)
Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Domm, P.; Love, J. J.; Pierson, J.
2016-12-01
Solar radio bursts can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan has asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The solar radio benchmark team was also asked to define the wavelength/frequency bands of interest. The benchmark team developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000 MHz) bands. The preliminary benchmarks were derived based on previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving theoretical maxima, where it is even possible to do so, requires additional work in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks and the basis used to derive them. We will also present the work that needs to be done in order to complete the final, or phase 2, benchmarks.
Signorelli, Heather; Straseski, Joely A; Genzen, Jonathan R; Walker, Brandon S; Jackson, Brian R; Schmidt, Robert L
2015-01-01
Appropriate test utilization is usually evaluated by adherence to published guidelines. In many cases, medical guidelines are not available. Benchmarking has been proposed as a method to identify practice variations that may represent inappropriate testing. This study investigated the use of benchmarking to identify sites with inappropriate utilization of testing for a particular analyte. We used a Web-based survey to compare 2 measures of vitamin D utilization: overall testing intensity (ratio of total vitamin D orders to blood-count orders) and relative testing intensity (ratio of 1,25(OH)2D to 25(OH)D test orders). A total of 81 facilities contributed data. The average overall testing intensity index was 0.165, or approximately 1 vitamin D test for every 6 blood-count tests. The average relative testing intensity index was 0.055, or one 1,25(OH)2D test for every 18 of the 25(OH)D tests. Both indexes varied considerably. Benchmarking can be used as a screening tool to identify outliers that may be associated with inappropriate test utilization.
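The benchmarking screen described in this abstract amounts to computing two ratio indexes per site and flagging sites far from the group mean. The sketch below is illustrative only: the site values, the outlier cutoff, and the flagging rule are hypothetical, not the study's statistics.

```python
# Illustrative benchmarking screen: per-site testing intensity indexes plus a
# simple deviation-from-mean outlier flag (all site data hypothetical).
import statistics

def intensity_indexes(total_vit_d, blood_counts, one25_oh2d, two5_ohd):
    overall = total_vit_d / blood_counts    # survey average reported as ~0.165
    relative = one25_oh2d / two5_ohd        # survey average reported as ~0.055
    return overall, relative

def flag_outliers(site_values, z_cut=2.0):
    mean = statistics.mean(site_values.values())
    sd = statistics.stdev(site_values.values())
    return [site for site, v in site_values.items() if abs(v - mean) > z_cut * sd]

if __name__ == "__main__":
    print(intensity_indexes(total_vit_d=1650, blood_counts=10000,
                            one25_oh2d=55, two5_ohd=1000))      # (0.165, 0.055)
    relative_index = {"site A": 0.05, "site B": 0.06, "site C": 0.04, "site D": 0.35}
    print(flag_outliers(relative_index, z_cut=1.4))             # ['site D'] (small-sample example)
```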
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, when it comes to 3D data migration, the resource requirements of the algorithm grow as the data size increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post- and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series supercomputer. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
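The flexi-depth idea, choosing how many depth passes to make from the memory available at runtime, reduces to a small calculation. The sketch below is an assumption (parameter names, the memory safety fraction, and the sizing rule are not from the paper): it picks the number of iterations so each pass's image slab fits in node memory.

```python
# Hypothetical sketch of choosing the number of depth iterations from the
# imaging-volume size and the node memory available at runtime.
import math

def depth_iterations(n_x, n_y, n_depth_slices, bytes_per_sample,
                     node_mem_bytes, mem_fraction=0.8):
    """Number of passes needed so that each pass's image slab fits in memory."""
    slice_bytes = n_x * n_y * bytes_per_sample
    usable = node_mem_bytes * mem_fraction
    slices_per_iter = max(1, int(usable // slice_bytes))
    return math.ceil(n_depth_slices / slices_per_iter)

if __name__ == "__main__":
    iters = depth_iterations(n_x=2000, n_y=2000, n_depth_slices=1500,
                             bytes_per_sample=4, node_mem_bytes=16 * 2**30)
    print(iters)   # iterations for this hypothetical run (2)
```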
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using benchmark problems taken from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
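The numerical kernel named here, a split-operator step evaluated with FFTs, is compact enough to show in one dimension. The sketch below is only an illustration of the method (1D, single process, atomic units, harmonic potential); the solver itself is 3D with MPI decomposition.

```python
# One second-order split-operator/FFT step for a 1D Schrödinger equation:
# half kick in V, full drift in the kinetic term p^2/2, half kick in V.
import numpy as np

def split_operator_step(psi, x, dx, dt, potential):
    k = 2.0 * np.pi * np.fft.fftfreq(len(x), d=dx)            # momentum grid
    half_kick = np.exp(-0.5j * dt * potential(x))
    psi = half_kick * psi
    psi = np.fft.ifft(np.exp(-0.5j * dt * k**2) * np.fft.fft(psi))
    return half_kick * psi

if __name__ == "__main__":
    x = np.linspace(-50, 50, 1024, endpoint=False)
    dx = x[1] - x[0]
    psi = np.exp(-x**2)                                        # initial Gaussian
    psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)                # normalize
    for _ in range(100):
        psi = split_operator_step(psi, x, dx, dt=0.01, potential=lambda x: 0.5 * x**2)
    print(np.sum(np.abs(psi)**2) * dx)                         # norm stays ~1 (unitary)
```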
NASA Astrophysics Data System (ADS)
Cardall, Christian Y.; Budiardja, Reuben D.; Endeve, Eirik; Mezzacappa, Anthony
2014-02-01
GenASiS (General Astrophysical Simulation System) is a new code being developed initially and primarily, though by no means exclusively, for the simulation of core-collapse supernovae on the world's leading capability supercomputers. This paper—the first in a series—demonstrates a centrally refined coordinate patch suitable for gravitational collapse and documents methods for compressible nonrelativistic hydrodynamics. We benchmark the hydrodynamics capabilities of GenASiS against many standard test problems; the results illustrate the basic competence of our implementation, demonstrate the strengths and limitations of the HLLC relative to the HLL Riemann solver in a number of interesting cases, and provide preliminary indications of the code's ability to scale and to function with cell-by-cell fixed-mesh refinement.
LUMA: A many-core, Fluid-Structure Interaction solver based on the Lattice-Boltzmann Method
NASA Astrophysics Data System (ADS)
Harwood, Adrian R. G.; O'Connor, Joseph; Sanchez Muñoz, Jonathan; Camps Santasmasas, Marta; Revell, Alistair J.
2018-01-01
The Lattice-Boltzmann Method at the University of Manchester (LUMA) project was commissioned to build a collaborative research environment in which researchers of all abilities can study fluid-structure interaction (FSI) problems in engineering applications from aerodynamics to medicine. It is built on the principles of accessibility, simplicity and flexibility. The LUMA software at the core of the project is a capable FSI solver with turbulence modelling and many-core scalability as well as a wealth of input/output and pre- and post-processing facilities. The software has been validated and several major releases benchmarked on supercomputing facilities internationally. The software architecture is modular and arranged logically using a minimal amount of object-orientation to maintain a simple and accessible software.
NASA Astrophysics Data System (ADS)
Wollherr, Stephanie; Gabriel, Alice-Agnes; Igel, Heiner
2015-04-01
In dynamic rupture models, high stress concentrations at rupture fronts have to be accommodated by off-fault inelastic processes such as plastic deformation. As presented in (Roten et al., 2014), incorporating plastic yielding can significantly reduce earlier predictions of ground motions in the Los Angeles Basin. Further, an inelastic response of materials surrounding a fault potentially has a strong impact on surface displacement and is therefore a key aspect in understanding the triggering of tsunamis through floor uplifting. We present an implementation of off-fault-plasticity and its verification for the software package SeisSol, an arbitrary high-order derivative discontinuous Galerkin (ADER-DG) method. The software recently reached multi-petaflop/s performance on some of the largest supercomputers worldwide and was a Gordon Bell prize finalist application in 2014 (Heinecke et al., 2014). For the nonelastic calculations we impose a Drucker-Prager yield criterion in shear stress with a viscous regularization following (Andrews, 2005). It permits the smooth relaxation of high stress concentrations induced in the dynamic rupture process. We verify the implementation by comparison to the SCEC/USGS Spontaneous Rupture Code Verification Benchmarks. The results of test problem TPV13 with a 60-degree dipping normal fault show that SeisSol is in good agreement with other codes. Additionally we aim to explore the numerical characteristics of the off-fault plasticity implementation by performing convergence tests for the 2D code. The ADER-DG method is especially suited for complex geometries by using unstructured tetrahedral meshes. Local adaptation of the mesh resolution enables a fine sampling of the cohesive zone on the fault while simultaneously satisfying the dispersion requirements of wave propagation away from the fault. In this context we will investigate the influence of off-fault-plasticity on geometrically complex fault zone structures like subduction zones or branched faults. Studying the interplay of stress conditions and angle dependence of neighbouring branches including inelastic material behaviour and its effects on rupture jumps and seismic activation helps to advance our understanding of earthquake source processes. An application is the simulation of a real large-scale subduction zone scenario including plasticity to validate the coupling of our dynamic rupture calculations to a tsunami model in the framework of the ASCETE project (http://www.ascete.de/). Andrews, D. J. (2005): Rupture dynamics with energy loss outside the slip zone, J. Geophys. Res., 110, B01307. Heinecke, A. (2014), A. Breuer, S. Rettenberger, M. Bader, A.-A. Gabriel, C. Pelties, A. Bode, W. Barth, K. Vaidyanathan, M. Smelyanskiy and P. Dubey: Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers. In Supercomputing 2014, The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, LA, USA, November 2014. Roten, D. (2014), K. B. Olsen, S.M. Day, Y. Cui, and D. Fäh: Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophys. Res. Lett., 41, 2769-2777.
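A hedged sketch of the yield check with viscous relaxation is given below. It uses one formulation commonly associated with the SCEC TPV13-style Drucker-Prager benchmarks (an assumption for illustration, not SeisSol's code): if the second deviatoric stress invariant exceeds the pressure-dependent yield stress, the excess shear stress is relaxed toward the yield surface over a viscous timescale.

```python
# Hedged sketch of a Drucker-Prager yield check with viscous regularization.
# The exact formulation is assumed; compression is negative, units are Pa.
import numpy as np

def relax_deviatoric_stress(stress, cohesion, friction_angle, dt, t_visc):
    """stress: 3x3 symmetric stress tensor. Returns the adjusted stress."""
    sigma_m = np.trace(stress) / 3.0
    dev = stress - sigma_m * np.eye(3)                 # deviatoric part
    tau = np.sqrt(0.5 * np.sum(dev * dev))             # sqrt(J2)
    yield_stress = max(0.0, cohesion * np.cos(friction_angle)
                       - sigma_m * np.sin(friction_angle))
    if tau <= yield_stress:
        return stress                                   # elastic, no adjustment
    # relax the excess shear stress toward the yield surface over t_visc
    factor = (yield_stress + (tau - yield_stress) * np.exp(-dt / t_visc)) / tau
    return sigma_m * np.eye(3) + factor * dev

if __name__ == "__main__":
    s = np.array([[-60e6, 30e6, 0.0], [30e6, -120e6, 0.0], [0.0, 0.0, -90e6]])
    out = relax_deviatoric_stress(s, cohesion=5e6, friction_angle=np.arctan(0.85),
                                  dt=1e-3, t_visc=3e-2)
    print(out)
```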
Seismic signal processing on heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas
2015-04-01
The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in the recent years provide numerous computing nodes interconnected via high throughput networks, every node containing a mix of processing elements of different architectures, like several sequential processor cores and one or a few graphical processing units (GPU) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC 30 family operated by the Swiss National Supercomputing Center (CSCS), which we used in this research. Heterogeneous supercomputers provide an opportunity for manifold application performance increase and are more energy-efficient, however they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is design of a prototype for such library suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that require dedicated HPC solutions. The chosen application is using a wide range of common signal processing methods, which include various IIR filter designs, amplitude and phase correlation, computing the analytic signal, and discrete Fourier transforms. Furthermore, various processing methods specific for seismology, like rotation of seismic traces, are used. Efficient implementation of all these methods on the GPU-accelerated systems represents several challenges. In particular, it requires a careful distribution of work between the sequential processors and accelerators. Furthermore, since the application is designed to process very large volumes of data, special attention had to be paid to the efficient use of the available memory and networking hardware resources in order to reduce intensity of data input and output. In our contribution we will explain the software architecture as well as principal engineering decisions used to address these challenges. We will also describe the programming model based on C++ and CUDA that we used to develop the software. Finally, we will demonstrate performance improvements achieved by using the heterogeneous computing architecture. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d26.
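The central kernel of the chosen use case, correlating pairs of ambient-noise traces, can be sketched compactly. The version below is a single-pair NumPy illustration; the production software runs the same kind of frequency-domain operations on GPUs across very large station sets, and the trace data here are synthetic.

```python
# Frequency-domain cross-correlation of two noise traces (illustrative only).
import numpy as np

def cross_correlate(trace_a, trace_b, max_lag):
    """Return the cross-correlation for lags -max_lag..+max_lag samples."""
    n = len(trace_a) + len(trace_b) - 1
    nfft = 1 << (n - 1).bit_length()                       # next power of two
    spec = np.fft.rfft(trace_a, nfft) * np.conj(np.fft.rfft(trace_b, nfft))
    cc = np.fft.irfft(spec, nfft)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # reorder to [-lag, +lag]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(10_000)
    delayed = np.roll(noise, 250)                          # second station sees a delayed copy
    cc = cross_correlate(delayed, noise, max_lag=500)
    print(np.argmax(cc) - 500)                             # recovers the 250-sample lag
```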
Method and apparatus for offloading compute resources to a flash co-processing appliance
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung
2015-10-13
Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage with asynchronous migration between the burst buffer nodes and slower more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and very fast uploaded to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
Building Columbia from the SysAdmin View
NASA Technical Reports Server (NTRS)
Chan, David
2005-01-01
Project Columbia was built at NASA Ames Research Center in partnership with SGI and Intel. Columbia consists of 20 512-processor Altix machines with 440TB of storage and achieved 51.87 TeraFlops to be ranked the second fastest on the Top 500 at SuperComputing 2004. Columbia was delivered, installed and put into production in 3 months. On average, a new Columbia node was brought into production in less than a week. Columbia's configuration, installation, and future plans will be discussed.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix, and full-matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
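To make the two ingredients concrete, here is a small illustration (not the paper's Cray code) using SciPy's SuperLU interface: a sparse direct factorization of a 2-D Laplacian-type matrix with and without a fill-reducing ordering, with the MMD_AT_PLUS_A permutation option standing in for the paper's Multiple Minimum Degree heuristic.

```python
# Sketch: sparse direct factorization with and without a minimum-degree
# fill-reducing ordering, on a stand-in for a global stiffness matrix.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 50                                            # 50 x 50 grid -> 2500 unknowns
T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n))
S = sp.diags([-1, -1], [-1, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T) + sp.kron(S, sp.identity(n))).tocsc()
b = np.ones(A.shape[0])

lu_natural = splu(A, permc_spec="NATURAL")        # factor without reordering
lu_mmd = splu(A, permc_spec="MMD_AT_PLUS_A")      # minimum-degree reordering
x = lu_mmd.solve(b)

print("fill-in with natural ordering :", lu_natural.L.nnz + lu_natural.U.nnz)
print("fill-in with minimum degree   :", lu_mmd.L.nnz + lu_mmd.U.nnz)
print("residual norm:", np.linalg.norm(A @ x - b))
```

The fill-in counts illustrate why the reordering pays off: fewer nonzeros in the factors means fewer operations and better use of dense (BLAS3-style) kernels on the frontal matrices.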
Space Weather Action Plan Solar Radio Burst Phase 1 Benchmarks and the Steps to Phase 2
NASA Astrophysics Data System (ADS)
Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Love, J. J.; Pierson, J.
2017-12-01
Solar radio bursts, when at the right frequency and when strong enough, can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The benchmark team has developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000 MHz) bands. The preliminary benchmarks were derived from previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving the theoretical maxima requires additional work to determine where doing so is even possible, in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks, the basis used to derive them, and the limitations of that work. We will also discuss the work that needs to be done to complete the phase 2 benchmarks.
SP2Bench: A SPARQL Performance Benchmark
NASA Astrophysics Data System (ADS)
Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg
A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
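For readers unfamiliar with the query side, the sketch below shows the flavor of a DBLP-style RDF data set and a simple SPARQL triple-pattern join of the kind such benchmarks stress at far larger scale; it uses the rdflib package, and the namespace, resources, and query are invented for the example rather than taken from SP2Bench.

```python
# Hedged illustration only: a tiny DBLP-like graph and a basic SPARQL join,
# not SP2Bench's data generator or query mix.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/dblp/")      # hypothetical namespace
g = Graph()
g.add((EX.article1, RDF.type, EX.Article))
g.add((EX.article1, EX.creator, EX.alice))
g.add((EX.article1, EX.year, Literal(2008)))
g.add((EX.article2, RDF.type, EX.Article))
g.add((EX.article2, EX.creator, EX.bob))
g.add((EX.article2, EX.year, Literal(2009)))

query = """
PREFIX ex: <http://example.org/dblp/>
SELECT ?author ?year WHERE {
    ?doc a ex:Article ;
         ex:creator ?author ;
         ex:year ?year .
} ORDER BY ?year
"""
for row in g.query(query):
    print(row.author, row.year)
```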
Overview 1993: Computational applications
NASA Technical Reports Server (NTRS)
Benek, John A.
1993-01-01
Computational applications include projects that apply or develop computationally intensive computer programs. Such programs typically require supercomputers to obtain solutions in a timely fashion. This report describes two CSTAR projects involving Computational Fluid Dynamics (CFD) technology. The first, the Parallel Processing Initiative, is a joint development effort and the second, the Chimera Technology Development, is a transfer of government developed technology to American industry.
Hahne, Jan; Helias, Moritz; Kunkel, Susanne; Igarashi, Jun; Bolten, Matthias; Frommer, Andreas; Diesmann, Markus
2015-01-01
Contemporary simulators for networks of point and few-compartment model neurons come with a plethora of ready-to-use neuron and synapse models and support complex network topologies. Recent technological advancements have broadened the spectrum of application further to the efficient simulation of brain-scale networks on supercomputers. In distributed network simulations the amount of spike data that accrues per millisecond and process is typically low, such that a common optimization strategy is to communicate spikes at relatively long intervals, where the upper limit is given by the shortest synaptic transmission delay in the network. This approach is well-suited for simulations that employ only chemical synapses but it has so far impeded the incorporation of gap-junction models, which require instantaneous neuronal interactions. Here, we present a numerical algorithm based on a waveform-relaxation technique which allows for network simulations with gap junctions in a way that is compatible with the delayed communication strategy. Using a reference implementation in the NEST simulator, we demonstrate that the algorithm and the required data structures can be smoothly integrated with existing code such that they complement the infrastructure for spiking connections. To show that the unified framework for gap-junction and spiking interactions achieves high performance and delivers high accuracy in the presence of gap junctions, we present benchmarks for workstations, clusters, and supercomputers. Finally, we discuss limitations of the novel technology.
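The following toy sketch illustrates the waveform-relaxation idea in isolation: two leaky units coupled by a gap-junction-like term are each integrated over the whole communication interval using the previous iterate of the partner's membrane waveform, and the exchange is repeated until the waveforms stop changing. It mirrors the scheme's compatibility with interval-based communication but is not NEST code, and all parameters are invented.

```python
# Toy Jacobi waveform relaxation (assumptions: simple leaky units, explicit Euler).
import numpy as np

dt, T = 0.1, 20.0                 # ms
steps = int(T / dt)
tau, g = 10.0, 0.3                # membrane time constant, gap-junction strength
I_ext = np.array([1.2, 0.0])      # only unit 0 is driven

def integrate(v0, partner_waveform, drive):
    """Integrate dv/dt = (-v + drive + g*(v_partner - v)) / tau with explicit Euler,
    treating the partner's waveform as a known function of time."""
    v = np.empty(steps + 1)
    v[0] = v0
    for k in range(steps):
        coupling = g * (partner_waveform[k] - v[k])
        v[k + 1] = v[k] + dt * (-v[k] + drive + coupling) / tau
    return v

wave = np.zeros((2, steps + 1))   # initial guess: flat waveforms over [0, T]
for iteration in range(50):
    new = np.vstack([
        integrate(0.0, wave[1], I_ext[0]),
        integrate(0.0, wave[0], I_ext[1]),
    ])
    change = np.max(np.abs(new - wave))
    wave = new
    if change < 1e-10:            # waveforms have converged for this interval
        break

print(f"converged after {iteration + 1} sweeps; final V = {wave[:, -1]}")
```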
Improvements to the ICRH antenna time-domain 3D plasma simulation model
NASA Astrophysics Data System (ADS)
Smithe, David N.; Jenkins, Thomas G.; King, J. R.
2015-12-01
We present a summary of ongoing improvements to the 3D time-domain plasma modeling software that has been used to look at ICRH antennas on Alcator C-Mod, NSTX, and ITER [1]. Our past investigations have shown that in low density cases where the slow wave is propagating, strong amplitude lower hybrid resonant fields can occur. Such a scenario could result in significant parasitic power loss in the SOL. The primary resonance broadening in this case is likely collisions with neutral gas, and thus we are upgrading the model to include realistic neutral gas in the SOL, in order to provide a better understanding of energy balance in these situations. Related to this, we are adding a temporal variation capability to the local plasma density in front of the antenna in order to investigate whether the near fields of the antenna could modify the local density sufficiently to initiate a low density situation. We will start with a simple scalar ponderomotive potential density expulsion model [2] for the density evolution, but are also looking to eventually couple to a more complex fluid treatment that would include tensor pressures and convective physics and sources of neutrals and ionization. We also review continued benchmarking efforts, and ongoing and planned improvements to the computational algorithms, resulting from experience gained during our recent supercomputing runs on the Titan supercomputer, including GPU operations.
An Application-Based Performance Evaluation of NASA's Nebula Cloud Computing Platform
NASA Technical Reports Server (NTRS)
Saini, Subhash; Heistand, Steve; Jin, Haoqiang; Chang, Johnny; Hood, Robert T.; Mehrotra, Piyush; Biswas, Rupak
2012-01-01
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it holds high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
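The short-message latency finding above is the kind of result a simple MPI ping-pong probe exposes; the sketch below (assuming mpi4py and an MPI launcher are available) is a generic example of such a probe, not the benchmark suite used in the paper.

```python
# Generic MPI ping-pong latency probe; run with e.g. `mpiexec -n 2 python pingpong.py`
# (script name is hypothetical).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nreps = 1000
buf = np.zeros(1, dtype="b")          # 1-byte message

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(nreps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = MPI.Wtime() - t0

if rank == 0:
    # one round trip = two messages, so half the round-trip time is the latency
    print(f"short-message latency: {elapsed / nreps / 2 * 1e6:.1f} microseconds")
```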
Development of a Computing Cluster At the University of Richmond
NASA Astrophysics Data System (ADS)
Carbonneau, J.; Gilfoyle, G. P.; Bunn, E. F.
2010-11-01
The University of Richmond has developed a computing cluster to support the massive simulation and data analysis requirements for programs in intermediate-energy nuclear physics and cosmology. It is a 20-node, 240-core system running Red Hat Enterprise Linux 5. We have built and installed the physics software packages (Geant4, gemc, MADmap...) and developed shell and Perl scripts for running those programs on the remote nodes. The system has a theoretical processing peak of about 2500 GFLOPS. Testing with the High Performance Linpack (HPL) benchmarking program (one of the standard benchmarks used by the TOP500 list of fastest supercomputers) resulted in speeds of over 900 GFLOPS. The difference between the maximum and measured speeds is due to limitations in the communication speed among the nodes, creating a bottleneck for large-memory problems. As HPL sends data between nodes, the gigabit Ethernet connection cannot keep up with the processing power. We will show how both the theoretical and actual performance of the cluster compares with other current and past clusters, as well as the cost per GFLOP. We will also examine the scaling of the performance when distributed to increasing numbers of nodes.
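A quick back-of-the-envelope check of the quoted figures: the per-core clock and FLOPs-per-cycle values below are assumptions chosen to reproduce the stated ~2500 GFLOPS peak, not the cluster's published specification.

```python
# Illustrative peak-vs-measured comparison; clock and FLOPs/cycle are assumed.
cores = 20 * 12                      # 20 nodes x 12 cores = 240 cores
clock_ghz = 2.6                      # assumed core clock
flops_per_cycle = 4                  # assumed (e.g. 2 adds + 2 multiplies per cycle)

peak_gflops = cores * clock_ghz * flops_per_cycle
measured_gflops = 900.0              # HPL result reported above

print(f"theoretical peak : {peak_gflops:.0f} GFLOPS")
print(f"HPL measured     : {measured_gflops:.0f} GFLOPS")
print(f"efficiency       : {measured_gflops / peak_gflops:.0%}")
```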
Grid-Enabled High Energy Physics Research using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Mahmood, Akhtar
2005-04-01
At Edinboro University of Pennsylvania, we have built an 8-node, 25 Gflops Beowulf Cluster with 2.5 TB of disk storage space to carry out grid-enabled, data-intensive high energy physics research for the ATLAS experiment via Grid3. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes. Once fully functional, the Cluster will be part of Grid3 [www.ivdgl.org/grid3]. The current ATLAS simulation grid application models the entire physical process, from the proton-proton collisions and the detector's response to the collision debris through the complete reconstruction of the event from analyses of these responses. The end result is a detailed set of data that simulates the real physical collision event inside a particle detector. Grid is the new IT infrastructure for 21st-century science -- a new computing paradigm that is poised to transform the practice of large-scale data-intensive research in science and engineering. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
An integrity measure to benchmark quantum error correcting memories
NASA Astrophysics Data System (ADS)
Xu, Xiaosi; de Beaudrap, Niel; O'Gorman, Joe; Benjamin, Simon C.
2018-02-01
Rapidly developing experiments across multiple platforms now aim to realise small quantum codes, and so demonstrate a memory within which a logical qubit can be protected from noise. There is a need to benchmark the achievements in these diverse systems, and to compare the inherent power of the codes they rely upon. We describe a recently introduced performance measure called integrity, which relates to the probability that an ideal agent will successfully ‘guess’ the state of a logical qubit after a period of storage in the memory. Integrity is straightforward to evaluate experimentally without state tomography and it can be related to various established metrics such as the logical fidelity and the pseudo-threshold. We offer a set of experimental milestones that are steps towards demonstrating unconditionally superior encoded memories. Using intensive numerical simulations we compare memories based on the five-qubit code, the seven-qubit Steane code, and a nine-qubit code which is the smallest instance of a surface code; we assess both the simple and fault-tolerant implementations of each. While the ‘best’ code upon which to base a memory does vary according to the nature and severity of the noise, nevertheless certain trends emerge.
NASA Astrophysics Data System (ADS)
Buaria, D.; Yeung, P. K.
2017-12-01
A new parallel algorithm utilizing a partitioned global address space (PGAS) programming model to achieve high scalability is reported for particle tracking in direct numerical simulations of turbulent fluid flow. The work is motivated by the desire to obtain the Lagrangian information necessary for the study of turbulent dispersion at the largest problem sizes feasible on current and next-generation multi-petaflop supercomputers. A large population of fluid particles is distributed among parallel processes dynamically, based on instantaneous particle positions, such that all of the interpolation information needed for each particle is available either locally on its host process or on neighboring processes holding adjacent sub-domains of the velocity field. With cubic splines as the preferred interpolation method, the new algorithm is designed to minimize the need for communication by transferring between adjacent processes only those spline coefficients determined to be necessary for specific particles. This transfer is implemented very efficiently as a one-sided communication, using Co-Array Fortran (CAF) features which facilitate small data movements between different local partitions of a large global array. The cost of monitoring the transfer of particle properties between adjacent processes for particles migrating across sub-domain boundaries is found to be small. Detailed benchmarks are obtained on the Cray petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign. For operations on the particles in an 8192^3 simulation (0.55 trillion grid points) on 262,144 Cray XE6 cores, the new algorithm is found to be orders of magnitude faster than a prior algorithm in which each particle is tracked by the same parallel process at all times. This large speedup reduces the additional cost of tracking of order 300 million particles to just over 50% of the cost of computing the Eulerian velocity field at this scale. Improving support for PGAS models in major compilers suggests that this algorithm will be of wide applicability on most upcoming supercomputers.
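The bookkeeping step at the core of the scheme - deciding which process owns each particle so that the spline data it needs is local or on a neighbouring process - can be sketched as follows; the uniform 2-D processor grid and periodic box are assumptions for illustration, and the actual code uses Co-Array Fortran rather than Python.

```python
# Sketch of particle-to-subdomain assignment under an assumed uniform 2-D
# domain decomposition of a periodic box.
import numpy as np

L = 2.0 * np.pi                      # periodic box size (assumed)
procs = (4, 4)                       # 2-D processor grid (assumed)

def owning_rank(positions):
    """Map particle positions of shape (N, 2) to the rank owning each sub-domain."""
    pos = np.mod(positions, L)                         # enforce periodicity
    ix = np.floor(pos[:, 0] / (L / procs[0])).astype(int)
    iy = np.floor(pos[:, 1] / (L / procs[1])).astype(int)
    return ix * procs[1] + iy                          # row-major rank numbering

rng = np.random.default_rng(1)
particles = rng.uniform(0.0, L, size=(10, 2))
print(owning_rank(particles))

# After each time step, particles whose owner changed are shipped to the new
# rank; only spline coefficients near sub-domain faces need to be exchanged.
```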
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is a key tool in the geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extent of the investigated domain might vary strongly in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and therefore permit very high-resolution simulations. We propose an efficient approach to solve memory-bounded real-world applications on modern supercomputer architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problems: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks, and the coupled poromechanical solver makes it possible to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near-peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
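The numerical pattern described above - an iterative finite-difference update with damped residuals on a regular Cartesian grid, whose cost is dominated by memory traffic - can be sketched for a simple 2-D Poisson problem as follows; the damping factor and pseudo-time step are illustrative choices, and the actual solvers target Stokes and poromechanics on GPUs with MPI halo exchanges.

```python
# Sketch of a pseudo-transient finite-difference iteration with damped residuals.
import numpy as np

nx = ny = 128
dx = dy = 1.0 / nx
p = np.zeros((nx, ny))
rhs = np.ones((nx, ny))                  # unit source term (illustrative)
damp = 0.6                               # residual damping factor (assumed)
dtau = min(dx, dy) ** 2 / 8.0            # pseudo-time step kept inside the stability limit

dpdtau = np.zeros_like(p)
for it in range(1, 20001):
    lap = (p[:-2, 1:-1] - 2 * p[1:-1, 1:-1] + p[2:, 1:-1]) / dx**2 \
        + (p[1:-1, :-2] - 2 * p[1:-1, 1:-1] + p[1:-1, 2:]) / dy**2
    residual = lap - rhs[1:-1, 1:-1]
    # damped accumulation of the residual, then an explicit pseudo-time update;
    # each sweep touches every cell a few times, so memory bandwidth dominates
    dpdtau[1:-1, 1:-1] = residual + damp * dpdtau[1:-1, 1:-1]
    p[1:-1, 1:-1] += dtau * dpdtau[1:-1, 1:-1]
    if it % 100 == 0 and np.max(np.abs(residual)) < 1e-4:
        break

print(f"stopped after {it} sweeps, max residual = {np.max(np.abs(residual)):.2e}")
```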
Comprehensive efficiency analysis of supercomputer resource usage based on system monitoring data
NASA Astrophysics Data System (ADS)
Mamaeva, A. A.; Shaykhislamov, D. I.; Voevodin, Vad V.; Zhumatiy, S. A.
2018-03-01
One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to significant idle time of computational resources and, in turn, to a decrease in the speed of scientific research. This paper presents three approaches to studying the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which makes it possible to identify different typical classes of programs, to explore the structure of the supercomputer job flow, and to track overall trends in supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since the efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs - jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow - are detected. For each approach, the results obtained in practice in the Supercomputer Center of Moscow State University are demonstrated.
2014-09-30
portability is difficult to achieve on future supercomputers that use various types of accelerators (GPUs, Xeon Phi, SIMD, etc.). All of these ... bottlenecks of NUMA. For example, in the CG code the state vector was originally stored as q(1:Nvar, 1:Npoin), where Nvar is the number of ... a Global Grid Point (GGP) storage. On the other hand, in the DG code the state vector is typically stored as q(1:Nvar, 1:Npts, 1:Nelem), where ...
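One reading of the fragment above, expressed as a sketch: the CG code stores the state vector per global grid point, q(1:Nvar, 1:Npoin), while the DG code stores it per element, q(1:Nvar, 1:Npts, 1:Nelem). The NumPy comparison below (with made-up sizes and a random connectivity map) illustrates the gather from global-grid-point storage into the element-local layout.

```python
# Illustrative layout comparison; sizes and connectivity are invented.
import numpy as np

nvar, npoin = 5, 1000            # CG: variables x global grid points
npts, nelem = 4, 300             # DG: variables x points-per-element x elements

q_cg = np.zeros((nvar, npoin))               # Global Grid Point (GGP) storage
q_dg = np.zeros((nvar, npts, nelem))         # element-local storage

# connectivity: which global point each element-local point refers to (random here)
rng = np.random.default_rng(0)
conn = rng.integers(0, npoin, size=(npts, nelem))

# gather from GGP storage into the element-local layout: the indirect,
# cache- and NUMA-sensitive access pattern the fragment alludes to
q_gathered = q_cg[:, conn]                   # shape (nvar, npts, nelem)
assert q_gathered.shape == q_dg.shape
```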
Optimization of Supercomputer Use on EADS II System
NASA Technical Reports Server (NTRS)
Ahmed, Ardsher
1998-01-01
The main objective of this research was to optimize supercomputer use to achieve better throughput and utilization of supercomputers and to help facilitate the movement of non-supercomputing (inappropriate for supercomputer) codes to mid-range systems for better use of Government resources at Marshall Space Flight Center (MSFC). This work involved the survey of architectures available on EADS II and monitoring customer (user) applications running on a CRAY T90 system.
Evaluating the Efficacy of Wavelet Configurations on Turbulent-Flow Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Shaomeng; Gruchalla, Kenny; Potter, Kristin
2015-10-25
I/O is increasingly becoming a significant constraint for simulation codes and visualization tools on modern supercomputers. Data compression is an attractive workaround, and, in particular, wavelets provide a promising solution. However, wavelets can be applied in multiple configurations, and the variations in configuration impact accuracy, storage cost, and execution time. While the variation in these factors over wavelet configurations has been explored in image processing, it is not well understood for visualization and analysis of scientific data. To illuminate this issue, we evaluate multiple wavelet configurations on turbulent-flow data. Our approach is to repeat established analysis routines on uncompressed and lossy-compressed versions of a data set, and then quantitatively compare their outcomes. Our findings show that accuracy varies greatly based on wavelet configuration, while storage cost and execution time vary less. Overall, our study provides new insights for simulation analysts and visualization experts, who need to make tradeoffs between accuracy, storage cost, and execution time.
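As a hedged illustration of what one wavelet "configuration" involves - a multi-level decomposition, lossy thresholding of small coefficients, and reconstruction - the sketch below uses the PyWavelets package on a synthetic 1-D signal; the wavelet family, level, and retained fraction are arbitrary choices, not the configurations evaluated in the study.

```python
# Sketch of lossy wavelet compression and an accuracy/storage report.
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(4096))         # stand-in for a flow-field trace

wavelet, level, keep_fraction = "db4", 5, 0.05   # one configuration (assumed)
coeffs = pywt.wavedec(x, wavelet, level=level)

flat = np.concatenate([np.abs(c) for c in coeffs])
threshold = np.quantile(flat, 1.0 - keep_fraction)    # keep the largest 5% of coefficients
compressed = [np.where(np.abs(c) >= threshold, c, 0.0) for c in coeffs]
x_rec = pywt.waverec(compressed, wavelet)[: x.size]

stored = sum(int(np.count_nonzero(c)) for c in compressed)
rel_err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
print(f"kept {stored} of {x.size} coefficients, relative L2 error = {rel_err:.3e}")
```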
Will Allis Prize Talk: Electron Collisions - Experiment, Theory and Applications
NASA Astrophysics Data System (ADS)
Bartschat, Klaus
2016-05-01
Electron collisions with atoms, ions, and molecules represent one of the very early topics of quantum mechanics. In spite of the field's maturity, a number of recent developments in detector technology (e.g., the ``reaction microscope'' or the ``magnetic-angle changer'') and the rapid increase in computational resources have resulted in significant progress in the measurement, understanding, and theoretical/computational description of few-body Coulomb problems. Close collaborations between experimentalists and theorists worldwide continue to produce high-quality benchmark data, which allow for thoroughly testing and further developing a variety of theoretical approaches. As a result, it has now become possible to reliably calculate the vast amount of atomic data needed for detailed modelling of the physics and chemistry of planetary atmospheres, the interpretation of astrophysical data, optimizing the energy transport in reactive plasmas, and many other topics - including light-driven processes, in which electrons are produced by continuous or short-pulse ultra-intense electromagnetic radiation. In this talk, I will highlight some of the recent developments that have had a major impact on the field. This will be followed by showcasing examples, in which accurate electron collision data enabled applications in fields beyond traditional AMO physics. Finally, open problems and challenges for the future will be outlined. I am very grateful for fruitful scientific collaborations with many colleagues, and the long-term financial support by the NSF through the Theoretical AMO and Computational Physics programs, as well as supercomputer resources through TeraGrid and XSEDE.
The role of graphics super-workstations in a supercomputing environment
NASA Technical Reports Server (NTRS)
Levin, E.
1989-01-01
A new class of very powerful workstations has recently become available which integrate near supercomputer computational performance with very powerful and high quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing of the output of supercomputer calculations, and by distributed or shared processing), scientific visualization (understanding of results, communication of results), and by real time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2010 CFR
2010-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2012 CFR
2012-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2013 CFR
2013-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2011 CFR
2011-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
Computer Electromagnetics and Supercomputer Architecture
NASA Technical Reports Server (NTRS)
Cwik, Tom
1993-01-01
The dramatic increase in performance over the last decade for microprocessor computations is compared with that for supercomputer computations. This performance, the projected performance, and a number of other issues such as cost and the inherent physical limitations in current supercomputer technology have naturally led to parallel supercomputers and ensembles of interconnected microprocessors.
Applications of Massive Mathematical Computations
1990-04-01
particles from the first principles of QCD. This problem is under intensive numerical study using special-purpose parallel supercomputers in ... several places around the world. The method used here is Monte Carlo integration on fixed 3-D plus time lattices. Reliable results are still years ... mathematical and theoretical physics, but its most promising applications are in the numerical realization of QCD computations. Our programs for the solution ...
Science& Technology Review June 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
McMahon, D
This month's issue has the following articles: (1) Livermore's Three-Pronged Strategy for High-Performance Computing, Commentary by Dona Crawford; (2) Riding the Waves of Supercomputing Technology--Livermore's Computation Directorate is exploiting multiple technologies to ensure high-performance, cost-effective computing; (3) Chromosome 19 and Lawrence Livermore Form a Long-Lasting Bond--Lawrence Livermore biomedical scientists have played an important role in the Human Genome Project through their long-term research on chromosome 19; (4) A New Way to Measure the Mass of Stars--For the first time, scientists have determined the mass of a star in isolation from other celestial bodies; and (5) Flexibly Fueled Storage Tank Brings Hydrogen-Powered Cars Closer to Reality--Livermore's cryogenic hydrogen fuel storage tank for passenger cars of the future can accommodate three forms of hydrogen fuel separately or in combination.
Integrated Vertical Bloch Line (VBL) memory
NASA Technical Reports Server (NTRS)
Katti, R. R.; Wu, J. C.; Stadler, H. L.
1991-01-01
Vertical Bloch Line (VBL) Memory is a recently conceived, integrated, solid state, block access, VLSI memory which offers the potential of 1 Gbit/sq cm areal storage density, data rates of hundreds of megabits/sec, and submillisecond average access time simultaneously, at relatively low mass, volume, and power values when compared to alternative technologies. VBLs are micromagnetic structures within magnetic domain walls which can be manipulated using magnetic fields from integrated conductors. The presence or absence of VBL pairs is used to store binary information. At present, efforts are being directed at developing a single-chip memory using 25 Mbit/sq cm technology in magnetic garnet material which integrates, at a single operating point, the writing, storage, reading, and amplification functions needed in a memory. The current design architecture, functional elements, and supercomputer simulation results are described, which are used to assist the design process.
Accessing Wind Tunnels From NASA's Information Power Grid
NASA Technical Reports Server (NTRS)
Becker, Jeff; Biegel, Bryan (Technical Monitor)
2002-01-01
The NASA Ames wind tunnel customers are among the first users of the Information Power Grid (IPG) storage system at the NASA Advanced Supercomputing Division. We wanted to be able to store their data on the IPG so that it could be accessed remotely in a secure but timely fashion. In addition, incorporation into the IPG allows future use of grid computational resources, e.g., for post-processing of data, or to do side-by-side CFD validation. In this paper, we describe the integration of grid data access mechanisms with the existing DARWIN web-based system that is used to access wind tunnel test data. We also show that the combined system has reasonable performance: wind tunnel data may be retrieved at 50 Mbit/s over a 100BaseT network connected to the IPG storage server.
Space Weather Action Plan Ionizing Radiation Benchmarks: Phase 1 update and plans for Phase 2
NASA Astrophysics Data System (ADS)
Talaat, E. R.; Kozyra, J.; Onsager, T. G.; Posner, A.; Allen, J. E., Jr.; Black, C.; Christian, E. R.; Copeland, K.; Fry, D. J.; Johnston, W. R.; Kanekal, S. G.; Mertens, C. J.; Minow, J. I.; Pierson, J.; Rutledge, R.; Semones, E.; Sibeck, D. G.; St Cyr, O. C.; Xapsos, M.
2017-12-01
Changes in the near-Earth radiation environment can affect satellite operations, astronauts in space, commercial space activities, and the radiation environment on aircraft at relevant latitudes or altitudes. Understanding the diverse effects of increased radiation is challenging, but producing ionizing radiation benchmarks will help address these effects. The following areas have been considered in addressing the near-Earth radiation environment: the Earth's trapped radiation belts, the galactic cosmic ray background, and solar energetic-particle events. The radiation benchmarks attempt to account for any change in the near-Earth radiation environment which, under extreme cases, could present a significant risk to critical infrastructure operations or human health. The goal is for these ionizing radiation benchmarks and their associated confidence levels to define at least the radiation intensity as a function of time, particle type, and energy for an occurrence frequency of 1 in 100 years, as well as an intensity level at the theoretical maximum for the event. In this paper, we present the benchmarks that address radiation levels at all applicable altitudes and latitudes in the near-Earth environment, the assumptions made and the associated uncertainties, and the next steps planned for updating the benchmarks.
Edison - A New Cray Supercomputer Advances Discovery at NERSC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy
2014-02-06
When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.
Thermo-hydro-mechanical-chemical processes in fractured-porous media: Benchmarks and examples
NASA Astrophysics Data System (ADS)
Kolditz, O.; Shao, H.; Görke, U.; Kalbacher, T.; Bauer, S.; McDermott, C. I.; Wang, W.
2012-12-01
The book comprises an assembly of benchmarks and examples for porous media mechanics collected over the last twenty years. Analysis of thermo-hydro-mechanical-chemical (THMC) processes is essential to many applications in environmental engineering, such as geological waste deposition, geothermal energy utilisation, carbon capture and storage, water resources management, hydrology, and even climate change. In order to assess the feasibility as well as the safety of geotechnical applications, process-based modelling is the only tool able to put numbers on, i.e. to quantify, future scenarios. This places a huge responsibility on the reliability of computational tools. Benchmarking is an appropriate methodology to verify the quality of modelling tools based on best practices. Moreover, benchmarking and code comparison foster community efforts. The benchmark book is part of the OpenGeoSys initiative - an open source project to share knowledge and experience in environmental analysis and scientific computation.
Quantum lattice model solver HΦ
NASA Astrophysics Data System (ADS)
Kawamura, Mitsuaki; Yoshimi, Kazuyoshi; Misawa, Takahiro; Yamaji, Youhei; Todo, Synge; Kawashima, Naoki
2017-08-01
HΦ [aitch-phi] is a program package based on the Lanczos-type eigenvalue solution applicable to a broad range of quantum lattice models, i.e., arbitrary quantum lattice models with two-body interactions, including the Heisenberg model, the Kitaev model, the Hubbard model and the Kondo-lattice model. While it works well on PCs and PC clusters, HΦ also runs efficiently on massively parallel computers, which considerably extends the tractable range of system sizes. In addition, unlike most existing packages, HΦ supports finite-temperature calculations through the method of thermal pure quantum (TPQ) states. In this paper, we explain the theoretical background and user interface of HΦ. We also show benchmark results of HΦ on supercomputers such as the K computer at RIKEN Advanced Institute for Computational Science (AICS) and SGI ICE XA (Sekirei) at the Institute for Solid State Physics (ISSP).
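To give a flavor of what a Lanczos-type ground-state calculation for a quantum lattice model looks like, the sketch below diagonalizes a small spin-1/2 Heisenberg chain with SciPy's Lanczos-based eigsh; HΦ itself handles far larger systems, finite-temperature (TPQ) calculations, and massively parallel execution, so this is an illustration of the method rather than of the package.

```python
# Illustrative sketch only: exact ground state of a 10-site Heisenberg chain
# via a Lanczos-type sparse eigensolver.
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

L = 10                                            # number of sites (open chain)
sx = sp.csr_matrix([[0, 0.5], [0.5, 0]])
sy = sp.csr_matrix([[0, -0.5j], [0.5j, 0]])
sz = sp.csr_matrix([[0.5, 0], [0, -0.5]])
ident = sp.identity(2, format="csr")

def site_op(op, i):
    """Embed a single-site operator at site i of the L-site chain."""
    mats = [op if j == i else ident for j in range(L)]
    out = mats[0]
    for m in mats[1:]:
        out = sp.kron(out, m, format="csr")
    return out

H = sp.csr_matrix((2**L, 2**L), dtype=complex)
for i in range(L - 1):                            # nearest-neighbour S_i . S_{i+1}
    for op in (sx, sy, sz):
        H = H + site_op(op, i) @ site_op(op, i + 1)

ground_energy = eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]
print(f"ground-state energy per site: {ground_energy.real / L:.6f}")
```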
Johnson, T K; Vessella, R L
1989-07-01
Dosimetry calculations for monoclonal antibodies (MABs) are made difficult because the focus of radioactivity is targeted to a nonstandard volume in a nonstandard geometry, precluding straightforward application of the MIRD formalism. The MABDOS software addresses this shortcoming by interactive placement of a spherical perturbation into the Standard Man geometry for each tumor focus. S tables are calculated by a Monte Carlo simulation of photon transport for each organ system (including tumor) that localizes activity. Performance benchmarks are reported that measure the time required to simulate 60,000 photons for each penetrating radiation in the spectra of 99mTc and 131I using the kidney as the source organ. Results indicate that calculation times are probably prohibitive on current microcomputer platforms. Minicomputers and supercomputers offer a realistic platform for MABDOS patient dosimetry estimates.
Porous polymeric materials for hydrogen storage
Yu, Luping [Hoffman Estates, IL; Liu, Di-Jia [Naperville, IL; Yuan, Shengwen [Chicago, IL; Yang, Junbing [Westmont, IL
2011-12-13
Porous polymers - tribenzohexazatriphenylene, poly-9,9'-spirobifluorene, poly-tetraphenyl methane and their derivatives - for storage of H2 are prepared through a chemical synthesis method. The porous polymers have high specific surface area and narrow pore size distribution. Hydrogen uptake measurements conducted for these polymers determined a higher hydrogen storage capacity at ambient temperature over that of the benchmark materials. The method of preparing such polymers includes oxidatively activating solids by CO2/steam oxidation and supercritical water treatment.
Porous polymeric materials for hydrogen storage
Yu, Luping; Liu, Di-Jia; Yuan, Shengwen; Yang, Junbing
2013-04-02
A porous polymer, poly-9,9'-spirobifluorene, and its derivatives for storage of H2 are prepared through a chemical synthesis method. The porous polymers have high specific surface area and narrow pore size distribution. Hydrogen uptake measurements conducted for these polymers determined a higher hydrogen storage capacity at ambient temperature over that of the benchmark materials. The method of preparing such polymers includes oxidatively activating solids by CO2/steam oxidation and supercritical water treatment.
Proactive replica checking to assure reliability of data in cloud storage with minimum replication
NASA Astrophysics Data System (ADS)
Murarka, Damini; Maheswari, G. Uma
2017-11-01
The two major issues for cloud storage systems are data reliability and storage cost. For data reliability protection, the multi-replica replication strategy used in most current clouds incurs huge storage consumption, leading to a large storage cost for data-intensive applications in the cloud. This paper presents a cost-efficient data reliability mechanism named PRCR to cut back cloud storage consumption. PRCR ensures the reliability of large cloud data sets with minimum replication and can also serve as a cost-effective benchmark for replication. Our evaluation shows that, compared to the standard three-replica approach, PRCR can reduce the cloud storage consumed, from one-third of the storage down to only a small fraction of it, hence considerably minimizing the cloud storage cost.
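A back-of-the-envelope model (not PRCR's actual analysis) of why proactive checking can permit fewer replicas: the probability that every copy of a file is lost within one checking interval falls sharply with the replica count, and shortening the interval lets a smaller count meet the same reliability target. The failure rate below is an assumed figure.

```python
# Toy reliability model; the per-replica annual failure probability is assumed.
annual_loss_rate = 0.02

def loss_probability(replicas: int, check_interval_years: float) -> float:
    """Probability that every replica fails within one proactive-check interval."""
    p_single = annual_loss_rate * check_interval_years   # small-probability approximation
    return p_single ** replicas

for replicas in (3, 2):
    for interval in (1.0, 1.0 / 12):      # yearly vs monthly checking
        p = loss_probability(replicas, interval)
        print(f"{replicas} replicas, check every {interval:5.3f} yr -> loss prob {p:.2e}")
```

Under these assumed numbers, two replicas with monthly checking already achieve a lower loss probability than three replicas checked yearly, which is the intuition behind trading replication for proactive checking.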
Radio Astronomy at the Centre for High Performance Computing in South Africa
NASA Astrophysics Data System (ADS)
Catherine Cress; UWC Simulation Team
2014-04-01
I will present results on galaxy evolution and cosmology which we obtained using the supercomputing facilities at the CHPC. These include cosmological-scale N-body simulations modelling neutral hydrogen as well as the study of the clustering of radio galaxies to probe the relationship between dark and luminous matter in the universe. I will also discuss the various roles that the CHPC is playing in Astronomy in SA, including the provision of HPC for a variety of Astronomical applications, the provision of storage for radio data, our educational programs and our participation in planning for the SKA.
NASA Technical Reports Server (NTRS)
2002-01-01
The study of Earth science is like a giant puzzle, says Braulio Sanchez. "The more you know about the individual pieces, the easier it is to fit them together." A researcher with Goddard's Space Geodesy Branch, Sanchez has been using NCCS supercomputer and mass storage resources to show how the angular momenta of the atmosphere, the oceans, and the solid Earth are dynamically coupled. Sanchez has calculated the magnitude of atmospheric torque on the planet and has determined some of the possible effects that torque has on Earth's rotation.
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications
NASA Astrophysics Data System (ADS)
Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi
2012-05-01
The amount of data processed annually over the Internet has crossed the zettabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute capability and advances in cloud computing have brought the state of the art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute and storage layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real-time. In this paper, we describe the operation of MOCHA in battlefield applications, with the aforementioned mobile device and cloudlet housed within a soldier's vest and inside a military vehicle, respectively, and with access to the cloud provided through high-latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
Staff confidence in dealing with aggressive patients: a benchmarking exercise.
McGowan, S; Wynaden, D; Harding, N; Yassine, A; Parker, J
1999-09-01
Interacting with potentially aggressive patients is a common occurrence for nurses working in psychiatric intensive care units. Although the literature highlights the need to educate staff in the prevention and management of aggression, often little, or no, training is provided by employers. This article describes a benchmarking exercise conducted in psychiatric intensive care units at two Western Australian hospitals to assess staff confidence in coping with patient aggression. Results demonstrated that staff at the hospital where regular training was undertaken were significantly more confident in dealing with aggression. Following the completion of a safe physical restraint module at the other hospital, staff reported a significant increase in their level of confidence that either matched or bettered the results of their benchmark colleagues.
Miyakawa, Tomoki; Satoh, Masaki; Miura, Hiroaki; Tomita, Hirofumi; Yashiro, Hisashi; Noda, Akira T.; Yamada, Yohei; Kodama, Chihiro; Kimoto, Masahide; Yoneyama, Kunio
2014-01-01
Global cloud/cloud system-resolving models are perceived to perform well in the prediction of the Madden–Julian Oscillation (MJO), a huge eastward-propagating atmospheric pulse that dominates intraseasonal variation of the tropics and affects the entire globe. However, owing to model complexity, detailed analysis is limited by computational power. Here we carry out a simulation series using a recently developed supercomputer, which enables the statistical evaluation of the MJO prediction skill of a costly new-generation model in a manner similar to operational forecast models. We estimate the current MJO predictability of the model as 27 days by conducting simulations including all winter MJO cases identified during 2003–2012. The simulated precipitation patterns associated with different MJO phases compare well with observations. An MJO case captured in a recent intensive observation is also well reproduced. Our results reveal that the global cloud-resolving approach is effective in understanding the MJO and in providing month-long tropical forecasts. PMID:24801254
Benchmarking criticality analysis of TRIGA fuel storage racks.
Robinson, Matthew Loren; DeBey, Timothy M; Higginbotham, Jack F
2017-01-01
A criticality analysis was benchmarked to sub-criticality measurements of the hexagonal fuel storage racks at the United States Geological Survey TRIGA MARK I reactor in Denver. These racks, which hold up to 19 fuel elements each, are arranged at 0.61 m (2 feet) spacings around the outer edge of the reactor. A 3-dimensional model of the racks was created using MCNP5, and the model was verified experimentally by comparison to measured subcritical multiplication data collected in an approach-to-critical loading of two of the racks. The validated model was then used to show that in the extreme condition where the entire circumference of the pool is lined with racks loaded with used fuel, the storage array is subcritical with a k value of about 0.71, well below the regulatory limit of 0.8. A model was also constructed of the rectangular 2×10 fuel storage array used in many other TRIGA reactors to validate the technique against the original TRIGA licensing sub-critical analysis performed in 1966. The fuel used in this study was standard 20% enriched (LEU) aluminum- or stainless-steel-clad TRIGA fuel. Copyright © 2016. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Jain, Anubhav
2017-04-01
Density functional theory (DFT) simulations solve for the electronic structure of materials starting from the Schrödinger equation. Many case studies have now demonstrated that researchers can often use DFT to design new compounds in the computer (e.g., for batteries, catalysts, and hydrogen storage) before synthesis and characterization in the lab. In this talk, I will focus on how DFT calculations can be executed on large supercomputing resources in order to generate very large data sets on new materials for functional applications. First, I will briefly describe the Materials Project, an effort at LBNL that has virtually characterized over 60,000 materials using DFT and has shared the results with over 17,000 registered users. Next, I will talk about how such data can help discover new materials, describing how preliminary computational screening led to the identification and confirmation of a new family of bulk AMX2 thermoelectric compounds with measured zT reaching 0.8. I will outline future plans for how such data-driven methods can be used to better understand the factors that control thermoelectric behavior, e.g., for the rational design of electronic band structures, in ways that are different from conventional approaches.
Sharing lattice QCD data over a widely distributed file system
NASA Astrophysics Data System (ADS)
Amagasa, T.; Aoki, S.; Aoki, Y.; Aoyama, T.; Doi, T.; Fukumura, K.; Ishii, N.; Ishikawa, K.-I.; Jitsumoto, H.; Kamano, H.; Konno, Y.; Matsufuru, H.; Mikami, Y.; Miura, K.; Sato, M.; Takeda, S.; Tatebe, O.; Togawa, H.; Ukawa, A.; Ukita, N.; Watanabe, Y.; Yamazaki, T.; Yoshie, T.
2015-12-01
JLDG is a data grid for the lattice QCD (LQCD) community in Japan. Several large research groups in Japan have been working on lattice QCD simulations using supercomputers distributed over distant sites. The JLDG provides such collaborations with an efficient method of data management and sharing. File servers installed at 9 sites are connected to the NII SINET VPN and are bound into a single file system with Gfarm. The file system looks the same from any site, so that users can carry out analyses on a supercomputer at one site using data generated and stored in the JLDG at a different site. We present a brief description of the hardware and software of the JLDG, including a recently developed subsystem for cooperating with the HPCI shared storage, and report performance and statistics of the JLDG. As of April 2015, 15 research groups (61 users) store their daily research data, amounting to 4.7 PB including replicas and 68 million files in total. The number of publications for works which used the JLDG is 98. The large number of publications and the recent rapid increase in disk usage convince us that the JLDG has grown into a useful infrastructure for the LQCD community in Japan.
Automatic discovery of the communication network topology for building a supercomputer model
NASA Astrophysics Data System (ADS)
Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim
2016-10-01
The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.
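The discovery output described above can be pictured as a graph of nodes, switches, and links; the sketch below builds such a graph with networkx from invented per-switch neighbour tables (standing in for data gathered, e.g., via LLDP/SNMP, which is not shown), roughly the form a model like Octotron consumes.

```python
# Illustrative topology graph; the neighbour data is invented for the example.
import networkx as nx

neighbour_tables = {
    "switch-1": ["node-001", "node-002", "switch-2"],
    "switch-2": ["node-003", "node-004", "switch-1"],
}

topology = nx.Graph()
for switch, neighbours in neighbour_tables.items():
    topology.add_node(switch, kind="switch")
    for peer in neighbours:
        kind = "switch" if peer.startswith("switch") else "compute_node"
        topology.add_node(peer, kind=kind)
        topology.add_edge(switch, peer, medium="ethernet")

print(f"{topology.number_of_nodes()} components, {topology.number_of_edges()} links")
print("inter-switch links:",
      [(u, v) for u, v in topology.edges
       if topology.nodes[u]["kind"] == topology.nodes[v]["kind"] == "switch"])
```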
Damsel: A Data Model Storage Library for Exascale Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koziol, Quincey
The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. We will accomplish this through three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community.
TOP500 Supercomputers for June 2004
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2004-06-23
23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position. MANNHEIM, Germany; KNOXVILLE, Tenn.; and BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.
Automotive applications of supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginsberg, M.
1987-01-01
These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
Implementing Journaling in a Linux Shared Disk File System
NASA Technical Reports Server (NTRS)
Preslan, Kenneth W.; Barry, Andrew; Brassow, Jonathan; Cattelan, Russell; Manthei, Adam; Nygaard, Erling; VanOort, Seth; Teigland, David; Tilstra, Mike; O'Keefe, Matthew;
2000-01-01
In computer systems today, speed and responsiveness are often determined by network and storage subsystem performance. Faster, more scalable networking interfaces like Fibre Channel and Gigabit Ethernet provide the scaffolding from which higher performance computer systems implementations may be constructed, but new thinking is required about how machines interact with network-enabled storage devices. In this paper we describe how we implemented journaling in the Global File System (GFS), a shared-disk, cluster file system for Linux. Our previous three papers on GFS at the Mass Storage Symposium discussed our first three GFS implementations, their performance, and the lessons learned. Our fourth paper describes, appropriately enough, the evolution of GFS version 3 to version 4, which supports journaling and recovery from client failures. In addition, GFS scalability tests extending to 8 machines accessing 8 4-disk enclosures were conducted: these tests showed good scaling. We describe the GFS cluster infrastructure, which is necessary for proper recovery from machine and disk failures in a collection of machines sharing disks using GFS. Finally, we discuss the suitability of Linux for handling the big data requirements of supercomputing centers.
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the Worldwide LHC Computing Grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and to provide a high degree of accessibility to them (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in developing a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing platforms into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and to integrate supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
Improved Access to Supercomputers Boosts Chemical Applications.
ERIC Educational Resources Information Center
Borman, Stu
1989-01-01
Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling is reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)
Scientific Visualization in High Speed Network Environments
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kutler, Paul (Technical Monitor)
1997-01-01
In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment has become more critical and continues to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks in providing an important link to support efficient supercomputing is identified. The two technologies are driven by the increasing complexities and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a Computational Fluid Dynamics simulation/visualization environment is described. Current capabilities supported by high speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation Facility (NAS) at NASA Ames Research Center are described. Applied research in providing a supercomputer visualization environment to support future computational requirements is summarized.
IonGAP: integrative bacterial genome analysis for Ion Torrent sequence data.
Baez-Ortega, Adrian; Lorenzo-Diaz, Fabian; Hernandez, Mariano; Gonzalez-Vila, Carlos Ignacio; Roda-Garcia, Jose Luis; Colebrook, Marcos; Flores, Carlos
2015-09-01
We introduce IonGAP, a publicly available Web platform designed for the analysis of whole bacterial genomes using Ion Torrent sequence data. Besides assembly, it integrates a variety of comparative genomics, annotation and bacterial classification routines, based on the widely used FASTQ, BAM and SRA file formats. Benchmarking with different datasets evidenced that IonGAP is a fast, powerful and simple-to-use bioinformatics tool. By releasing this platform, we aim to translate low-cost bacterial genome analysis for microbiological prevention and control in healthcare, agroalimentary and pharmaceutical industry applications. IonGAP is hosted by ITER's Teide-HPC supercomputer and is freely available on the Web for non-commercial use at http://iongap.hpc.iter.es. Contact: mcolesan@ull.edu.es or cflores@ull.edu.es. Supplementary data are available at Bioinformatics online.
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
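The data-blocking technique mentioned above can be illustrated with a short NumPy sketch (not the paper's tool): an explicit tiled transpose that touches memory in cache-sized blocks. The tile size of 64 is an arbitrary illustrative choice.

```python
import numpy as np

def transpose_blocked(a, b=64):
    """Explicit out-of-place transpose processed in b-by-b tiles.

    Working on small tiles keeps both the source and destination lines in
    cache, which is the essence of data blocking; a plain row-by-row copy
    would touch the destination with a large, cache-unfriendly stride.
    """
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    for i in range(0, n, b):
        for j in range(0, m, b):
            out[j:j + b, i:i + b] = a[i:i + b, j:j + b].T
    return out

a = np.arange(2048 * 2048, dtype=np.float64).reshape(2048, 2048)
assert np.array_equal(transpose_blocked(a), a.T)
```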
Review of FD-TD numerical modeling of electromagnetic wave scattering and radar cross section
NASA Technical Reports Server (NTRS)
Taflove, Allen; Umashankar, Korada R.
1989-01-01
Applications of the finite-difference time-domain (FD-TD) method for numerical modeling of electromagnetic wave interactions with structures are reviewed, concentrating on scattering and radar cross section (RCS). A number of two- and three-dimensional examples of FD-TD modeling of scattering and penetration are provided. The objects modeled range in nature from simple geometric shapes to extremely complex aerospace and biological systems. Rigorous analytical or experimental validations are provided for the canonical shapes, and it is shown that FD-TD predictive data for near fields and RCS are in excellent agreement with the benchmark data. It is concluded that with continuing advances in FD-TD modeling theory for target features relevant to the RCS problems and in vector and concurrent supercomputer technology, it is likely that FD-TD numerical modeling will occupy an important place in RCS technology in the 1990s and beyond.
Towards Efficient Supercomputing: Searching for the Right Efficiency Metric
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hsu, Chung-Hsing; Kuehn, Jeffery A; Poole, Stephen W
2012-01-01
The efficiency of supercomputing has traditionally been measured in terms of execution time. In the early 2000s, the concept of total cost of ownership was re-introduced, with efficiency measures extended to include aspects such as energy and space. Yet the supercomputing community has never agreed upon a metric that can cover these aspects altogether and also provide a fair basis for comparison. This paper examines the metrics that have been proposed in the past decade, and proposes a vector-valued metric for efficient supercomputing. Using this metric, the paper presents a study of where the supercomputing industry has been and how it stands today with respect to efficient supercomputing.
Sub-Selective Quantization for Learning Binary Codes in Large-Scale Image Search.
Li, Yeqing; Liu, Wei; Huang, Junzhou
2018-06-01
Recently with the explosive growth of visual content on the Internet, large-scale image search has attracted intensive attention. It has been shown that mapping high-dimensional image descriptors to compact binary codes can lead to considerable efficiency gains in both storage and performing similarity computation of images. However, most existing methods still suffer from expensive training devoted to large-scale binary code learning. To address this issue, we propose a sub-selection based matrix manipulation algorithm, which can significantly reduce the computational cost of code learning. As case studies, we apply the sub-selection algorithm to several popular quantization techniques including cases using linear and nonlinear mappings. Crucially, we can justify the resulting sub-selective quantization by proving its theoretic properties. Extensive experiments are carried out on three image benchmarks with up to one million samples, corroborating the efficacy of the sub-selective quantization method in terms of image retrieval.
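As a rough illustration of learning binary codes from a sub-selected training set (a generic PCA-plus-sign hashing baseline, not the authors' sub-selective quantization algorithm), the following sketch fits the projection on a small random subset and then encodes the full collection. All sizes and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_pca_hash(train, n_bits):
    """Learn a linear projection (top principal directions) for sign-based codes."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_bits].T          # projection matrix: (dim, n_bits)

def encode(x, mean, proj):
    """Binary codes, one row of n_bits per descriptor."""
    return ((x - mean) @ proj > 0).astype(np.uint8)

# Full collection of descriptors (synthetic stand-in for image features).
X = rng.normal(size=(100_000, 128))

# Sub-select a small training subset instead of learning on all 100k points.
subset = X[rng.choice(len(X), size=2_000, replace=False)]
mean, proj = fit_pca_hash(subset, n_bits=64)

codes = encode(X, mean, proj)
query = encode(X[:1], mean, proj)
hamming = np.count_nonzero(codes != query, axis=1)     # distances to all items
print("nearest neighbours by Hamming distance:", np.argsort(hamming)[:5])
```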
NASA's supercomputing experience
NASA Technical Reports Server (NTRS)
Bailey, F. Ron
1990-01-01
A brief overview of NASA's recent experience in supercomputing is presented from two perspectives: early systems development and advanced supercomputing applications. NASA's role in supercomputing systems development is illustrated by discussion of activities carried out by the Numerical Aerodynamic Simulation Program. Current capabilities in advanced technology applications are illustrated with examples in turbulence physics, aerodynamics, aerothermodynamics, chemistry, and structural mechanics. Capabilities in science applications are illustrated by examples in astrophysics and atmospheric modeling. Future directions and NASA's new High Performance Computing Program are briefly discussed.
OpenMP Performance on the Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Haoqiang, Jin; Hood, Robert
2005-01-01
This presentation discusses the Columbia supercomputer, one of the world's fastest, providing 61 TFLOPs (as of 10/20/04). It was conceived, designed, built, and deployed in just 120 days as a 20-node supercomputer built on proven 512-processor nodes. It is the largest SGI system in the world, with over 10,000 Intel Itanium 2 processors; it provides the largest node size incorporating commodity parts (512 processors) and the largest shared-memory environment (2,048 processors), and with 88% efficiency it tops the scalar systems on the Top500 list.
The Isprs Benchmark on Indoor Modelling
NASA Astrophysics Data System (ADS)
Khoshelham, K.; Díaz Vilariño, L.; Peter, M.; Kang, Z.; Acharya, D.
2017-09-01
Automated generation of 3D indoor models from point cloud data has been a topic of intensive research in recent years. While results on various datasets have been reported in literature, a comparison of the performance of different methods has not been possible due to the lack of benchmark datasets and a common evaluation framework. The ISPRS benchmark on indoor modelling aims to address this issue by providing a public benchmark dataset and an evaluation framework for performance comparison of indoor modelling methods. In this paper, we present the benchmark dataset comprising several point clouds of indoor environments captured by different sensors. We also discuss the evaluation and comparison of indoor modelling methods based on manually created reference models and appropriate quality evaluation criteria. The benchmark dataset is available for download at: http://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling.html.
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ye; Ma, Xiaosong; Liu, Qing Gary
2015-01-01
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Parameter regimes for a single sequential quantum repeater
NASA Astrophysics Data System (ADS)
Rozpędek, F.; Goodenough, K.; Ribeiro, J.; Kalb, N.; Caprara Vivoli, V.; Reiserer, A.; Hanson, R.; Wehner, S.; Elkouss, D.
2018-07-01
Quantum key distribution allows for the generation of a secret key between distant parties connected by a quantum channel such as optical fibre or free space. Unfortunately, the rate of generation of a secret key by direct transmission is fundamentally limited by the distance. This limit can be overcome by the implementation of so-called quantum repeaters. Here, we assess the performance of a specific but very natural setup called a single sequential repeater for quantum key distribution. We offer a fine-grained assessment of the repeater by introducing a series of benchmarks. The benchmarks, which should be surpassed to claim a working repeater, are based on finite-energy considerations, thermal noise and the losses in the setup. In order to boost the performance of the studied repeaters we introduce two methods. The first one corresponds to the concept of a cut-off, which reduces the effect of decoherence during the storage of a quantum state by introducing a maximum storage time. Secondly, we supplement the standard classical post-processing with an advantage distillation procedure. Using these methods, we find realistic parameters for which it is possible to achieve rates greater than each of the benchmarks, guiding the way towards implementing quantum repeaters.
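The effect of the cut-off described above can be illustrated with a toy Monte Carlo of a single sequential repeater (a simplification of the paper's analysis; the success probabilities, trial counts, and attempt-counting convention below are illustrative assumptions): a longer cut-off means fewer restarts per delivered pair, but a longer average storage time, which is the quantity decoherence acts on.

```python
import random

def sequential_repeater(p_a, p_b, cutoff, trials=20_000):
    """Toy model of a single sequential repeater with a storage cut-off.

    The middle node first establishes a link with A (success prob p_a per
    attempt), then attempts a link with B; if B does not succeed within
    `cutoff` attempts, the stored A-link is discarded and everything restarts.
    Returns (average attempts per delivered pair, average storage time).
    """
    total_attempts, total_storage, delivered = 0, 0, 0
    while delivered < trials:
        while True:                      # establish the A link
            total_attempts += 1
            if random.random() < p_a:
                break
        for t in range(1, cutoff + 1):   # try the B link before the cut-off
            total_attempts += 1
            if random.random() < p_b:
                total_storage += t
                delivered += 1
                break
        # if B never succeeded, the stored pair is discarded and we restart
    return total_attempts / delivered, total_storage / delivered

for cutoff in (5, 20, 100):
    rate, storage = sequential_repeater(0.1, 0.1, cutoff)
    print(f"cutoff={cutoff:4d}: {rate:6.1f} attempts/pair, mean storage {storage:5.1f}")
```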
Final Report for Project FG02-05ER25685
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiaosong Ma
2009-05-07
In this report, the PI summarizes the results and achievements obtained in the sponsored project. Overall, the project has been very successful, producing both research results in massive data-intensive computing and data management for today's large-scale supercomputers and open-source software products. During the project period, 14 conference/journal publications and two PhD students were produced with exclusive or shared support from this award. In addition, the PI has recently been granted tenure at NC State University.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that could enable parallel I/O for CTDS through new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS storage manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
Supercomputer networking for space science applications
NASA Technical Reports Server (NTRS)
Edelson, B. I.
1992-01-01
The initial design of a supercomputer network topology including the design of the communications nodes along with the communications interface hardware and software is covered. Several space science applications that are proposed experiments by GSFC and JPL for a supercomputer network using the NASA ACTS satellite are also reported.
Most Social Scientists Shun Free Use of Supercomputers.
ERIC Educational Resources Information Center
Kiernan, Vincent
1998-01-01
Social scientists, who frequently complain that the federal government spends too little on them, are passing up what scholars in the physical and natural sciences see as the government's best give-aways: free access to supercomputers. Some social scientists say the supercomputers are difficult to use; others find desktop computers provide…
A fault tolerant spacecraft supercomputer to enable a new class of scientific discovery
NASA Technical Reports Server (NTRS)
Katz, D. S.; McVittie, T. I.; Silliman, A. G., Jr.
2000-01-01
The goal of the Remote Exploration and Experimentation (REE) Project is to move supercomputing into space in a cost-effective manner and to allow the use of inexpensive, state-of-the-art, commercial-off-the-shelf components and subsystems in these space-based supercomputers.
Benchmarking organic mixed conductors for transistors.
Inal, Sahika; Malliaras, George G; Rivnay, Jonathan
2017-11-24
Organic mixed conductors have garnered significant attention in applications from bioelectronics to energy storage/generation. Their implementation in organic transistors has led to enhanced biosensing, neuromorphic function, and specialized circuits. While a narrow class of conducting polymers continues to excel in these new applications, materials design efforts have accelerated as researchers target new functionality, processability, and improved performance/stability. Materials for organic electrochemical transistors (OECTs) require both efficient electronic transport and facile ion injection in order to sustain high capacity. In this work, we show that the product of the electronic mobility and volumetric charge storage capacity (µC*) is the materials/system figure of merit; we use this framework to benchmark and compare the steady-state OECT performance of ten previously reported materials. This product can be independently verified and decoupled to guide materials design and processing. OECTs can therefore be used as a tool for understanding and designing new organic mixed conductors.
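The figure of merit quoted above is simply the product µC*. A minimal helper for comparing candidate materials might look as follows; the material names and parameter values are hypothetical placeholders, not data from the paper.

```python
def transistor_figure_of_merit(mobility_cm2_Vs, cstar_F_cm3):
    """Product mu * C* used as the OECT figure of merit in the abstract above.

    Units: mobility in cm^2 V^-1 s^-1, volumetric capacitance C* in F cm^-3,
    so the product comes out in F cm^-1 V^-1 s^-1.
    """
    return mobility_cm2_Vs * cstar_F_cm3

# Hypothetical entries for illustration only -- not values from the paper.
materials = {"polymer A": (1.0, 40.0), "polymer B": (0.3, 220.0)}
for name, (mu, cstar) in sorted(materials.items(),
                                key=lambda kv: -transistor_figure_of_merit(*kv[1])):
    print(f"{name}: uC* = {transistor_figure_of_merit(mu, cstar):.1f} F cm^-1 V^-1 s^-1")
```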
Modification and benchmarking of SKYSHINE-III for use with ISFSI cask arrays
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hertel, N.E.; Napolitano, D.G.
1997-12-01
Dry cask storage arrays are becoming more and more common at nuclear power plants in the United States. Title 10 of the Code of Federal Regulations, Part 72, limits doses at the controlled area boundary of these independent spent-fuel storage installations (ISFSI) to 0.25 mSv (25 mrem)/yr. The minimum controlled area boundaries of such a facility are determined by cask array dose calculations, which include direct radiation and radiation scattered by the atmosphere, also known as skyshine. NAC International (NAC) uses SKYSHINE-III to calculate the gamma-ray and neutron dose rates as a function of distance from ISFSI arrays. In this paper, we present modifications to SKYSHINE-III that more explicitly model cask arrays. In addition, we have benchmarked the radiation transport methods used in SKYSHINE-III against 60Co gamma-ray experiments and MCNP neutron calculations.
Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, Noah; Carothers, Christopher; Mubarak, Misbah
As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly network topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity flit-level Slim Fly model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model against the Kathareios et al. Slim Fly model results provided at moderately sized network scales. We further scale the model size up to an unprecedented 1 million compute nodes; and through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we gain insight into the network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster, achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows that a million-node Slim Fly model simulation can execute in 198 seconds on the Intel cluster.
SAMSA2: a standalone metatranscriptome analysis pipeline.
Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G
2018-05-21
Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Distributed user services for supercomputers
NASA Technical Reports Server (NTRS)
Sowizral, Henry A.
1989-01-01
User-service operations at supercomputer facilities are examined. The question is whether a single, possibly distributed, user-services organization could be shared by NASA's supercomputer sites in support of a diverse, geographically dispersed, user community. A possible structure for such an organization is identified as well as some of the technologies needed in operating such an organization.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, A.
1986-03-10
Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program to run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.
Barminova, H Y; Saratovskyh, M S
2016-02-01
An experiment automation system is to be developed for an experimental facility for materials science at ITEP, based on a Bernas ion source. The CAMFT program is expected to be incorporated into the experiment automation software. CAMFT is developed to simulate the motion of intense charged-particle bunches in external magnetic fields with arbitrary geometry by means of accurate solution of the particle equation of motion. The program allows consideration of bunch intensities up to 10^10 ppb. Preliminary calculations are performed on the ITEP supercomputer. The results of the simulation of the beam pre-acceleration and the following turn in the magnetic field are presented for different initial conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barminova, H. Y., E-mail: barminova@bk.ru; Saratovskyh, M. S.
2016-02-15
An experiment automation system is to be developed for an experimental facility for materials science at ITEP, based on a Bernas ion source. The CAMFT program is expected to be incorporated into the experiment automation software. CAMFT is developed to simulate the motion of intense charged-particle bunches in external magnetic fields with arbitrary geometry by means of accurate solution of the particle equation of motion. The program allows consideration of bunch intensities up to 10^10 ppb. Preliminary calculations are performed on the ITEP supercomputer. The results of the simulation of the beam pre-acceleration and the following turn in the magnetic field are presented for different initial conditions.
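The records above do not state which integrator CAMFT uses; as a generic illustration of accurately integrating charged-particle motion in an external magnetic field, the following sketch implements the standard Boris scheme for a single particle in a uniform field. The particle parameters and field are arbitrary illustrative values.

```python
import numpy as np

def boris_push(x, v, q_over_m, B_func, E_func, dt, steps):
    """Standard Boris scheme for the Lorentz-force equation of motion.

    A generic, well-known integrator shown for illustration; it is not the
    algorithm used inside CAMFT. B_func/E_func return field vectors at x.
    """
    traj = [x.copy()]
    for _ in range(steps):
        E, B = E_func(x), B_func(x)
        v_minus = v + 0.5 * dt * q_over_m * E            # half electric kick
        t = 0.5 * dt * q_over_m * B                      # magnetic rotation vector
        s = 2.0 * t / (1.0 + np.dot(t, t))
        v_prime = v_minus + np.cross(v_minus, t)
        v_plus = v_minus + np.cross(v_prime, s)          # rotated velocity
        v = v_plus + 0.5 * dt * q_over_m * E             # second half kick
        x = x + dt * v
        traj.append(x.copy())
    return np.array(traj)

# Proton-like particle gyrating in a uniform 1 T field along z.
traj = boris_push(x=np.zeros(3), v=np.array([1.0e5, 0.0, 0.0]),
                  q_over_m=9.58e7, B_func=lambda x: np.array([0.0, 0.0, 1.0]),
                  E_func=lambda x: np.zeros(3), dt=1e-9, steps=1000)
print("gyro-orbit radius ~", np.ptp(traj[:, :2], axis=0) / 2)
```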
NASA Technical Reports Server (NTRS)
Botts, Michael E.; Phillips, Ron J.; Parker, John V.; Wright, Patrick D.
1992-01-01
Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. A SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented.
Will Moore's law be sufficient?
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeBenedictis, Erik P.
2004-07-01
It seems well understood that supercomputer simulation is an enabler for scientific discoveries, weapons, and other activities of value to society. It also seems widely believed that Moore's Law will make progressively more powerful supercomputers over time and thus enable more of these contributions. This paper seeks to add detail to these arguments, revealing them to be generally correct but not a smooth and effortless progression. This paper will review some key problems that can be solved with supercomputer simulation, showing that more powerful supercomputers will be useful up to a very high yet finite limit of around 10^21 FLOPS (1 zettaflops). The review will also show the basic nature of these extreme problems. This paper will review work by others showing that the theoretical maximum supercomputer power is very high indeed, but will explain how a straightforward extrapolation of Moore's Law will lead to technological maturity in a few decades. The power of a supercomputer at the maturity of Moore's Law will be very high by today's standards, at 10^17-10^19 FLOPS (100 petaflops to 10 exaflops), depending on architecture, but distinctly below the level required for the most ambitious applications. Having established that Moore's Law will not be the last word in supercomputing, this paper will explore the nearer-term issue of what a supercomputer will look like at the maturity of Moore's Law. Our approach will quantify the maximum performance permitted by the laws of physics for extensions of current technology and then find a design that approaches this limit closely. We study a 'multi-architecture' for supercomputers that combines a microprocessor with other 'advanced' concepts and find it can reach the limits as well. This approach should be quite viable in the future because the microprocessor would provide compatibility with existing codes and programming styles while the 'advanced' features would provide a boost to the limits of performance.
Quantum storage of a photonic polarization qubit in a solid.
Gündoğan, Mustafa; Ledingham, Patrick M; Almasi, Attaallah; Cristiani, Matteo; de Riedmatten, Hugues
2012-05-11
We report on the quantum storage and retrieval of photonic polarization quantum bits onto and out of a solid state storage device. The qubits are implemented with weak coherent states at the single photon level, and are stored for a predetermined time of 500 ns in a praseodymium-doped crystal with a storage and retrieval efficiency of 10%, using the atomic frequency comb scheme. We characterize the storage by using quantum state tomography, and find that the average conditional fidelity of the retrieved qubits exceeds 95% for a mean photon number μ=0.4. This is significantly higher than a classical benchmark, taking into account the Poissonian statistics and finite memory efficiency, which proves that our crystal functions as a quantum storage device for polarization qubits. These results extend the storage capabilities of solid-state quantum light-matter interfaces to polarization encoding, which is widely used in quantum information science.
Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-01-01
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Non-preconditioned conjugate gradient on cell and FPGA-based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-03-10
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
ERIC Educational Resources Information Center
General Accounting Office, Washington, DC. Information Management and Technology Div.
This report was prepared in response to a request for information on supercomputers and high-speed networks from the Senate Committee on Commerce, Science, and Transportation, and the House Committee on Science, Space, and Technology. The following information was requested: (1) examples of how various industries are using supercomputers to…
Supercomputer Provides Molecular Insight into Cellulose (Fact Sheet)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2011-02-01
Groundbreaking research at the National Renewable Energy Laboratory (NREL) has used supercomputing simulations to calculate the work that enzymes must do to deconstruct cellulose, which is a fundamental step in biomass conversion technologies for biofuels production. NREL used the new high-performance supercomputer Red Mesa to conduct several million central processing unit (CPU) hours of simulation.
GREEN SUPERCOMPUTING IN A DESKTOP BOX
DOE Office of Scientific and Technical Information (OSTI.GOV)
HSU, CHUNG-HSING; FENG, WU-CHUN; CHING, AVERY
2007-01-17
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicated computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, they present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than their reference SMP platform.
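A quick arithmetic check of the efficiency implied by the quoted figures (14 Gflops Linpack at 185 W of load power):

```python
# Back-of-the-envelope performance-per-watt from the numbers quoted above.
linpack_flops = 14e9
power_watts = 185.0
print(f"{linpack_flops / power_watts / 1e6:.1f} MFLOPS/W")   # roughly 75.7 MFLOPS/W
```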
Benchmarking of MCNP for calculating dose rates at an interim storage facility for nuclear waste.
Heuel-Fabianek, Burkhard; Hille, Ralf
2005-01-01
During the operation of research facilities at Research Centre Jülich, Germany, nuclear waste is stored in drums and other vessels in an interim storage building on-site, which has concrete shielding on the side walls. Owing to the lack of a well-defined source, measured gamma spectra were unfolded to determine the photon flux on the surface of the containers. The dose rate simulation, including the effects of skyshine, using the Monte Carlo transport code MCNP is compared with the measured dosimetric data at some locations in the vicinity of the interim storage building. The MCNP data for direct radiation confirm the data calculated using a point-kernel method. However, a comparison of the modelled dose rates for direct radiation and skyshine with the measured data demonstrates the need for a more precise definition of the source. Both the measured and the modelled dose rates confirm that the legal limit (<1 mSv/a) is met in the area outside the perimeter fence of the storage building to which members of the public have access. Using container surface data (gamma spectra) to define the source may be a useful tool for practical calculations and additionally for benchmarking of computer codes if the discussed critical aspects with respect to the source can be addressed adequately.
Estimating the maximum potential revenue for grid-connected electricity storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byrne, Raymond Harry; Silva Monroy, Cesar Augusto.
2012-12-01
The valuation of an electricity storage device is based on the expected future cash flow generated by the device. Two potential sources of income for an electricity storage system are energy arbitrage and participation in the frequency regulation market. Energy arbitrage refers to purchasing (storing) energy when electricity prices are low, and selling (discharging) energy when electricity prices are high. Frequency regulation is an ancillary service geared towards maintaining system frequency, and is typically procured by the independent system operator in some type of market. This paper outlines the calculations required to estimate the maximum potential revenue from participating in these two activities. First, a mathematical model is presented for the state of charge as a function of the storage device parameters and the quantities of electricity purchased/sold as well as the quantities offered into the regulation market. Using this mathematical model, we present a linear programming optimization approach to calculating the maximum potential revenue from an electricity storage device. The calculation of the maximum potential revenue is critical in developing an upper bound on the value of storage, as a benchmark for evaluating potential trading strategies, and a tool for capital finance risk assessment. Then, we use historical California Independent System Operator (CAISO) data from 2010-2011 to evaluate the maximum potential revenue from the Tehachapi wind energy storage project, an American Recovery and Reinvestment Act of 2009 (ARRA) energy storage demonstration project. We investigate the maximum potential revenue from two different scenarios: arbitrage only and arbitrage combined with the regulation market. Our analysis shows that participation in the regulation market produces four times the revenue compared to arbitrage in the CAISO market using 2010 and 2011 data. Then we evaluate several trading strategies to illustrate how they compare to the maximum potential revenue benchmark. We conclude with a sensitivity analysis with respect to key parameters.
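A minimal sketch of the arbitrage-only linear program described above (ignoring the regulation market), assuming cvxpy is installed; the prices, capacity, power limit, and efficiency below are illustrative placeholders, not the CAISO or Tehachapi values.

```python
import numpy as np
import cvxpy as cp

# Hourly energy prices ($/MWh) for one day -- illustrative numbers only.
price = np.array([20, 18, 17, 16, 17, 19, 25, 35, 40, 38, 36, 34,
                  33, 32, 34, 38, 45, 55, 60, 50, 42, 35, 28, 22], dtype=float)
T = len(price)

cap, p_max, eta = 4.0, 1.0, 0.85        # MWh capacity, MW power limit, round-trip efficiency

charge = cp.Variable(T, nonneg=True)    # MW bought from the market
discharge = cp.Variable(T, nonneg=True) # MW sold to the market

# State of charge as a running sum of (lossy) charging minus discharging;
# splitting the round-trip efficiency evenly between the two legs.
soc = cp.cumsum(np.sqrt(eta) * charge - discharge / np.sqrt(eta))

objective = cp.Maximize(price @ (discharge - charge))
constraints = [charge <= p_max, discharge <= p_max, soc >= 0, soc <= cap]

prob = cp.Problem(objective, constraints)
prob.solve()
print(f"maximum arbitrage revenue: ${prob.value:.2f}")
```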
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; ...
2015-07-15
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
Input/output behavior of supercomputing applications
NASA Technical Reports Server (NTRS)
Miller, Ethan L.
1991-01-01
The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.
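A tiny simulation in the spirit of the read-ahead experiments described above (not the paper's simulator; the trace generator and window size are invented for illustration):

```python
import random

def readahead_hit_rate(trace, window=8):
    """Fraction of block reads served from a simple sequential read-ahead buffer.

    `trace` is a sequence of block numbers in request order; whenever a read
    misses, the buffer is refilled with the next `window` consecutive blocks.
    """
    buf = set()
    hits = 0
    for blk in trace:
        if blk in buf:
            hits += 1
        else:
            buf = set(range(blk + 1, blk + 1 + window))
    return hits / len(trace)

# A mostly sequential trace with occasional random jumps (bursty I/O).
trace, pos = [], 0
for _ in range(10_000):
    if random.random() < 0.05:
        pos = random.randrange(1_000_000)   # seek to a new region
    trace.append(pos)
    pos += 1
print(f"hit rate: {readahead_hit_rate(trace):.2%}")
```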
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.
Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating the insertion throughput and query latency of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
High performance Python for direct numerical simulations of turbulent flows
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Langtangen, Hans Petter
2016-06-01
Direct Numerical Simulation (DNS) of the Navier-Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython that is found to match the speed of C++. The solvers are written from scratch in Python, including the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and the 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations, and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than that of similar routines provided through the FFTW library. For a pencil mesh decomposition, 7 lines of code are required to execute a transform.
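The compact slab-decomposed transform described above can be sketched with NumPy and mpi4py as follows. This is a minimal complex-to-complex version written for clarity, not the paper's optimized code, and it assumes the cube size N is divisible by the number of MPI ranks.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
P = comm.Get_size()
N = 64                       # global cube size; assumed divisible by P
Np = N // P

# Each rank owns an x-slab of the global N^3 array: shape (Np, N, N).
u = np.random.rand(Np, N, N) + 0j

# 1) FFT over the two locally complete axes (y, z).
u_hat = np.fft.fftn(u, axes=(1, 2))

# 2) Global transpose: split the y-axis into P blocks and exchange them so
#    that afterwards each rank holds the full x-extent for its own y-block.
send = np.ascontiguousarray(np.moveaxis(u_hat.reshape(Np, P, Np, N), 1, 0))
recv = np.empty_like(send)
comm.Alltoall(send, recv)

# 3) FFT along x, which is now a complete local axis; the result is laid out
#    as (kx, local ky-block, kz) on each rank.
u_hat = np.fft.fft(recv.reshape(N, Np, N), axis=0)
```

Run with, for example, `mpirun -np 4 python slab_fft.py` (the script name is arbitrary).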
Scaling a Convection-Resolving RCM to Near-Global Scales
NASA Astrophysics Data System (ADS)
Leutwyler, D.; Fuhrer, O.; Chadha, T.; Kwasniewski, G.; Hoefler, T.; Lapillonne, X.; Lüthi, D.; Osuna, C.; Schar, C.; Schulthess, T. C.; Vogt, H.
2017-12-01
In recent years, the first decade-long kilometer-scale resolution RCM simulations have been performed on continental-scale computational domains. However, the planet Earth is still an order of magnitude larger, and thus the computational implications of performing global climate simulations at this resolution are challenging. We explore the gap between currently established RCM simulations and global simulations by scaling the GPU-accelerated version of the COSMO model to a near-global computational domain. To this end, the evolution of an idealized moist baroclinic wave has been simulated over the course of 10 days with a grid spacing of up to 930 m. The computational mesh employs 36'000 x 16'001 x 60 grid points and covers 98.4% of the planet's surface. The code shows perfect weak scaling up to 4'888 nodes of the Piz Daint supercomputer and yields 0.043 simulated years per day (SYPD), which is approximately one seventh of the 0.2-0.3 SYPD required to conduct AMIP-type simulations. However, at half the resolution (1.9 km) we observed 0.23 SYPD. Besides the formation of frontal precipitating systems containing embedded explicitly resolved convective motions, the simulations reveal a secondary instability that leads to cut-off warm-core cyclonic vortices in the cyclone's core once the grid spacing is refined to the kilometer scale. The explicit representation of embedded moist convection and the representation of previously unresolved instabilities exhibit physically different behavior in comparison to coarser-resolution simulations. The study demonstrates that global climate simulations using kilometer-scale resolution are imminent and serves as a baseline benchmark for global climate model applications and future exascale supercomputing systems.
Integration of Titan supercomputer at OLCF with ATLAS Production System
NASA Astrophysics Data System (ADS)
Barreiro Megino, F.; De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wells, J.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data-taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this paper we describe a project aimed at integration of the ATLAS Production System with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA Pilot framework for job submission to Titan's batch queues and local data management, with lightweight MPI wrappers to run single-node workloads in parallel on Titan's multi-core worker nodes. It enables standard ATLAS production jobs to run on unused (backfill) resources on Titan. The system has already allowed ATLAS to collect millions of core-hours per month on Titan and execute hundreds of thousands of jobs, while simultaneously improving Titan's utilization efficiency. We discuss the details of the implementation, current experience with running the system, and future plans aimed at improvements in scalability and efficiency.
Arithmetic Data Cube as a Data Intensive Benchmark
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabano, Leonid
2003-01-01
Data movement across computational grids and across the memory hierarchy of individual grid machines is known to be a limiting factor for applications involving large data sets. In this paper we introduce the Data Cube Operator on an Arithmetic Data Set, which we call the Arithmetic Data Cube (ADC). We propose to use the ADC to benchmark grid capabilities for handling large distributed data sets. The ADC stresses all levels of grid memory by producing the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of parameters. We control the data intensity of the ADC by controlling the sizes of the views through the choice of the tuple parameters.
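For readers unfamiliar with the data cube operator, a minimal sketch of the idea follows: aggregating a measure over every subset of the d attributes yields 2^d group-by "views". The synthetic tuples and the simple count aggregate are assumptions; the real ADC generates its d-tuples arithmetically from a few parameters.

from itertools import combinations
from collections import defaultdict

def data_cube(tuples, d):
    views = {}
    for r in range(d + 1):
        for dims in combinations(range(d), r):      # one view per attribute subset
            agg = defaultdict(int)
            for t in tuples:
                key = tuple(t[i] for i in dims)     # group-by key for this view
                agg[key] += 1                       # measure: tuple count
            views[dims] = dict(agg)
    return views                                    # len(views) == 2**d

if __name__ == "__main__":
    data = [(1, 'a', 10), (1, 'b', 10), (2, 'a', 20)]
    cube = data_cube(data, d=3)
    print(len(cube))   # 8 views for d = 3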
Energy storage arbitrage under day-ahead and real-time price uncertainty
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnamurthy, Dheepak; Uckun, Canan; Zhou, Zhi
Electricity markets must match real-time supply and demand of electricity. With increasing penetration of renewable resources, it is important that this balancing is done effectively, considering the high uncertainty of wind and solar energy. Storing electrical energy can make the grid more reliable and efficient and energy storage is proposed as a complement to highly variable renewable energy sources. However, for investments in energy storage to increase, participating in the market must become economically viable for owners. This paper proposes a stochastic formulation of a storage owner's arbitrage profit maximization problem under uncertainty in day-ahead (DA) and real-time (RT) market prices. The proposed model helps storage owners in market bidding and operational decisions and in estimation of the economic viability of energy storage. Finally, case study results on realistic market price data show that the novel stochastic bidding approach does significantly better than the deterministic benchmark.
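A schematic two-stage formulation consistent with the abstract, with day-ahead quantities decided before real-time prices are known and recourse decisions per price scenario, might be written as follows; the notation is ours, not the paper's, and efficiency losses and power limits are omitted for brevity:

\max_{q^{DA}_t,\; q^{RT}_{t,s}} \;\; \sum_t \lambda^{DA}_t\, q^{DA}_t \;+\; \sum_s \pi_s \sum_t \lambda^{RT}_{t,s}\, q^{RT}_{t,s}
\quad\text{s.t.}\quad e_{t,s} = e_{t-1,s} - q^{DA}_t - q^{RT}_{t,s}, \qquad 0 \le e_{t,s} \le E_{\max},

where q denotes net energy sold (discharged), \lambda^{DA} and \lambda^{RT} the day-ahead and real-time prices, \pi_s the scenario probabilities, and e_{t,s} the state of charge.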
Prospects for Boiling of Subcooled Dielectric Liquids for Supercomputer Cooling
NASA Astrophysics Data System (ADS)
Zeigarnik, Yu. A.; Vasil'ev, N. V.; Druzhinin, E. A.; Kalmykov, I. V.; Kosoi, A. S.; Khodakov, K. A.
2018-02-01
It is shown experimentally that forced-convection boiling of the Novec 649 dielectric refrigerant, subcooled relative to the saturation temperature, makes it possible to remove heat flow rates of up to 100 W/cm² from the interface of a modern supercomputer chip. This creates prerequisites for the application of dielectric liquids in the cooling systems of modern supercomputers with increased requirements for operating reliability.
NASA Astrophysics Data System (ADS)
Shokrollahpour, Elsa; Hosseinzadeh Lotfi, Farhad; Zandieh, Mostafa
2016-06-01
Efficiency and quality of services are crucial to today's banking industry. Competition in this sector has become increasingly intense as a result of rapid improvements in technology, so performance analysis of the banking sector attracts more attention these days. Although data envelopment analysis (DEA) is a pioneering approach in the literature as an efficiency measurement and benchmarking tool, it is unable to identify possible future benchmarks: the benchmarks it provides may still be less efficient than more advanced future ones. To address this weakness, an artificial neural network is integrated with DEA in this paper to calculate the relative efficiency and more reliable benchmarks of the branches of an Iranian commercial bank. Each branch can then adopt a strategy to improve efficiency and eliminate the causes of inefficiency based on a 5-year forecast.
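For context, the classical CCR ratio model that DEA-based benchmarking builds on is reproduced below in multiplier form; the paper's specific DEA variant and its neural-network integration are not shown here:

\max_{u,v}\; \theta_o = \frac{\sum_r u_r\, y_{ro}}{\sum_i v_i\, x_{io}}
\quad\text{s.t.}\quad \frac{\sum_r u_r\, y_{rj}}{\sum_i v_i\, x_{ij}} \le 1 \;\;\forall j, \qquad u_r, v_i \ge 0,

where x_{ij} and y_{rj} are the inputs and outputs of branch j and branch o is the unit under evaluation.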
A survey of CPU-GPU heterogeneous computing techniques
Mittal, Sparsh; Vetter, Jeffrey S.
2015-07-04
As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths, and hence CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, which enable utilizing both the CPU and the GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application level. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide insights into the working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
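As a toy example of the workload-partitioning class of HCTs mentioned above, the sketch below splits a batch of items in proportion to measured device throughputs so both devices finish at roughly the same time; the throughput numbers are illustrative assumptions, not taken from the survey.

def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items between CPU and GPU in proportion to their throughputs."""
    gpu_fraction = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = int(round(n_items * gpu_fraction))
    return n_items - n_gpu, n_gpu

# Example: a kernel measured at 8x higher throughput on the GPU than on the CPU.
n_cpu, n_gpu = split_work(1_000_000, cpu_rate=1.0, gpu_rate=8.0)
print(f"CPU: {n_cpu} items, GPU: {n_gpu} items")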
PC as Physics Computer for LHC?
NASA Astrophysics Data System (ADS)
Jarp, Sverre; Simmins, Antony; Tang, Hong; Yaari, R.
In the last five years, we have seen RISC workstations take over the computing scene that was once controlled by mainframes and supercomputers. In this paper we argue that the same phenomenon might happen again. A project in the Physics Data Processing group of CERN's CN division, active since March of this year, is described, in which ordinary desktop PCs running Windows (NT and 3.11) have been used to create an environment for running large LHC batch jobs (initially the DICE simulation job of Atlas). The problems encountered in porting both the CERN library and the specific Atlas codes are described, together with some encouraging benchmark results from comparisons with existing RISC workstations in use by the Atlas collaboration. The issues of establishing the batch environment (batch monitor, staging software, etc.) are also covered. Finally, a quick extrapolation of the commodity computing power available in the future is touched upon, to indicate what kind of cost envelope could be sufficient for the simulation farms required by the LHC experiments.
Spectral Element Method for the Simulation of Unsteady Compressible Flows
NASA Technical Reports Server (NTRS)
Diosady, Laslo Tibor; Murman, Scott M.
2013-01-01
This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1-3]. The inviscid flux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6 GHz and 2 GB of memory per core. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6 s.
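As a concrete illustration of the time integrator named in this abstract, a minimal method-of-lines step with the classical 4th-order Runge-Kutta scheme is sketched below; the residual R merely stands in for the DGSEM spatial operator, and the toy decay problem is an assumption for the usage example.

import numpy as np

def rk4_step(u, t, dt, R):
    """One classical RK4 step for du/dt = R(u, t); R stands in for the spatial residual."""
    k1 = R(u, t)
    k2 = R(u + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = R(u + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = R(u + dt * k3, t + dt)
    return u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

if __name__ == "__main__":
    # Toy residual: linear decay du/dt = -u, exact solution exp(-t).
    R = lambda u, t: -u
    u, t, dt = np.array([1.0]), 0.0, 0.1
    for _ in range(10):
        u, t = rk4_step(u, t, dt, R), t + dt
    print(u, np.exp(-t))   # the two values should agree to roughly 1e-7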
National Test Facility civilian agency use of supercomputers not feasible
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1994-12-01
Based on interviews with civilian agencies cited in the House report (DOE, DoEd, HHS, FEMA, NOAA), none would be able to make effective use of NTF's excess supercomputing capabilities. These agencies stated they could not use the resources primarily because (1) NTF's supercomputers are older machines whose performance and costs cannot match those of more advanced computers available from other sources and (2) some agencies have not yet developed applications requiring supercomputer capabilities or do not have funding to support such activities. In addition, future support for the hardware and software at NTF is uncertain, making any investment by an outside user risky.
Kriging for Spatial-Temporal Data on the Bridges Supercomputer
NASA Astrophysics Data System (ADS)
Hodgess, E. M.
2017-12-01
Currently, kriging of spatial-temporal data is slow and limited to relatively small vector sizes. We have developed a method on the Bridges supercomputer, at the Pittsburgh Supercomputing Center, which uses a combination of R, Fortran, the Message Passing Interface (MPI), OpenACC, and special R packages for big data. This combination of tools now permits us to complete tasks which previously could not be completed or took literally hours to complete. We ran simulation studies from a laptop against the supercomputer. We also look at "real world" data sets, such as the Irish wind data and some weather data, and compare the timings. We note that the timings are surprisingly good.
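For orientation, a minimal simple-kriging predictor in plain numpy is sketched below; the exponential covariance model and the synthetic data are assumptions, and the paper's spatio-temporal model, R packages, and MPI/OpenACC parallelization are not reproduced.

import numpy as np

def exp_cov(h, sill=1.0, rng=2.0):
    """Exponential covariance as a function of separation distance h (assumed model)."""
    return sill * np.exp(-h / rng)

def simple_krige(X, y, x0, mean=0.0):
    """Simple kriging prediction at x0 from observations y at locations X (known mean)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    C = exp_cov(d)                                               # data-to-data covariances
    c0 = exp_cov(np.linalg.norm(X - x0, axis=-1))                # data-to-target covariances
    w = np.linalg.solve(C, c0)                                   # kriging weights
    return mean + w @ (y - mean)

if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X = gen.uniform(0, 10, size=(50, 2))
    y = np.sin(X[:, 0]) + 0.1 * gen.standard_normal(50)
    print(simple_krige(X, y, np.array([5.0, 5.0])))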
Multiple DNA and protein sequence alignment on a workstation and a supercomputer.
Tajima, K
1988-11-01
This paper describes a multiple alignment method using a workstation and a supercomputer. The method is based on the alignment of a set of aligned sequences with a new sequence and uses a recursive procedure of such alignments. In terms of both alignment results and computational speed through parallel processing, the alignment executes in reasonable computation time on platforms ranging from a workstation to a supercomputer. The application of the algorithm is illustrated by several examples of multiple alignment of 12 amino acid and DNA sequences of HIV (human immunodeficiency virus) env genes. Colour graphic programs on a workstation and parallel processing on a supercomputer are discussed.
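For context, the core pairwise global-alignment primitive that profile-based methods build on is sketched below; this is a generic textbook Needleman-Wunsch with assumed scores, not the paper's recursive group-to-sequence procedure.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover the aligned strings.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GCATGCT"))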
Development of a Cloud Resolving Model for Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.
2017-12-01
A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection - such as convective storm systems. This research describes the porting effort to enable SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, ACME-MMF component of the U.S. Department of Energy(DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of global climate model. Super-parameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.
NASA Center for Climate Simulation (NCCS) Presentation
NASA Technical Reports Server (NTRS)
Webster, William P.
2012-01-01
The NASA Center for Climate Simulation (NCCS) offers integrated supercomputing, visualization, and data interaction technologies to enhance NASA's weather and climate prediction capabilities. It serves hundreds of users at NASA Goddard Space Flight Center, as well as other NASA centers, laboratories, and universities across the US. Over the past year, NCCS has continued expanding its data-centric computing environment to meet the increasingly data-intensive challenges of climate science. We doubled our Discover supercomputer's peak performance to more than 800 teraflops by adding 7,680 Intel Xeon Sandy Bridge processor cores and, most recently, 240 Intel Xeon Phi Many Integrated Core (MIC) co-processors. A supercomputing-class analysis system named Dali gives users rapid access to their data on Discover and high-performance software including the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT), with interfaces from user desktops and a 17- by 6-foot visualization wall. NCCS also is exploring highly efficient climate data services and management with a new MapReduce/Hadoop cluster while augmenting its data distribution to the science community. Using NCCS resources, NASA completed its modeling contributions to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report this summer as part of the ongoing Coupled Model Intercomparison Project Phase 5 (CMIP5). Ensembles of simulations run on Discover reached back to the year 1000 to test model accuracy and projected climate change through the year 2300 based on four different scenarios of greenhouse gases, aerosols, and land use. The data resulting from several thousand IPCC/CMIP5 simulations, as well as a variety of other simulation, reanalysis, and observation datasets, are available to scientists and decision makers through an enhanced NCCS Earth System Grid Federation Gateway. Worldwide downloads have totaled over 110 terabytes of data.
Singular boundary method for wave propagation analysis in periodic structures
NASA Astrophysics Data System (ADS)
Fu, Zhuojia; Chen, Wen; Wen, Pihua; Zhang, Chuanzeng
2018-07-01
A strong-form boundary collocation method, the singular boundary method (SBM), is developed in this paper for wave propagation analysis at low and moderate wavenumbers in periodic structures. The SBM has several advantages: it is mathematically simple, easy to program, and meshless, and it applies the concept of origin intensity factors to eliminate the singularity of the fundamental solutions and to avoid the numerical evaluation of the singular integrals required in the boundary element method. Due to the periodic behavior of the structures, the SBM coefficient matrix can be represented as a block Toeplitz matrix. By employing three different fast Toeplitz-matrix solvers, the computational time and storage requirements are significantly reduced in the proposed SBM analysis. To demonstrate the effectiveness of the proposed SBM formulation for wave propagation analysis in periodic structures, several benchmark examples are presented and discussed. The proposed SBM results are compared with the analytical solutions, the reference results, and the COMSOL software.
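The saving exploited by fast Toeplitz solvers comes from O(n log n) Toeplitz matrix-vector products via circulant embedding and the FFT; a generic sketch with synthetic data (not the SBM coefficient matrix itself) is given below.

import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r by x,
    in O(n log n) time using circulant embedding and the FFT."""
    n = len(c)
    # Embed T in a 2n x 2n circulant whose first column is [c, 0, reversed tail of r].
    col = np.concatenate([c, [0.0], r[:0:-1]])
    xp = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(xp)).real
    return y[:n]

if __name__ == "__main__":
    from scipy.linalg import toeplitz
    gen = np.random.default_rng(1)
    c, r = gen.standard_normal(6), gen.standard_normal(6)
    r[0] = c[0]                       # diagonal element shared by column and row
    x = gen.standard_normal(6)
    print(np.allclose(toeplitz(c, r) @ x, toeplitz_matvec(c, r, x)))   # True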
NASA Technical Reports Server (NTRS)
Kutler, Paul; Yee, Helen
1987-01-01
Topics addressed include: numerical aerodynamic simulation; computational mechanics; supercomputers; aerospace propulsion systems; computational modeling in ballistics; turbulence modeling; computational chemistry; computational fluid dynamics; and computational astrophysics.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through the genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
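A toy sketch of the reordering idea follows: a simple evolutionary loop (mutation-only, for brevity, rather than a full GA with crossover) searches over task-to-node permutations to minimize a traffic-weighted hop-distance cost. The traffic matrix and hop-distance table are hypothetical, not the S3D/LAMMPS communication topologies or the Titan interconnect.

import random

def comm_cost(order, traffic, hops):
    """Total (traffic x hop-distance) when task i is placed on node order[i]."""
    return sum(traffic[i][j] * hops[order[i]][order[j]]
               for i in range(len(order)) for j in range(len(order)) if traffic[i][j])

def ga_reorder(traffic, hops, pop=60, gens=200, seed=0):
    random.seed(seed)
    n = len(traffic)
    population = [random.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda p: comm_cost(p, traffic, hops))
        survivors = population[:pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            child = random.choice(survivors)[:]
            i, j = random.sample(range(n), 2)      # mutation: swap two placements
            child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda p: comm_cost(p, traffic, hops))

if __name__ == "__main__":
    # 6 tasks on a 6-node line; traffic and hop distances are made up for illustration.
    traffic = [[0, 5, 0, 0, 0, 1], [5, 0, 4, 0, 0, 0], [0, 4, 0, 3, 0, 0],
               [0, 0, 3, 0, 2, 0], [0, 0, 0, 2, 0, 1], [1, 0, 0, 0, 1, 0]]
    hops = [[abs(i - j) for j in range(6)] for i in range(6)]
    best = ga_reorder(traffic, hops)
    print(best, comm_cost(best, traffic, hops))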
Accelerating cardiac bidomain simulations using graphics processing units.
Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G
2012-08-01
Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly growing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.
Omega Hawaii Antenna System: Modification and Validation Tests. Volume 2. Data Sheets.
1979-10-19
[Extraction residue from Data Sheet 5 (DS-5), Radio Field Intensity Measurements, Omega Station Hawaii: the site was not considered for a benchmark because of potential hotel construction; tabulated field-intensity readings omitted.]
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL.
Stone, John E; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-05-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications.
ERIC Educational Resources Information Center
Bitran, Stella; Morissette, Sandra B.; Spiegel, David A.; Barlow, David H.
2008-01-01
This report presents results of a treatment for panic disorder with moderate to severe agoraphobia (PDA-MS) called sensation-focused intensive treatment (SFIT). SFIT is an 8-day intensive treatment that combines features of cognitive-behavioral treatment for panic disorder, such as interoceptive exposure and cognitive restructuring with ungraded…
Desktop supercomputer: what can it do?
NASA Astrophysics Data System (ADS)
Bogdanov, A.; Degtyarev, A.; Korkhov, V.
2017-12-01
The paper addresses the issues of solving complex problems that require supercomputers or the multiprocessor clusters available to most researchers nowadays. Efficient distribution of high-performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely adopted. At the same time, comfortable and transparent access to these resources has been a key user requirement. In this paper we discuss approaches to building a virtual private supercomputer available at the user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities for creating the virtual supercomputer based on lightweight virtualization technologies, and we analyze the efficiency of our approach compared to traditional methods of HPC resource management.
Information technologies for astrophysics circa 2001
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1990-01-01
It is easy to extrapolate current trends to see where technologies relating to information systems in astrophysics and other disciplines will be by the end of the decade. These technologies include miniaturization, multiprocessing, software technology, networking, databases, graphics, pattern computation, and interdisciplinary studies. It is also easy to see what limits our current paradigms place on our thinking about technologies that will allow us to understand the laws governing very large systems about which we have large datasets. Three limiting paradigms are: saving all the bits collected by instruments or generated by supercomputers; obtaining technology for information compression, storage, and retrieval off the shelf; and the linear mode of innovation. We must extend these paradigms to meet our goals for information technology at the end of the decade.
Final Report for File System Support for Burst Buffers on HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, W.; Mohror, K.
Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. As they are being deployed on more supercomputers, a file system that efficiently manages these burst buffers for fast I/O operations carries great consequence. Over the past year, the FSU team has undertaken several efforts to design, prototype, and evaluate distributed file systems for burst buffers on HPC systems. These include MetaKV, a key-value store for metadata management of distributed burst buffers; a user-level file system with multiple backends; and a specialized file system for large datasets of deep neural networks. Our progress on these respective efforts is elaborated further in this report.
NASA Astrophysics Data System (ADS)
Kaskhedikar, Apoorva Prakash
According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United States' energy consumption, of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy benchmarking offers an initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to the medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Significant correlations between EUIs and CBECS variables were then identified. Other than floor area, some of the important variables were number of workers, location, number of PCs, and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers, namely ENERGY STAR's Portfolio Manager. This tool relies on standard linear regression methods, which can only handle continuous variables. The proposed model uses data mining techniques and was found to perform slightly better than the Portfolio Manager. The broader impact of the new benchmarking methodology is that it allows for identifying important categorical variables and then incorporating them in a local, as opposed to a global, model framework for EUI pertinent to the building type. The ability to identify and rank the important variables is of great importance in the practical implementation of benchmarking tools, which rely on query-based building and HVAC variable filters specified by the user.
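A sketch of the Random Forest variable-importance step described above, using scikit-learn on synthetic stand-ins for CBECS fields; the column names, data, and EUI relationship are invented for illustration, not drawn from the thesis.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

gen = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "floor_area":   gen.uniform(1e3, 5e5, n),
    "num_workers":  gen.integers(5, 2000, n),
    "num_pcs":      gen.integers(5, 3000, n),
    "climate_zone": gen.integers(1, 6, n),     # categorical, integer-coded
})
# Synthetic EUI with some dependence on the predictors plus noise.
eui = 40 + 0.02 * df["num_workers"] + 0.01 * df["num_pcs"] \
      + 3 * df["climate_zone"] + gen.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(df, eui)
for name, imp in sorted(zip(df.columns, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:12s} {imp:.3f}")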
Grand challenges in mass storage: A systems integrators perspective
NASA Technical Reports Server (NTRS)
Lee, Richard R.; Mintz, Daniel G.
1993-01-01
Within today's much-ballyhooed supercomputing environment, with its GFLOPS of CPU power and Gigabit networks, there exists a major roadblock to computing success: that of mass storage. The solution to this mass storage problem is considered to be one of the 'Grand Challenges' facing the computer industry today, as well as long into the future. It has become obvious to us, as well as to many others in the industry, that there is no clear single solution in sight. The systems integrator today is faced with a myriad of quandaries in approaching this challenge. He must first be innovative in approach, and second choose hardware solutions that are volumetrically efficient, high in signal bandwidth, available from multiple sources, competitively priced, and extensible for future growth. In addition, he must comply with a variety of mandated, and often conflicting, software standards (GOSIP, POSIX, IEEE, MSRM 4.0, and others), and finally he must deliver a systems solution with the 'most bang for the buck' in terms of cost versus performance. These quandaries challenge the systems integrator to 'push the envelope' in terms of his or her ingenuity and innovation on an almost daily basis. This dynamic is explored further, and an attempt is made to acquaint the audience with rational approaches to this 'Grand Challenge'.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Patchett, John M; Lo, Li - Ta
2011-01-24
This report provides documentation for the completion of the Los Alamos portion of the ASC Level II 'Visualization on the Supercomputing Platform' milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratory and Los Alamos National Laboratory. The milestone text is shown in Figure 1 with the Los Alamos portions highlighted in boldfaced text. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering; for petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk, we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next-generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate, and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. In conclusion, we improved CPU-based rendering performance by a factor of 2-10 times in our tests. In addition, we evaluated CPU- and GPU-based rendering performance. We encourage production visualization experts to consider using CPU-based rendering solutions when appropriate. For example, on remote supercomputers CPU-based rendering can offer a means of viewing data without having to offload the data or geometry onto a GPU-based visualization system. In terms of the comparative performance of the CPU and GPU, we believe that further optimizations of both CPU- and GPU-based rendering are possible. The simulation community is currently confronting this reality as they work to port their simulations to different hardware architectures. What is interesting about CPU rendering of massive datasets is that, for the past two decades, GPU performance has significantly outperformed CPU-based systems. Based on our advancements, evaluations, and explorations, we believe that CPU-based rendering has returned as one viable option for the visualization of massive datasets.
Color graphics, interactive processing, and the supercomputer
NASA Technical Reports Server (NTRS)
Smith-Taylor, Rudeen
1987-01-01
The development of a common graphics environment for the NASA Langley Research Center user community and the integration of a supercomputer into this environment is examined. The initial computer hardware, the software graphics packages, and their configurations are described. The addition of improved computer graphics capability to the supercomputer, and the utilization of the graphic software and hardware are discussed. Consideration is given to the interactive processing system which supports the computer in an interactive debugging, processing, and graphics environment.
Automated Help System For A Supercomputer
NASA Technical Reports Server (NTRS)
Callas, George P.; Schulbach, Catherine H.; Younkin, Michael
1994-01-01
Expert-system software developed to provide automated system of user-helping displays in supercomputer system at Ames Research Center Advanced Computer Facility. Users located at remote computer terminals connected to supercomputer and each other via gateway computers, local-area networks, telephone lines, and satellite links. Automated help system answers routine user inquiries about how to use services of computer system. Available 24 hours per day and reduces burden on human experts, freeing them to concentrate on helping users with complicated problems.
Remini, Hocine; Mertz, Christian; Belbahi, Amine; Achir, Nawel; Dornier, Manuel; Madani, Khodir
2015-04-15
The stability of ascorbic acid and colour intensity in pasteurised blood orange juice (Citrus sinensis [L.] Osbeck) during one month of storage at 4-37 °C was investigated. The effects of ascorbic acid fortification (at 100 and 200 mg L⁻¹), deaeration, and storage temperature/time on the kinetic behaviour were determined. Ascorbic acid was monitored by HPLC-DAD and colour intensity by spectrophotometric measurements. Degradation kinetics were best fitted by first-order reaction models for both ascorbic acid and colour intensity. Three models (Arrhenius, Eyring and Ball) were used to assess the temperature dependence of degradation. Following the Arrhenius model, activation energies ranged from 51 to 135 kJ mol⁻¹ for ascorbic acid and from 49 to 99 kJ mol⁻¹ for colour intensity. Storage temperature and deaeration were the most influential factors on degradation kinetics, while fortification had no significant effect on ascorbic acid content or colour intensity. Copyright © 2014 Elsevier Ltd. All rights reserved.
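The first-order and Arrhenius forms referred to in the abstract are the standard ones:

C(t) = C_0 \, e^{-k t}, \qquad k(T) = A \exp\!\left(-\frac{E_a}{R\,T}\right),

where C is the ascorbic acid concentration (or colour intensity), k the first-order rate constant, A the pre-exponential factor, E_a the activation energy, R the gas constant, and T the absolute temperature.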
Benchmarking NNWSI flow and transport codes: COVE 1 results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hayden, N.K.
1985-06-01
The code verification (COVE) activity of the Nevada Nuclear Waste Storage Investigations (NNWSI) Project is the first step in certification of flow and transport codes used for NNWSI performance assessments of a geologic repository for disposing of high-level radioactive wastes. The goals of the COVE activity are (1) to demonstrate and compare the numerical accuracy and sensitivity of certain codes, (2) to identify and resolve problems in running typical NNWSI performance assessment calculations, and (3) to evaluate computer requirements for running the codes. This report describes the work done for COVE 1, the first step in benchmarking some of the codes. Isothermal calculations for the COVE 1 benchmarking have been completed using the hydrologic flow codes SAGUARO, TRUST, and GWVIP; the radionuclide transport codes FEMTRAN and TRUMP; and the coupled flow and transport code TRACR3D. This report presents the results of three cases of the benchmarking problem solved for COVE 1, a comparison of the results, questions raised regarding sensitivities to modeling techniques, and conclusions drawn regarding the status and numerical sensitivities of the codes. 30 refs.
NASA Advanced Supercomputing (NAS) User Services Group
NASA Technical Reports Server (NTRS)
Pandori, John; Hamilton, Chris; Niggley, C. E.; Parks, John W. (Technical Monitor)
2002-01-01
This viewgraph presentation provides an overview of NAS (NASA Advanced Supercomputing), its goals, and its mainframe computer assets. Also covered are its functions, including systems monitoring and technical support.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baxter, Richard
Project financing is emerging as the linchpin for the future health, direction, and momentum of the energy storage industry. Market leaders have so far relied on self-funding or captive lending arrangements to fund projects. New lenders are proceeding hesitantly as they lack a full understanding of the technology, business, and credit risks involved in this rapidly changing market. The U.S. Department of Energy is poised to play a critical role in expanding access to capital by reducing the barriers to entry for new lenders and providing trusted analytical benchmarks to better judge and price the risk in systematic ways.
Thermal modeling with solid/liquid phase change of the thermal energy storage experiment
NASA Technical Reports Server (NTRS)
Skarda, J. Raymond Lee
1991-01-01
A thermal model which simulates combined conduction and phase change characteristics of thermal energy storage (TES) materials is presented. Both the model and results are presented for the purpose of benchmarking the conduction and phase change capabilities of recently developed and unvalidated microgravity TES computer programs. Specifically, operation of TES-1 is simulated. A two-dimensional SINDA85 model of the TES experiment in cylindrical coordinates was constructed. The phase change model accounts for latent heat stored in, or released from, a node undergoing melting and freezing.
Memory for light as a quantum process.
Lobino, M; Kupchak, C; Figueroa, E; Lvovsky, A I
2009-05-22
We report complete characterization of an optical memory based on electromagnetically induced transparency. We recover the superoperator associated with the memory, under two different working conditions, by means of a quantum process tomography technique that involves storage of coherent states and their characterization upon retrieval. In this way, we can predict the quantum state retrieved from the memory for any input, for example, the squeezed vacuum or the Fock state. We employ the acquired superoperator to verify the nonclassicality benchmark for the storage of a Gaussian distributed set of coherent states.
NSF Commits to Supercomputers.
ERIC Educational Resources Information Center
Waldrop, M. Mitchell
1985-01-01
The National Science Foundation (NSF) has allocated at least $200 million over the next five years to support four new supercomputer centers. Issues and trends related to this NSF initiative are examined. (JN)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Monti, Henri; Butt, Ali R; Vazhkudai, Sudharshan S
2010-04-01
Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end computing systems. Processing such large input data typically involves copying (or staging) it onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. The current practice of conservatively staging data as early as possible makes the data vulnerable to storage failures, which may entail re-staging and consequently reduced job throughput. To address this, we present a timely staging framework that uses a combination of job startup time predictions, user-specified intermediate nodes, and decentralized data delivery to make input data staging coincide with job start-up. By delaying staging until it is necessary, the exposure to failures and its effects can be reduced. Evaluation using both PlanetLab and simulations based on three years of Jaguar (No. 1 in Top500) job logs shows as much as an 85.9% reduction in staging times compared to direct transfers, a 75.2% reduction in wait time on scratch, and a 2.4% reduction in usage/hour.
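A back-of-the-envelope sketch of the timely-staging idea, choosing the latest safe time to begin a transfer given a predicted job start time; the bandwidth estimate, safety margin, and numbers are hypothetical illustrations, not the paper's framework.

from datetime import datetime, timedelta

def latest_staging_start(predicted_job_start, data_bytes, bandwidth_bytes_per_s,
                         safety_margin=timedelta(minutes=30)):
    """Latest time to begin staging so the data lands on scratch just before the job starts."""
    transfer_time = timedelta(seconds=data_bytes / bandwidth_bytes_per_s)
    return predicted_job_start - transfer_time - safety_margin

if __name__ == "__main__":
    start = datetime(2010, 4, 1, 12, 0)
    # Example: 10 TB of input over an effective 5 GB/s path.
    print(latest_staging_start(start, 10e12, 5e9))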
Beam diagnostics at high-intensity storage rings
NASA Astrophysics Data System (ADS)
Plum, Mike
1994-10-01
Beam diagnostics at high-intensity facilities feature their own special set of problems and characteristics. Issues peculiar to high-intensity storage rings include beam loss, beam halos, extraction efficiency, beam in the gap, clearing electrodes, and beam-profile measurement. The Los Alamos Proton Storage Ring (PSR) is a nice example of a high-intensity storage ring. I will discuss in some detail three diagnostic systems currently in use at the PSR: the beam-loss-monitor system, the electron-clearing system, and the beam-in-the-gap monitor. Much of our discussion is inspired by the problems we have encountered and the useful things we have learned while commissioning and developing the PSR. Another inspiration is our work on the next-generation neutron-spallation source, also known as the National Center for Neutron Research (NCNR).
Mira: Argonne's 10-petaflops supercomputer
Papka, Michael; Coghlan, Susan; Isaacs, Eric; Peters, Mark; Messina, Paul
2018-02-13
Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Adventures in Computational Grids
NASA Technical Reports Server (NTRS)
Walatka, Pamela P.; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Sometimes one supercomputer is not enough. Or your local supercomputers are busy, or not configured for your job. Or you don't have any supercomputers. You might be trying to simulate worldwide weather changes in real time, requiring more compute power than you could get from any one machine. Or you might be collecting microbiological samples on an island, and need to examine them with a special microscope located on the other side of the continent. These are the times when you need a computational grid.
Breakthrough: NETL's Simulation-Based Engineering User Center (SBEUC)
Guenther, Chris
2018-05-23
The National Energy Technology Laboratory relies on supercomputers to develop many novel ideas that become tomorrow's energy solutions. Supercomputers provide a cost-effective, efficient platform for research and usher technologies into widespread use faster to bring benefits to the nation. In 2013, Secretary of Energy Dr. Ernest Moniz dedicated NETL's new supercomputer, the Simulation Based Engineering User Center, or SBEUC. The SBEUC is dedicated to fossil energy research and is a collaborative tool for all of NETL and our regional university partners.
A high level language for a high performance computer
NASA Technical Reports Server (NTRS)
Perrott, R. H.
1978-01-01
The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers are modifications of programming languages that were designed many years ago for sequential machines. A new programming language should be developed, based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.
Technology advances and market forces: Their impact on high performance architectures
NASA Technical Reports Server (NTRS)
Best, D. R.
1978-01-01
Reasonable projections of future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture, and application, with greater attention to testability, maintainability, reliability, and usability than in supercomputer development programs of the past.
Floating point arithmetic in future supercomputers
NASA Technical Reports Server (NTRS)
Bailey, David H.; Barton, John T.; Simon, Horst D.; Fouts, Martin J.
1989-01-01
Considerations in the floating-point design of a supercomputer are discussed. Particular attention is given to word size, hardware support for extended precision, format, and accuracy characteristics. These issues are discussed from the perspective of the Numerical Aerodynamic Simulation Systems Division at NASA Ames. The features believed to be most important for a future supercomputer floating-point design include: (1) a 64-bit IEEE floating-point format with 11 exponent bits, 52 mantissa bits, and one sign bit and (2) hardware support for reasonably fast double-precision arithmetic.
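The recommended 1 sign / 11 exponent / 52 mantissa bit layout is exactly today's IEEE 754 double-precision format; a quick way to inspect it is sketched below as a generic illustration, not code from the report.

import struct

def decompose_double(x):
    """Split a Python float into its IEEE 754 sign, exponent, and mantissa fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # raw 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF                       # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)                     # 52 mantissa (fraction) bits
    return sign, exponent, mantissa

print(decompose_double(1.0))    # (0, 1023, 0): biased exponent 1023, zero fraction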
Tracing Scientific Facilities through the Research Literature Using Persistent Identifiers
NASA Astrophysics Data System (ADS)
Mayernik, M. S.; Maull, K. E.
2016-12-01
Tracing persistent identifiers to their source publications is an easy task when authors use them, since it is a simple matter of matching the persistent identifier to the specific text string of the identifier. However, trying to understand whether a publication uses the resource behind an identifier when that identifier is not referenced explicitly is a harder task. In this research, we explore the effectiveness of alternative strategies for associating publications with uses of the resource referenced by an identifier when the identifier may not be explicit. This project is situated within the context of the NCAR supercomputer, where we are broadly interested in the science that can be traced to the usage of the NCAR supercomputing facility by way of the peer-reviewed research publications that utilize and reference it. In this project we explore several ways of drawing linkages between publications and the NCAR supercomputing resources. Identifying and compiling peer-reviewed publications related to NCAR supercomputer usage is explored via three sources: 1) user-supplied publications gathered through a community survey, 2) publications identified via manual searching of the Google Scholar search index, and 3) publications associated with National Science Foundation (NSF) grants extracted from a public NSF database. These three sources represent three styles of collecting information about publications that likely imply usage of the NCAR supercomputing facilities. Each source has strengths and weaknesses, and our discussion explores how our publication identification and analysis methods vary in terms of accuracy, reliability, and effort. We also discuss strategies for enabling more efficient tracing of the research impacts of supercomputing facilities going forward through the assignment of a persistent web identifier to the NCAR supercomputer. While this solution has the potential to greatly enhance our ability to trace the use of the facility through publications, authors must cite the facility consistently. It is therefore necessary to provide recommendations for citation and attribution behavior, and we conclude our discussion with how such recommendations have improved tracing of the supercomputer facility, allowing for more consistent and widespread measurement of its impact.
Robotic tape library system level testing at NSA: Present and planned
NASA Technical Reports Server (NTRS)
Shields, Michael F.
1994-01-01
In the present era of declining Defense budgets, increased pressure has been placed on the DOD to utilize Commercial Off the Shelf (COTS) solutions to incrementally solve a wide variety of our computer processing requirements. With the rapid growth in processing power, significant expansion of high performance networking, and the increased complexity of application data sets, the requirement for high-performance, large-capacity, reliable, secure, and most of all affordable robotic tape storage libraries has greatly increased. Additionally, the migration to a heterogeneous, distributed computing environment has further complicated the problem. With today's open-system compute servers approaching yesterday's supercomputer capabilities, the need for affordable, reliable, and secure Mass Storage Systems (MSS) has taken on ever-increasing importance to our processing center's ability to satisfy operational mission requirements. To that end, NSA has established an in-house capability to acquire, test, and evaluate COTS products. Its goal is to qualify a set of COTS MSS libraries, thereby achieving a modicum of standardization for robotic tape libraries which can satisfy our low, medium, and high performance file and volume serving requirements. In addition, NSA has established relations with other Government Agencies to complement this in-house effort and to maximize our research, testing, and evaluation work. While the preponderance of the effort is focused at the high end of the storage ladder, considerable effort will be expended this year and next on server-class and mid-range storage systems.
Energy Efficient Supercomputing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anypas, Katie
2014-10-17
Katie Anypas, Head of NERSC's Services Department discusses the Lab's research into developing increasingly powerful and energy efficient supercomputers at our '8 Big Ideas' Science at the Theater event on October 8th, 2014, in Oakland, California.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Data Intensive Systems (DIS) Benchmark Performance Summary
2003-08-01
models assumed by today's conventional architectures. Such applications include model-based Automatic Target Recognition (ATR), synthetic aperture...radar (SAR) codes, large scale dynamic databases/battlefield integration, dynamic sensor-based processing, high-speed cryptanalysis, high speed...distributed interactive and data intensive simulations, data-oriented problems characterized by pointer-based and other highly irregular data structures
Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patchett, John M; Ahrens, James P; Lo, Li - Ta
2010-10-15
Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software based ray tracing, software based rasterization and hardware accelerated rasterization. We present a case study of strong and weak scaling of rendering extremely large data on both GPU and CPU based parallel supercomputers using ParaView, a parallel visualization tool. We use three different data sets: two synthetic and one from a scientific application. At an extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find software based ray-tracing offers a viable approach for scalable rendering of the projected future massive data sizes.
40 CFR 141.721 - Reporting requirements.
Code of Federal Regulations, 2010 CFR
2010-07-01
... § 141.702 and source water monitoring results under § 141.706 unless they notify the State that they... report the use of uncovered finished water storage facilities to the State as described in § 141.714. (c...) Systems must report disinfection profiles and benchmarks to the State as described in §§ 141.708 through...
Supercomputing Drives Innovation - Continuum Magazine | NREL
years, NREL scientists have used supercomputers to simulate 3D models of the primary enzymes and Scientist, discuss a 3D model of wind plant aerodynamics, showing low velocity wakes and impact on
Accurate quantum chemical calculations
NASA Technical Reports Server (NTRS)
Bauschlicher, Charles W., Jr.; Langhoff, Stephen R.; Taylor, Peter R.
1989-01-01
An important goal of quantum chemical calculations is to provide an understanding of chemical bonding and molecular electronic structure. A second goal, the prediction of energy differences to chemical accuracy, has been much harder to attain. First, the computational resources required to achieve such accuracy are very large, and second, it is not straightforward to demonstrate that an apparently accurate result, in terms of agreement with experiment, does not result from a cancellation of errors. Recent advances in electronic structure methodology, coupled with the power of vector supercomputers, have made it possible to solve a number of electronic structure problems exactly using the full configuration interaction (FCI) method within a subspace of the complete Hilbert space. These exact results can be used to benchmark approximate techniques that are applicable to a wider range of chemical and physical problems. The methodology of many-electron quantum chemistry is reviewed. Methods are considered in detail for performing FCI calculations. The application of FCI methods to several three-electron problems in molecular physics are discussed. A number of benchmark applications of FCI wave functions are described. Atomic basis sets and the development of improved methods for handling very large basis sets are discussed: these are then applied to a number of chemical and spectroscopic problems; to transition metals; and to problems involving potential energy surfaces. Although the experiences described give considerable grounds for optimism about the general ability to perform accurate calculations, there are several problems that have proved less tractable, at least with current computer resources, and these and possible solutions are discussed.
Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarje, Abhinav; Jacobsen, Douglas W.; Williams, Samuel W.
The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.
Supercomputer algorithms for efficient linear octree encoding of three-dimensional brain images.
Berger, S B; Reis, D J
1995-02-01
We designed and implemented algorithms for three-dimensional (3-D) reconstruction of brain images from serial sections using two important supercomputer architectures, vector and parallel. These architectures were represented by the Cray YMP and Connection Machine CM-2, respectively. The programs operated on linear octree representations of the brain data sets, and achieved 500-800 times acceleration when compared with a conventional laboratory workstation. As the need for higher resolution data sets increases, supercomputer algorithms may offer a means of performing 3-D reconstruction well above current experimental limits.
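As a rough illustration of the linear octree representation mentioned in this abstract, the sketch below encodes the occupied voxels of a binary volume as sorted interleaved (Morton) locational codes. It is a minimal stand-in, not the authors' Cray/Connection Machine implementation, and it omits the merging of complete sibling octants into parent nodes.

```python
import numpy as np

def morton_code(x: int, y: int, z: int, depth: int) -> int:
    """Interleave the bits of (x, y, z) into a single locational code."""
    code = 0
    for bit in range(depth):
        code |= ((x >> bit) & 1) << (3 * bit)
        code |= ((y >> bit) & 1) << (3 * bit + 1)
        code |= ((z >> bit) & 1) << (3 * bit + 2)
    return code

def linear_octree(volume: np.ndarray) -> np.ndarray:
    """Encode occupied voxels of a binary 2^d-sided volume as sorted Morton codes."""
    depth = int(np.log2(volume.shape[0]))
    xs, ys, zs = np.nonzero(volume)
    codes = [morton_code(int(x), int(y), int(z), depth) for x, y, z in zip(xs, ys, zs)]
    return np.sort(np.array(codes, dtype=np.int64))

# Toy example: an 8x8x8 volume with a small occupied block.
vol = np.zeros((8, 8, 8), dtype=bool)
vol[2:4, 2:4, 2:4] = True
print(linear_octree(vol))
```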
Intelligent supercomputers: the Japanese computer sputnik
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walter, G.
1983-11-01
Japan's government-supported fifth-generation computer project has had a pronounced effect on the American computer and information systems industry. The US firms are intensifying their research on and production of intelligent supercomputers, a combination of computer architecture and artificial intelligence software programs. While the present generation of computers is built for the processing of numbers, the new supercomputers will be designed specifically for the solution of symbolic problems and the use of artificial intelligence software. This article discusses new and exciting developments that will increase computer capabilities in the 1990s. 4 references.
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter D.; Dawson, Andrew
2017-03-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelization to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. In this paper, we present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform model simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13 % for the shallow water model.
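A minimal sketch of the backup-grid idea described above, assuming a 1D field, block-averaging restriction, piecewise-constant prolongation, and a simple threshold test for fault detection; the real scheme operates on the prognostic variables of a shallow water model, and the constants here are illustrative assumptions.

```python
import numpy as np

COARSEN = 4             # ratio between model grid and backup grid (assumed)
CHECK_THRESHOLD = 10.0  # deviation that flags a hardware fault (assumed)

def restrict(field: np.ndarray) -> np.ndarray:
    """Average blocks of the model grid onto the coarse backup grid."""
    return field.reshape(-1, COARSEN).mean(axis=1)

def prolong(coarse: np.ndarray) -> np.ndarray:
    """Copy each backup value back onto the fine model grid."""
    return np.repeat(coarse, COARSEN)

def check_and_restore(field: np.ndarray, backup: np.ndarray) -> np.ndarray:
    """Detect cells corrupted by a fault and overwrite them from the backup grid."""
    deviation = np.abs(field - prolong(backup))
    corrupted = deviation > CHECK_THRESHOLD
    if corrupted.any():
        field = np.where(corrupted, prolong(backup), field)
    return field

# Toy demo: a smooth field, a backup copy, then a bit-flip-like corruption.
h = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
backup = restrict(h)   # in the real scheme, refreshed every few time steps
h[17] = 1.0e6          # simulated hardware fault
h = check_and_restore(h, backup)
print(h[17])           # restored to the coarse backup value
```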
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter; Dawson, Andrew
2017-04-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelisation to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. We present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13% for the shallow water model.
Simulating functional magnetic materials on supercomputers.
Gruner, Markus Ernst; Entel, Peter
2009-07-22
The recent passing of the petaflop per second landmark by the Roadrunner project at the Los Alamos National Laboratory marks a preliminary peak of an impressive world-wide development in the high-performance scientific computing sector. Also, purely academic state-of-the-art supercomputers such as the IBM Blue Gene/P at Forschungszentrum Jülich allow us nowadays to investigate large systems of the order of 10^3 spin polarized transition metal atoms by means of density functional theory. Three applications will be presented where large-scale ab initio calculations contribute to the understanding of key properties emerging from a close interrelation between structure and magnetism. The first two examples discuss the size dependent evolution of equilibrium structural motifs in elementary iron and binary Fe-Pt and Co-Pt transition metal nanoparticles, which are currently discussed as promising candidates for ultra-high-density magnetic data storage media. However, the preference for multiply twinned morphologies at smaller cluster sizes counteracts the formation of a single-crystalline L1_0 phase, which alone provides the required hard magnetic properties. The third application is concerned with the magnetic shape memory effect in the Ni-Mn-Ga Heusler alloy, which is a technologically relevant candidate for magnetomechanical actuators and sensors. In this material strains of up to 10% can be induced by external magnetic fields due to the field induced shifting of martensitic twin boundaries, requiring an extremely high mobility of the martensitic twin boundaries, but also the selection of the appropriate martensitic structure from the rich phase diagram.
NASA Astrophysics Data System (ADS)
Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.
2014-12-01
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC). The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At the same time, we are scoping seven new VLs and, in the process, identifying other common components to prioritise and focus development.
Advanced Aerospace Materials by Design
NASA Technical Reports Server (NTRS)
Srivastava, Deepak; Djomehri, Jahed; Wei, Chen-Yu
2004-01-01
The advances in the emerging field of nanophase thermal and structural composite materials; materials with embedded sensors and actuators for morphing structures; light-weight composite materials for energy and power storage; and large surface area materials for in-situ resource generation and waste recycling, are expected to revolutionize the capabilities of virtually every system comprising future robotic and human Moon and Mars exploration missions. A high-performance multiscale simulation platform, including the computational capabilities and resources of Columbia - the new supercomputer - is being developed to discover, validate, and prototype the next generation of such advanced materials. This exhibit will describe the porting and scaling of multiscale physics-based core computer simulation codes for discovering and designing carbon nanotube-polymer composite materials for light-weight load bearing structural and thermal protection applications.
FFTs in external or hierarchical memory
NASA Technical Reports Server (NTRS)
Bailey, David H.
1989-01-01
A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
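Two-pass external-memory FFTs of the kind referred to here are commonly built on the four-step factorization N = N1 x N2. The sketch below shows that factorization in memory with NumPy standing in for the disk passes; it illustrates the algorithm class, not the Cray implementation reported in the paper.

```python
import numpy as np

def four_step_fft(x: np.ndarray, n1: int, n2: int) -> np.ndarray:
    """Length n1*n2 DFT of x via the four-step factorization.

    In an external-memory setting each step would be one pass over the data set;
    here everything is held in memory purely for illustration.
    """
    n = n1 * n2
    a = x.reshape(n1, n2)                       # view the data as an n1 x n2 matrix
    a = np.fft.fft(a, axis=0)                   # step 1: n2 FFTs of length n1 (columns)
    k1 = np.arange(n1).reshape(n1, 1)
    j2 = np.arange(n2).reshape(1, n2)
    a = a * np.exp(-2j * np.pi * k1 * j2 / n)   # step 2: twiddle-factor multiplication
    a = np.fft.fft(a, axis=1)                   # step 3: n1 FFTs of length n2 (rows)
    return a.T.reshape(n)                       # step 4: transpose to natural ordering

x = np.random.rand(1024) + 1j * np.random.rand(1024)
print(np.allclose(four_step_fft(x, 16, 64), np.fft.fft(x)))  # True
```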
Information technologies for astrophysics circa 2001
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1991-01-01
It is easy to extrapolate current trends to see where technologies relating to information systems in astrophysics and other disciplines will be by the end of the decade. These technologies include miniaturization, multiprocessing, software technology, networking, databases, graphics, pattern computation, and interdisciplinary studies. It is less easy to see what limits our current paradigms place on our thinking about technologies that will allow us to understand the laws governing very large systems about which we have large data sets. Three limiting paradigms are as follows: saving all the bits collected by instruments or generated by supercomputers; obtaining technology for information compression, storage, and retrieval off the shelf; and the linear model of innovation. We must extend these paradigms to meet our goals for information technology at the end of the decade.
On multigrid methods for the Navier-Stokes Computer
NASA Technical Reports Server (NTRS)
Nosenchuck, D. M.; Krist, S. E.; Zang, T. A.
1988-01-01
The overall architecture of the multipurpose parallel-processing Navier-Stokes Computer (NSC) being developed by Princeton and NASA Langley (Nosenchuck et al., 1986) is described and illustrated with extensive diagrams, and the NSC implementation of an elementary multigrid algorithm for simulating isotropic turbulence (based on solution of the incompressible time-dependent Navier-Stokes equations with constant viscosity) is characterized in detail. The present NSC design concept calls for 64 nodes, each with the performance of a class VI supercomputer, linked together by a fiber-optic hypercube network and joined to a front-end computer by a global bus. In this configuration, the NSC would have a storage capacity of over 32 Gword and a peak speed of over 40 Gflops. The multigrid Navier-Stokes code discussed would give sustained operation rates of about 25 Gflops.
HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.
2017-10-01
PanDA - the Production and Distributed Analysis Workload Management System - was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects using PanDA beyond HEP and the Grid has drawn attention from other compute intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available for bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month even when run on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. To run the pipeline we split input files into chunks which are processed separately on different nodes as separate inputs for PALEOMIX, and finally merge the output files; this is very similar to what ATLAS does to process and simulate data. We dramatically decreased the total walltime because of automated job (re)submission and brokering within PanDA. Using software tools developed initially for HEP and the Grid reduces payload execution time for mammoth DNA samples from weeks to days.
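A minimal sketch of the split-process-merge pattern described in this abstract, with a local process pool standing in for PanDA-brokered jobs on worker nodes; the chunking, the placeholder per-chunk "analysis", and the function names are illustrative assumptions.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List

def split_into_chunks(reads: List[str], n_chunks: int) -> List[List[str]]:
    """Split the input into roughly equal chunks, one per (hypothetical) job."""
    size = max(1, len(reads) // n_chunks)
    return [reads[i:i + size] for i in range(0, len(reads), size)]

def run_pipeline_on_chunk(chunk: List[str]) -> List[str]:
    """Stand-in for running the analysis pipeline on one chunk on a worker node."""
    return [read.upper() for read in chunk]   # placeholder "analysis"

def merge_outputs(outputs: List[List[str]]) -> List[str]:
    """Concatenate the per-chunk outputs into a single result."""
    return [line for part in outputs for line in part]

if __name__ == "__main__":
    reads = [f"acgt{i}" for i in range(100)]          # toy input "reads"
    chunks = split_into_chunks(reads, n_chunks=4)
    with ProcessPoolExecutor(max_workers=4) as pool:  # a WMS would broker these jobs
        outputs = list(pool.map(run_pipeline_on_chunk, chunks))
    result = merge_outputs(outputs)
    print(len(result), result[0])
```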
Introducing Mira, Argonne's Next-Generation Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2013-03-19
Mira, the new petascale IBM Blue Gene/Q system installed at the ALCF, will usher in a new era of scientific supercomputing. An engineering marvel, the 10-petaflops machine is capable of carrying out 10 quadrillion calculations per second.
Green Supercomputing at Argonne
Pete Beckman
2017-12-09
Pete Beckman, head of Argonne's Leadership Computing Facility (ALCF), talks about Argonne National Laboratory's green supercomputing: everything from designing algorithms to use fewer kilowatts per operation to using cold Chicago winter air to cool the machine more efficiently.
Benchmarking Data for the Proposed Signature of Used Fuel Casks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rauch, Eric Benton
2016-09-23
A set of benchmarking measurements to test facets of the proposed extended storage signature was conducted on May 17, 2016. The measurements were designed to test the overall concept of how the proposed signature can be used to identify a used fuel cask based only on the distribution of neutron sources within the cask. To simulate the distribution, 4 Cf-252 sources were chosen and arranged on a 3x3 grid in 3 different patterns, and raw neutron total counts were taken at 6 locations around the grid. This is a very simplified test of the typical geometry studied previously in simulation with simulated used nuclear fuel.
Rasdaman for Big Spatial Raster Data
NASA Astrophysics Data System (ADS)
Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.
2015-12-01
Spatial raster data have grown exponentially over the past decade. Recent advancements in data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolution and domain coverage. The volume, velocity, and variety of such spatial data, along with the computationally intensive nature of spatial queries, pose a grand challenge to storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly target compute intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.
Benchmarking Attosecond Physics with Atomic Hydrogen
2015-05-25
theoretical simulations are available in this regime. We provided accurate reference data on the photoionization yield and the CEP-dependent...this difficulty. This experiment claimed to show that, contrary to current understanding, the photoionization of an atomic electron is not an... photoion yield and transferrable intensity calibration. The dependence of photoionization probability on laser intensity is one of the most
Definition of the Spatial Resolution of X-Ray Microanalysis in Thin Foils
NASA Technical Reports Server (NTRS)
Williams, D. B.; Michael, J. R.; Goldstein, J. I.; Romig, A. D., Jr.
1992-01-01
The spatial resolution of X-ray microanalysis in thin foils is defined in terms of the incident electron beam diameter and the average beam broadening. The beam diameter is defined as the full width tenth maximum of a Gaussian intensity distribution. The spatial resolution is calculated by a convolution of the beam diameter and the average beam broadening. This definition of the spatial resolution can be related simply to experimental measurements of composition profiles across interphase interfaces. Monte Carlo calculations using a high-speed parallel supercomputer show good agreement with this definition of the spatial resolution and calculations based on this definition. The agreement is good over a range of specimen thicknesses and atomic number, but is poor when excessive beam tailing distorts the assumed Gaussian electron intensity distributions. Beam tailing occurs in low-Z materials because of fast secondary electrons and in high-Z materials because of plural scattering.
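One common way to realize the combination of beam diameter and beam broadening described above is to convert the probe's FWHM to a full width at tenth maximum and add the two contributions in quadrature. The sketch below uses that assumed model with made-up numbers; it is not necessarily the exact definition adopted by the authors.

```python
import math

def fwtm_from_fwhm(fwhm_nm: float) -> float:
    """Full width at tenth maximum of a Gaussian, given its FWHM."""
    return fwhm_nm * math.sqrt(math.log(10.0) / math.log(2.0))

def spatial_resolution(beam_fwtm_nm: float, broadening_nm: float) -> float:
    """Combine beam diameter and average beam broadening in quadrature (assumed model)."""
    return math.sqrt(beam_fwtm_nm ** 2 + broadening_nm ** 2)

# Toy numbers (illustrative only): 1 nm FWHM probe, 4 nm average broadening.
d = fwtm_from_fwhm(1.0)
print(round(spatial_resolution(d, 4.0), 2), "nm")
```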
Advanced Computing for Manufacturing.
ERIC Educational Resources Information Center
Erisman, Albert M.; Neves, Kenneth W.
1987-01-01
Discusses ways that supercomputers are being used in the manufacturing industry, including the design and production of airplanes and automobiles. Describes problems that need to be solved in the next few years for supercomputers to assume a major role in industry. (TW)
Treatment planning for spinal radiosurgery : A competitive multiplatform benchmark challenge.
Moustakis, Christos; Chan, Mark K H; Kim, Jinkoo; Nilsson, Joakim; Bergman, Alanah; Bichay, Tewfik J; Palazon Cano, Isabel; Cilla, Savino; Deodato, Francesco; Doro, Raffaela; Dunst, Jürgen; Eich, Hans Theodor; Fau, Pierre; Fong, Ming; Haverkamp, Uwe; Heinze, Simon; Hildebrandt, Guido; Imhoff, Detlef; de Klerck, Erik; Köhn, Janett; Lambrecht, Ulrike; Loutfi-Krauss, Britta; Ebrahimi, Fatemeh; Masi, Laura; Mayville, Alan H; Mestrovic, Ante; Milder, Maaike; Morganti, Alessio G; Rades, Dirk; Ramm, Ulla; Rödel, Claus; Siebert, Frank-Andre; den Toom, Wilhelm; Wang, Lei; Wurster, Stefan; Schweikard, Achim; Soltys, Scott G; Ryu, Samuel; Blanck, Oliver
2018-05-25
To investigate the quality of treatment plans of spinal radiosurgery derived from different planning and delivery systems. The comparisons include robotic delivery and intensity modulated arc therapy (IMAT) approaches. Multiple centers with equal systems were used to reduce a bias based on individual's planning abilities. The study used a series of three complex spine lesions to maximize the difference in plan quality among the various approaches. Internationally recognized experts in the field of treatment planning and spinal radiosurgery from 12 centers with various treatment planning systems participated. For a complex spinal lesion, the results were compared against a previously published benchmark plan derived for CyberKnife radiosurgery (CKRS) using circular cones only. For two additional cases, one with multiple small lesions infiltrating three vertebrae and a single vertebra lesion treated with integrated boost, the results were compared against a benchmark plan generated using a best practice guideline for CKRS. All plans were rated based on a previously established ranking system. All 12 centers could reach equality (n = 4) or outperform (n = 8) the benchmark plan. For the multiple lesions and the single vertebra lesion plan only 5 and 3 of the 12 centers, respectively, reached equality or outperformed the best practice benchmark plan. However, the absolute differences in target and critical structure dosimetry were small and strongly planner-dependent rather than system-dependent. Overall, gantry-based IMAT with simple planning techniques (two coplanar arcs) produced faster treatments and significantly outperformed static gantry intensity modulated radiation therapy (IMRT) and multileaf collimator (MLC) or non-MLC CKRS treatment plan quality regardless of the system (mean rank out of 4 was 1.2 vs. 3.1, p = 0.002). High plan quality for complex spinal radiosurgery was achieved among all systems and all participating centers in this planning challenge. This study concludes that simple IMAT techniques can generate significantly better plan quality compared to previous established CKRS benchmarks.
Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...
2017-10-04
The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel Xeon Phi supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.
Advances in Patch-Based Adaptive Mesh Refinement Scalability
Gunney, Brian T.N.; Anderson, Robert W.
2015-12-18
Patch-based structured adaptive mesh refinement (SAMR) is widely used for high-resolution simulations. Combined with modern supercomputers, it could provide simulations of unprecedented size and resolution. A persistent challenge for this combination has been managing dynamically adaptive meshes on more and more MPI tasks. The distributed mesh management scheme in SAMRAI has made some progress on SAMR scalability, but early algorithms still had trouble scaling past the regime of 10^5 MPI tasks. This work provides two critical SAMR regridding algorithms, which are integrated into that scheme to ensure efficiency of the whole. The clustering algorithm is an extension of the tile-clustering approach, making it more flexible and efficient in both clustering and parallelism. The partitioner is a new algorithm designed to prevent the network congestion experienced by its predecessor. We evaluated performance using weak- and strong-scaling benchmarks designed to be difficult for dynamic adaptivity. Results show good scaling on up to 1.5M cores and 2M MPI tasks. Detailed timing diagnostics suggest scaling would continue well past that.
Accelerating 3D Hall MHD Magnetosphere Simulations with Graphics Processing Units
NASA Astrophysics Data System (ADS)
Bard, C.; Dorelli, J.
2017-12-01
The resolution required to simulate planetary magnetospheres with Hall magnetohydrodynamics results in program sizes approaching several hundred million grid cells. These would take years to run on a single computational core and require hundreds or thousands of computational cores to complete in a reasonable time. However, this requires access to the largest supercomputers. Graphics processing units (GPUs) provide a viable alternative: one GPU can do the work of roughly 100 cores, bringing Hall MHD simulations of Ganymede within reach of modest GPU clusters (~8 GPUs). We report our progress in developing a GPU-accelerated, three-dimensional Hall magnetohydrodynamic code and present Hall MHD simulation results for both Ganymede (run on 8 GPUs) and Mercury (56 GPUs). We benchmark our Ganymede simulation against previous results for the Galileo G8 flyby, namely that adding the Hall term to ideal MHD simulations changes the global convection pattern within the magnetosphere. Additionally, we present new results for the G1 flyby as well as initial results from Hall MHD simulations of Mercury and compare them with the corresponding ideal MHD runs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friese, Ryan; Khemka, Bhavesh; Maciejewski, Anthony A
Rising costs of energy consumption and an ongoing effort for increases in computing performance are leading to a significant need for energy-efficient computing. Before systems such as supercomputers, servers, and datacenters can begin operating in an energy-efficient manner, the energy consumption and performance characteristics of the system must be analyzed. In this paper, we provide an analysis framework that will allow a system administrator to investigate the tradeoffs between system energy consumption and utility earned by a system (as a measure of system performance). We model these trade-offs as a bi-objective resource allocation problem. We use a popular multi-objective genetic algorithm to construct Pareto fronts to illustrate how different resource allocations can cause a system to consume significantly different amounts of energy and earn different amounts of utility. We demonstrate our analysis framework using real data collected from online benchmarks, and further provide a method to create larger data sets that exhibit similar heterogeneity characteristics to real data sets. This analysis framework can provide system administrators with insight to make intelligent scheduling decisions based on the energy and utility needs of their systems.
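The Pareto fronts mentioned above separate non-dominated trade-offs between energy consumed and utility earned. The sketch below applies a plain non-dominated filter to random candidate allocations to show what such a front looks like; it is not the multi-objective genetic algorithm used in the paper, and the data are synthetic.

```python
import random
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep allocations not dominated by any other (minimize energy, maximize utility)."""
    front = []
    for energy, utility in points:
        dominated = any(
            (e <= energy and u >= utility) and (e < energy or u > utility)
            for e, u in points
        )
        if not dominated:
            front.append((energy, utility))
    return sorted(front)

# Toy candidate allocations: (energy consumed in kWh, utility earned) pairs.
random.seed(0)
candidates = [(random.uniform(10, 100), random.uniform(0, 50)) for _ in range(200)]
for energy, utility in pareto_front(candidates):
    print(f"energy={energy:6.1f} kWh  utility={utility:5.1f}")
```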
Optimizing legacy molecular dynamics software with directive-based offload
NASA Astrophysics Data System (ADS)
Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.
2015-10-01
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
Sumboja, Afriyanti; Liu, Jiawei; Zheng, Wesley Guangyuan; Zong, Yun; Zhang, Hua; Liu, Zhaolin
2018-06-27
Compatible energy storage devices that are able to withstand various mechanical deformations, while delivering their intended functions, are required in wearable technologies. This imposes constraints on the structural designs, materials selection, and miniaturization of the cells. To date, extensive efforts have been dedicated towards developing electrochemical energy storage devices for wearables, with a focus on incorporation of shape-conformable materials into mechanically robust designs that can be worn on the human body. In this review, we highlight the quantified performances of reported wearable electrochemical energy storage devices, as well as their micro-sized counterparts under specific mechanical deformations, which can be used as the benchmark for future studies in this field. A general introduction to the wearable technology, the development of the selection and synthesis of active materials, cell design approaches and device fabrications are discussed. It is followed by challenges and outlook toward the practical use of electrochemical energy storage devices for wearable applications.
Decibel: The Relational Dataset Branching System
Maddox, Michael; Goehring, David; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya; Deshpande, Amol
2017-01-01
As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. PMID:28149668
Quance, S C; Shortall, A C; Harrington, E; Lumley, P J
2001-11-01
The effect of variation in post-exposure storage temperature (18 vs. 37 degrees C) and light intensity (200 vs. 500 mW/cm^2) on the micro-hardness of seven light-activated resin composite materials, cured with a Prismetics Mk II (Dentsply) light activation unit, was studied. Hardness values at the upper and lower surfaces of 2 mm thick disc-shaped specimens of seven light-cured resin composite materials (Herculite XRV and Prodigy/Kerr, Z100 and Silux Plus/3M, TPH/Dentsply, Pertac-Hybrid/Espe, and Charisma/Kulzer), which had been stored dry, were determined 24 h after irradiation with a Prismetics Mk II (Dentsply) light activation unit. Hardness values varied with product, surface, storage temperature, and curing light intensity. In no case did the hardness at the lower surface equal that of the upper surface, and the combination of 500 mW/cm^2 intensity and 37 degrees C storage produced the best hardness results at the lower surface. Material composition had a significant influence on surface hardness. Only one of the seven products (TPH) produced mean hardness values at the lower surface >80% of the maximum mean upper surface hardness obtained for the corresponding product at 500 mW/cm^2 intensity/37 degrees C storage temperature when subjected to all four test regimes. Despite optimum post-cure storage conditions, 200 mW/cm^2 intensity curing for 40 s will not produce acceptable hardness at the lower surface of 2 mm increments of the majority of products tested.
do Nascimento, Cássio; Muller, Katia; Sato, Sandra; Albuquerque Junior, Rubens Ferreira
2012-04-01
Long-term sample storage can affect the intensity of the hybridization signals provided by molecular diagnostic methods that use chemiluminescent detection. The aim of this study was to evaluate the effect of different storage times on the hybridization signals of 13 bacterial species detected by the Checkerboard DNA-DNA hybridization method using whole-genomic DNA probes. Ninety-six subgingival biofilm samples were collected from 36 healthy subjects, and the intensity of hybridization signals was evaluated at 4 different time periods: (1) immediately after collecting (n = 24) and (2) after storage at -20 °C for 6 months (n = 24), (3) for 12 months (n = 24), and (4) for 24 months (n = 24). The intensity of hybridization signals obtained from groups 1 and 2 were significantly higher than in the other groups (p < 0.001). No differences were found between groups 1 and 2 (p > 0.05). The Checkerboard DNA-DNA hybridization method was suitable to detect hybridization signals from all groups evaluated, and the intensity of signals decreased significantly after long periods of sample storage.
Sensory Quality Preservation of Coated Walnuts.
Grosso, Antonella L; Asensio, Claudia M; Grosso, Nelson R; Nepote, Valeria
2017-01-01
The objective of this study was to evaluate the sensory stability of coated walnuts during storage. Four walnut samples were prepared: uncoated (NC), and samples coated with carboxymethyl cellulose (NCMC), methyl cellulose (NMC), or whey protein (NPS). The samples were stored at room temperature for 210 d and were periodically removed from storage to perform a sensory descriptive analysis. A consumer acceptance test was carried out on the fresh product (storage day 0) to evaluate flavor. All samples exhibited significant differences in their sensory attributes initially and after storage. Intensity ratings for oxidized and cardboard flavors increased during storage. NC showed the highest oxidized and cardboard intensity ratings (39 and 22, respectively) and NMC exhibited the lowest intensity ratings for these negative attributes (8 and 17, respectively) after 210 d of storage. Alternatively, the intensity ratings for sweetness and walnut flavors were decreased for all samples. NMC had the lowest decrease at the end of storage for these positive attributes (75.86 in walnut flavor and 12.09 in sweetness). The results of this study suggest a protective effect of the use of an edible coating to preserve sensory attributes during storage, especially for samples coated with MC. The results of the acceptance test showed that addition of the coating negatively affected the flavor acceptance for NMC and NCMC coated walnuts. Edible coatings help to preserve sensory attributes in walnuts, improving their shelf-life, however, these coatings may affect consumer acceptance in some cases. © 2016 Institute of Food Technologists®.
Supercomputers Join the Fight against Cancer – U.S. Department of Energy
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
The Department of Energy has some of the best supercomputers in the world. Now, they’re joining the fight against cancer. Learn about our new partnership with the National Cancer Institute and GlaxoSmithKline Pharmaceuticals.
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.
1993-01-01
This document briefly describes the use of the CRAY supercomputers that are an integral part of the Supercomputing Network Subsystem of the Central Scientific Computing Complex at LaRC. Features of the CRAY supercomputers are covered, including: FORTRAN, C, PASCAL, architectures of the CRAY-2 and CRAY Y-MP, the CRAY UNICOS environment, batch job submittal, debugging, performance analysis, parallel processing, utilities unique to CRAY, and documentation. The document is intended for all CRAY users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user.
New Report Shines Light on Installed Costs and Deployment Barriers for
Laboratory (NREL) are making available the most detailed component and system-level cost breakdowns to date previously unknown soft costs for the first time. The report, titled "Installed Cost Benchmarks and interest in pairing distributed PV with storage, but there's a lack of publicly available cost data and
Chen, Lili; Yuan, Zhiyou; Shao, Hongbo; Wang, Dexiang; Mu, Xingmin
2014-01-01
Thinning is a crucial practice in forest ecosystem management. The soil infiltration rate and water storage capacity of pine-oak mixed forest under three different thinning intensity treatments (15%, 30%, and 60%) were studied in the Qinling Mountains of China. The thinning operations had a significant influence on soil infiltration rate and water storage capacity. The soil infiltration rate and water storage capacity in the different treatments followed the order control (non-thinning) < 60% < 15% < 30%. This demonstrated that a thinning operation with 30% intensity can substantially improve the soil infiltration rate and water storage capacity of pine-oak mixed forest in the Qinling Mountains. The soil initial infiltration rate, stable infiltration rate, and average infiltration rate in the 30% thinning treatment were significantly increased by 21.1%, 104.6%, and 60.9%, compared with the control. The soil maximal water storage capacity and non-capillary water storage capacity in the 30% thinning treatment were significantly improved by 20.1% and 34.3% in contrast to the control. The soil infiltration rate and water storage capacity were significantly higher in the surface layer (0~20 cm) than in the deep layers (20~40 cm and 40~60 cm).
Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E
2015-09-29
In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.
NASA Astrophysics Data System (ADS)
Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.
2012-04-01
The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra high resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in sizes from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 Million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high speed networks to an archival system that can provide long term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63 petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on these systems, and developed as part of the Ultra-High Resolution Climate Modeling Project, allows users of OLCF resources to efficiently share simulated data, often multi-terabyte in volume, as well as the results from the modeling experiments and various synthesized products derived from these simulations. The final objective in the exercise is to ensure that the simulation results and the enhanced understanding will serve the needs of a diverse group of stakeholders across the world, including our research partners in U.S. Department of Energy laboratories & universities, domain scientists, students (K-12 as well as higher education), resource managers, decision makers, and the general public.
Roadrunner Supercomputer Breaks the Petaflop Barrier
Los Alamos National Lab - Brian Albright, Charlie McMillan, Lin Yin
2017-12-09
At 3:30 a.m. on May 26, 2008, Memorial Day, the "Roadrunner" supercomputer exceeded a sustained speed of 1 petaflop/s, or 1 million billion calculations per second. The sustained performance makes Roadrunner more than twice as fast as the current number 1
QCD on the BlueGene/L Supercomputer
NASA Astrophysics Data System (ADS)
Bhanot, G.; Chen, D.; Gara, A.; Sexton, J.; Vranas, P.
2005-03-01
In June 2004 QCD was simulated for the first time at sustained speed exceeding 1 TeraFlops in the BlueGene/L supercomputer at the IBM T.J. Watson Research Lab. The implementation and performance of QCD in the BlueGene/L is presented.
Supercomputer Issues from a University Perspective.
ERIC Educational Resources Information Center
Beering, Steven C.
1984-01-01
Discusses issues related to the access of and training of university researchers in using supercomputers, considering National Science Foundation's (NSF) role in this area, microcomputers on campuses, and the limited use of existing telecommunication networks. Includes examples of potential scientific projects (by subject area) utilizing…
Modelling of a Solar Thermal Power Plant for Benchmarking Blackbox Optimization Solvers
NASA Astrophysics Data System (ADS)
Lemyre Garneau, Mathieu
A new family of problems is provided to serve as a benchmark for blackbox optimization solvers. The problems are single or bi-objective and vary in complexity in terms of the number of variables used (from 5 to 29), the type of variables (integer, real, category), the number of constraints (from 5 to 17) and their types (binary or continuous). In order to provide problems exhibiting dynamics that reflect real engineering challenges, they are extracted from an original numerical model of a concentrated solar power (CSP) power plant with molten salt thermal storage. The model simulates the performance of the power plant by using a high level modeling of each of its main components, namely, a heliostat field, a central cavity receiver, a molten salt heat storage, a steam generator and an idealized powerblock. The heliostat field layout is determined through a simple automatic strategy that finds the best individual positions on the field by considering their respective cosine efficiency, atmospheric scattering and spillage losses as a function of the design parameters. A Monte-Carlo integral method is used to evaluate the heliostat field's optical performance throughout the day so that shadowing effects between heliostats are considered, and the results of this evaluation provide the inputs to simulate the levels and temperatures of the thermal storage. The molten salt storage inventory is used to transfer thermal energy to the powerblock, which simulates a simple Rankine cycle with a single steam turbine. Auxiliary models are used to provide additional optimization constraints on the investment cost, parasitic losses or components failure. The results of preliminary optimizations performed with the NOMAD software using default settings are provided to show the validity of the problems.
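The cosine efficiency used in the heliostat placement strategy is the cosine of the half-angle between the sun direction and the heliostat-to-receiver direction. A small geometric sketch of that calculation follows, with assumed coordinates and toy positions; it is not the thesis model itself.

```python
import numpy as np

def cosine_efficiency(heliostat_xy: np.ndarray, receiver: np.ndarray, sun_dir: np.ndarray) -> np.ndarray:
    """Cosine of the half-angle between the sun direction and the heliostat-to-receiver direction.

    heliostat_xy: (n, 2) field positions on the ground (z = 0 assumed).
    receiver:     (3,) position of the central receiver aperture.
    sun_dir:      (3,) unit vector pointing from the field toward the sun.
    """
    positions = np.column_stack([heliostat_xy, np.zeros(len(heliostat_xy))])
    to_receiver = receiver - positions
    to_receiver /= np.linalg.norm(to_receiver, axis=1, keepdims=True)
    cos_full_angle = to_receiver @ sun_dir   # angle between the two unit vectors
    return np.cos(0.5 * np.arccos(np.clip(cos_full_angle, -1.0, 1.0)))

# Toy field: heliostats north of a 100 m tower, sun 60 degrees above the southern horizon.
field = np.array([[0.0, 50.0], [0.0, 150.0], [100.0, 100.0]])
tower = np.array([0.0, 0.0, 100.0])
sun = np.array([0.0, -np.cos(np.radians(60)), np.sin(np.radians(60))])
print(cosine_efficiency(field, tower, sun).round(3))
```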
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not yet been sufficiently explored in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
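The two query shapes contrasted in this benchmark, low-latency genome-range lookups and heavier analytical aggregates over pre-aggregated (denormalized) tables, can be illustrated at toy scale. The sketch below uses SQLite purely for illustration; the schema, table names, and data are made up, and the actual benchmark ran on distributed engines such as Presto, Spark and Kudu.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, heavily simplified variant schema for illustration only.
cur.execute("CREATE TABLE variants (sample_id TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)")
cur.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?)",
    [(f"s{i // 1000}", "chr1", 1_000_000 + (i % 1000), "A", "G") for i in range(10_000)],
)

# Query shape 1: low-latency genome-range lookup returning a small subset of rows.
cur.execute(
    "SELECT sample_id, pos, ref, alt FROM variants "
    "WHERE chrom = ? AND pos BETWEEN ? AND ?",
    ("chr1", 1_000_100, 1_000_200),
)
print(len(cur.fetchall()), "rows in range")

# Query shape 2: analytical aggregate; pre-aggregating into a summary table reduces
# the work (and, in a distributed engine, the joins) needed at query time.
cur.execute(
    "CREATE TABLE variant_counts AS "
    "SELECT chrom, pos, COUNT(*) AS n_samples FROM variants GROUP BY chrom, pos"
)
cur.execute("SELECT COUNT(*) FROM variant_counts WHERE n_samples > 5")
print(cur.fetchone()[0], "positions seen in more than 5 samples")
```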
Autonomic Closure for Turbulent Flows Using Approximate Bayesian Computation
NASA Astrophysics Data System (ADS)
Doronina, Olga; Christopher, Jason; Hamlington, Peter; Dahm, Werner
2017-11-01
Autonomic closure is a new technique for achieving fully adaptive and physically accurate closure of coarse-grained turbulent flow governing equations, such as those solved in large eddy simulations (LES). Although autonomic closure has been shown in recent a priori tests to more accurately represent unclosed terms than do dynamic versions of traditional LES models, the computational cost of the approach makes it challenging to implement for simulations of practical turbulent flows at realistically high Reynolds numbers. The optimization step used in the approach introduces large matrices that must be inverted and is highly memory intensive. In order to reduce memory requirements, here we propose to use approximate Bayesian computation (ABC) in place of the optimization step, thereby yielding a computationally-efficient implementation of autonomic closure that trades memory-intensive for processor-intensive computations. The latter challenge can be overcome as co-processors such as general purpose graphical processing units become increasingly available on current generation petascale and exascale supercomputers. In this work, we outline the formulation of ABC-enabled autonomic closure and present initial results demonstrating the accuracy and computational cost of the approach.
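The abstract does not spell out the ABC formulation; as a generic, hedged sketch of the rejection-ABC idea it refers to (sample candidate parameters, run a cheap forward evaluation, keep samples whose summary statistics fall within a tolerance of the reference), consider the toy example below, where the simulator and summary statistic are placeholders rather than the autonomic-closure test problem.

```python
import numpy as np

def simulator(theta, rng, n=200):
    """Placeholder forward model: data whose mean is the unknown parameter."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def summary(data):
    return np.mean(data)

def abc_rejection(observed, prior_sampler, eps, n_draws=10000, seed=0):
    """Keep parameter draws whose simulated summary lies within eps of the
    observed summary; the accepted set approximates the ABC posterior."""
    rng = np.random.default_rng(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulator(theta, rng))
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    observed = simulator(theta=2.0, rng=rng)           # stand-in "reference" data
    post = abc_rejection(observed, lambda r: r.uniform(-5, 5), eps=0.1)
    print(len(post), "accepted draws, posterior mean ~", post.mean())
```

The memory argument in the abstract follows directly from this structure: each candidate is simulated and either kept or discarded, so no large optimization matrices need to be formed or inverted.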
Quantifying risk and benchmarking performance in the adult intensive care unit.
Higgins, Thomas L
2007-01-01
Morbidity, mortality, and length-of-stay outcomes in patients receiving critical care are difficult to interpret unless they are risk-stratified for diagnosis, presenting severity of illness, and other patient characteristics. Acuity adjustment systems for adults include the Acute Physiology And Chronic Health Evaluation (APACHE), the Mortality Probability Model (MPM), and the Simplified Acute Physiology Score (SAPS). All have recently been updated and recalibrated to reflect contemporary results. Specialized scores are also available for patient subpopulations where general acuity scores have drawbacks. Demand for outcomes data is likely to grow with pay-for-performance initiatives as well as for routine clinical, prognostic, administrative, and research applications. It is important for clinicians to understand how these scores are derived and how they are properly applied to quantify patient severity of illness and benchmark intensive care unit performance.
An automated protocol for performance benchmarking a widefield fluorescence microscope.
Halter, Michael; Bier, Elianna; DeRose, Paul C; Cooksey, Gregory A; Choquette, Steven J; Plant, Anne L; Elliott, John T
2014-11-01
Widefield fluorescence microscopy is a highly used tool for visually assessing biological samples and for quantifying cell responses. Despite its widespread use in high content analysis and other imaging applications, few published methods exist for evaluating and benchmarking the analytical performance of a microscope. Easy-to-use benchmarking methods would facilitate the use of fluorescence imaging as a quantitative analytical tool in research applications, and would aid the determination of instrumental method validation for commercial product development applications. We describe and evaluate an automated method to characterize a fluorescence imaging system's performance by benchmarking the detection threshold, saturation, and linear dynamic range to a reference material. The benchmarking procedure is demonstrated using two different materials as the reference material, uranyl-ion-doped glass and Schott 475 GG filter glass. Both are suitable candidate reference materials that are homogeneously fluorescent and highly photostable, and the Schott 475 GG filter glass is currently commercially available. In addition to benchmarking the analytical performance, we also demonstrate that the reference materials provide for accurate day-to-day intensity calibration. Published 2014 Wiley Periodicals Inc. This article is a US government work and, as such, is in the public domain in the United States of America.
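The published protocol is automated inside the instrument software; as a loose, simplified sketch (not the authors' method) of how detection threshold, saturation, and linear dynamic range might be reduced from a dark-image series and an exposure series on a photostable reference, one could do something like the following, where the 3-sigma threshold, the 5% linearity criterion, and the input arrays are all assumptions.

```python
import numpy as np

def benchmark_camera(dark_frames, exposures_s, mean_signal):
    """Very simplified reduction of a fluorescence-benchmark data set.

    dark_frames : mean intensities from dark images
    exposures_s : increasing exposure times on a photostable reference
    mean_signal : measured mean intensities at those exposures
    """
    dark_frames = np.asarray(dark_frames, dtype=float)
    exposures_s = np.asarray(exposures_s, dtype=float)
    mean_signal = np.asarray(mean_signal, dtype=float)

    # Detection threshold: 3 standard deviations above the dark level (assumption).
    detection_threshold = dark_frames.mean() + 3.0 * dark_frames.std()

    # Fit the low-exposure points, where the response is presumed linear.
    n_fit = max(3, len(exposures_s) // 3)
    slope, intercept = np.polyfit(exposures_s[:n_fit], mean_signal[:n_fit], 1)
    predicted = slope * exposures_s + intercept

    # Saturation: first point deviating from the fit by more than 5% (assumption).
    rel_dev = np.abs(mean_signal - predicted) / np.maximum(predicted, 1e-12)
    sat_idx = int(np.argmax(rel_dev > 0.05)) if np.any(rel_dev > 0.05) else len(mean_signal) - 1
    saturation_signal = mean_signal[sat_idx]

    linear_dynamic_range = saturation_signal / detection_threshold
    return detection_threshold, saturation_signal, linear_dynamic_range
```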
Finite element methods on supercomputers - The scatter-problem
NASA Technical Reports Server (NTRS)
Loehner, R.; Morgan, K.
1985-01-01
Certain problems arise in connection with the use of supercomputers for the implementation of finite-element methods. These problems are related to the desirability of utilizing the power of the supercomputer as fully as possible for the rapid execution of the required computations, taking into account the gain in speed possible with the aid of pipelining operations. For the finite-element method, the time-consuming operations may be divided into three categories. The first two present no problems, while the third type of operation can be a reason for the inefficient performance of finite-element programs. Two possibilities for overcoming certain difficulties are proposed, giving attention to a scatter-process.
Code IN Exhibits - Supercomputing 2000
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; McCann, Karen M.; Biswas, Rupak; VanderWijngaart, Rob F.; Kwak, Dochan (Technical Monitor)
2000-01-01
The creation of parameter study suites has recently become a more challenging problem as the parameter studies have become multi-tiered and the computational environment has become a supercomputer grid. The parameter spaces are vast, the individual problem sizes are getting larger, and researchers are seeking to combine several successive stages of parameterization and computation. Simultaneously, grid-based computing offers immense resource opportunities but at the expense of great difficulty of use. We present ILab, an advanced graphical user interface approach to this problem. Our novel strategy stresses intuitive visual design tools for parameter study creation and complex process specification, and also offers programming-free access to grid-based supercomputer resources and process automation.
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
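As a small, hedged illustration of the variable-band (profile) idea described above (serial Python, not the paper's parallel-vector implementation), the factorization below records the first nonzero column of each row and restricts every inner product to that profile, so no arithmetic is spent on the zeros outside the column heights.

```python
import numpy as np

def profile_cholesky(A):
    """Profile (skyline) Cholesky A = L L^T for a symmetric positive-definite A.

    first[i] is the column of the first nonzero in row i of A; the factor L
    fills in only inside this profile, so every inner product can start there.
    """
    n = A.shape[0]
    first = np.array([next(j for j in range(i + 1) if A[i, j] != 0.0 or j == i)
                      for i in range(n)])
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        s = A[j, j] - np.dot(L[j, first[j]:j], L[j, first[j]:j])
        L[j, j] = np.sqrt(s)
        for i in range(j + 1, n):
            if j < first[i]:
                continue                      # outside row i's profile: stays zero
            k0 = max(first[i], first[j])      # skip known zeros in both rows
            s = A[i, j] - np.dot(L[i, k0:j], L[j, k0:j])
            L[i, j] = s / L[j, j]
    return L

if __name__ == "__main__":
    # Small banded SPD test: compare against a dense Cholesky.
    A = np.diag(np.full(6, 4.0)) + np.diag(np.full(5, 1.0), 1) + np.diag(np.full(5, 1.0), -1)
    assert np.allclose(profile_cholesky(A), np.linalg.cholesky(A))
```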
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
The Ensembl genome database project.
Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M
2002-01-01
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
NSF Establishes First Four National Supercomputer Centers.
ERIC Educational Resources Information Center
Lepkowski, Wil
1985-01-01
The National Science Foundation (NSF) has awarded support for supercomputer centers at Cornell University, Princeton University, University of California (San Diego), and University of Illinois. These centers are to be the nucleus of a national academic network for use by scientists and engineers throughout the United States. (DH)
Library Services in a Supercomputer Center.
ERIC Educational Resources Information Center
Layman, Mary
1991-01-01
Describes library services that are offered at the San Diego Supercomputer Center (SDSC), which is located at the University of California at San Diego. Topics discussed include the user population; online searching; microcomputer use; electronic networks; current awareness programs; library catalogs; and the slide collection. A sidebar outlines…
Probing the cosmic causes of errors in supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Cosmic rays from outer space are causing errors in supercomputers. The neutrons that pass through the CPU may be causing binary data to flip leading to incorrect calculations. Los Alamos National Laboratory has developed detectors to determine how much data is being corrupted by these cosmic particles.
Benchmark carbon stocks from old-growth forests in northern New England, USA
Coeli M. Hoover; William B. Leak; Brian G. Keel
2012-01-01
Forests world-wide are recognized as important components of the global carbon cycle. Carbon sequestration has become a recognized forest management objective, but the full carbon storage potential of forests is not well understood. The premise of this study is that old-growth forests can be expected to provide a reasonable estimate of the upper limits of carbon...
Chen, Lili; Yuan, Zhiyou; Shao, Hongbo; Wang, Dexiang; Mu, Xingmin
2014-01-01
Thinning is a crucial practice in forest ecosystem management. The soil infiltration rate and water storage capacity of pine-oak mixed forest under three different thinning intensity treatments (15%, 30%, and 60%) were studied in the Qinling Mountains of China. The thinning operations had a significant influence on soil infiltration rate and water storage capacity. The soil infiltration rate and water storage capacity under the different treatments followed the order control (non-thinning) < 60% < 15% < 30%. This demonstrates that a thinning operation of 30% intensity can substantially improve the soil infiltration rate and water storage capacity of pine-oak mixed forest in the Qinling Mountains. The soil initial infiltration rate, stable infiltration rate, and average infiltration rate in the 30% thinning treatment were significantly increased by 21.1%, 104.6%, and 60.9%, respectively, compared with the control. The soil maximal water storage capacity and noncapillary water storage capacity in the 30% thinning treatment were significantly improved by 20.1% and 34.3% relative to the control. The soil infiltration rate and water storage capacity were significantly higher in the surface layer (0~20 cm) than in the deeper layers (20~40 cm and 40~60 cm). We found that soil properties were closely related to soil infiltration rate and water storage capacity. PMID:24883372
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (message passing interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000.
The Sky's the Limit When Super Students Meet Supercomputers.
ERIC Educational Resources Information Center
Trotter, Andrew
1991-01-01
In a few select high schools in the U.S., supercomputers are allowing talented students to attempt sophisticated research projects using simultaneous simulations of nature, culture, and technology not achievable by ordinary microcomputers. Schools can get their students online by entering contests and seeking grants and partnerships with…
NSF Says It Will Support Supercomputer Centers in California and Illinois.
ERIC Educational Resources Information Center
Strosnider, Kim; Young, Jeffrey R.
1997-01-01
The National Science Foundation will increase support for supercomputer centers at the University of California, San Diego and the University of Illinois, Urbana-Champaign, while leaving unclear the status of the program at Cornell University (New York) and a cooperative Carnegie-Mellon University (Pennsylvania) and University of Pittsburgh…
Access to Supercomputers. Higher Education Panel Report 69.
ERIC Educational Resources Information Center
Holmstrom, Engin Inel
This survey was conducted to provide the National Science Foundation with baseline information on current computer use in the nation's major research universities, including the actual and potential use of supercomputers. Questionnaires were sent to 207 doctorate-granting institutions; after follow-ups, 167 institutions (91% of the institutions…
NOAA announces significant investment in next generation of supercomputers
Today, NOAA announced the next phase in the agency's efforts to increase supercomputing capacity, an upgrade intended to lead to more timely, accurate, and reliable weather forecasts.
Developments in the simulation of compressible inviscid and viscous flow on supercomputers
NASA Technical Reports Server (NTRS)
Steger, J. L.; Buning, P. G.
1985-01-01
In anticipation of future supercomputers, finite difference codes are rapidly being extended to simulate three-dimensional compressible flow about complex configurations. Some of these developments are reviewed. The importance of computational flow visualization and diagnostic methods to three-dimensional flow simulation is also briefly discussed.
NASA Technical Reports Server (NTRS)
Smarr, Larry; Press, William; Arnett, David W.; Cameron, Alastair G. W.; Crutcher, Richard M.; Helfand, David J.; Horowitz, Paul; Kleinmann, Susan G.; Linsky, Jeffrey L.; Madore, Barry F.
1991-01-01
The applications of computers and data processing to astronomy are discussed. Among the topics covered are the emerging national information infrastructure, workstations and supercomputers, supertelescopes, digital astronomy, astrophysics in a numerical laboratory, community software, archiving of ground-based observations, dynamical simulations of complex systems, plasma astrophysics, and the remote control of fourth dimension supercomputers.
Brown, J B; Nakatsui, Masahiko; Okuno, Yasushi
2014-12-01
The cost of pharmaceutical R&D has risen enormously, both worldwide and in Japan. However, Japan faces a particularly difficult situation in that its population is aging rapidly, and the cost of pharmaceutical R&D affects not only the industry but the entire medical system as well. To attempt to reduce costs, the newly launched K supercomputer is available for big data drug discovery and structural simulation-based drug discovery. We have implemented both primary (direct) and secondary (infrastructure, data processing) methods for the two types of drug discovery, custom tailored to maximally use the 88,128 compute nodes/CPUs of K, and evaluated the implementations. We present two types of results. In the first, we executed the virtual screening of nearly 19 billion compound-protein interactions, and calculated the accuracy of predictions against publicly available experimental data. In the second investigation, we implemented a very computationally intensive binding free energy algorithm, and found that our binding free energies were considerably accurate when validated against another type of publicly available experimental data. The common feature of both result types is the scale at which the computations were executed. The frameworks presented in this article provide perspectives and applications that, while tuned to the computing resources available in Japan, are equally applicable to any equivalent large-scale infrastructure provided elsewhere. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.
Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang
2016-03-01
Sequence alignment, in which raw sequencing data are mapped to a reference genome, is the central process of sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2, currently the world's fastest supercomputer, is equipped with three MIC coprocessors on each compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner, B-MIC. B-MIC contains three levels of parallelization: first, parallelization of data IO and read alignment by a three-stage parallel pipeline; second, parallelization enabled by MIC coprocessor technology; third, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA by a combination of these techniques using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on the Intel MIC, and it can achieve more than fivefold speedup over the original BWA while maintaining the alignment precision.
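The first level of B-MIC's parallelism, a three-stage pipeline that overlaps input, alignment, and output, can be sketched generically with threads and queues as below; the stage functions are stand-ins (the real tool offloads BWA kernels to the MIC and adds MPI across nodes), so treat this only as an illustration of the pipelining idea.

```python
import queue
import threading

SENTINEL = None

def reader(out_q, batches):
    """Stage 1: read batches of reads (here, just an in-memory list)."""
    for batch in batches:
        out_q.put(batch)
    out_q.put(SENTINEL)

def aligner(in_q, out_q):
    """Stage 2: 'align' each batch (placeholder for the offloaded kernel)."""
    while (batch := in_q.get()) is not SENTINEL:
        out_q.put([read.upper() for read in batch])   # stand-in computation
    out_q.put(SENTINEL)

def writer(in_q, results):
    """Stage 3: write alignments (here, append to a list)."""
    while (batch := in_q.get()) is not SENTINEL:
        results.extend(batch)

if __name__ == "__main__":
    q1, q2, results = queue.Queue(maxsize=4), queue.Queue(maxsize=4), []
    batches = [["acgt", "ttga"], ["ccat"], ["gggt", "atat"]]
    stages = [threading.Thread(target=reader, args=(q1, batches)),
              threading.Thread(target=aligner, args=(q1, q2)),
              threading.Thread(target=writer, args=(q2, results))]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    print(results)
```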
ASTEC: Controls analysis for personal computers
NASA Technical Reports Server (NTRS)
Downing, John P.; Bauer, Frank H.; Thorpe, Christopher J.
1989-01-01
The ASTEC (Analysis and Simulation Tools for Engineering Controls) software is under development at Goddard Space Flight Center (GSFC). The design goal is to provide a wide selection of controls analysis tools at the personal computer level, as well as the capability to upload compute-intensive jobs to a mainframe or supercomputer. The project is a follow-on to the INCA (INteractive Controls Analysis) program that has been developed at GSFC over the past five years. While ASTEC makes use of the algorithms and expertise developed for the INCA program, the user interface was redesigned to take advantage of the capabilities of the personal computer. The design philosophy and the current capabilities of the ASTEC software are described.
Benchmarking ensemble streamflow prediction skill in the UK
NASA Astrophysics Data System (ADS)
Harrigan, Shaun; Prudhomme, Christel; Parry, Simon; Smith, Katie; Tanguy, Maliko
2018-03-01
Skilful hydrological forecasts at sub-seasonal to seasonal lead times would be extremely beneficial for decision-making in water resources management, hydropower operations, and agriculture, especially during drought conditions. Ensemble streamflow prediction (ESP) is a well-established method for generating an ensemble of streamflow forecasts in the absence of skilful future meteorological predictions, instead using initial hydrologic conditions (IHCs), such as soil moisture, groundwater, and snow, as the source of skill. We benchmark when and where the ESP method is skilful across a diverse sample of 314 catchments in the UK and explore the relationship between catchment storage and ESP skill. The GR4J hydrological model was forced with historic climate sequences to produce a 51-member ensemble of streamflow hindcasts. We evaluated forecast skill seamlessly from lead times of 1 day to 12 months initialized at the first of each month over a 50-year hindcast period from 1965 to 2015. Results showed ESP was skilful against a climatology benchmark forecast in the majority of catchments across all lead times up to a year ahead, but the degree of skill was strongly conditional on lead time, forecast initialization month, and individual catchment location and storage properties. UK-wide mean ESP skill decayed exponentially as a function of lead time with continuous ranked probability skill scores across the year of 0.75, 0.20, and 0.11 for 1-day, 1-month, and 3-month lead times, respectively. However, skill was not uniform across all initialization months. For lead times up to 1 month, ESP skill was higher than average when initialized in summer and lower in winter months, whereas for longer seasonal and annual lead times skill was higher when initialized in autumn and winter months and lowest in spring. ESP was most skilful in the south and east of the UK, where slower responding catchments with higher soil moisture and groundwater storage are mainly located; correlation between catchment base flow index (BFI) and ESP skill was very strong (Spearman's rank correlation coefficient = 0.90 at 1-month lead time). This was in contrast to the more highly responsive catchments in the north and west which were generally not skilful at seasonal lead times. Overall, this work provides scientific justification for when and where use of such a relatively simple forecasting approach is appropriate in the UK. This study, furthermore, creates a low cost benchmark against which potential skill improvements from more sophisticated hydro-meteorological ensemble prediction systems can be judged.
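For readers unfamiliar with the verification metric, a hedged sketch of how a continuous ranked probability skill score against a climatology benchmark might be computed from an ensemble hindcast is given below; it uses the standard ensemble CRPS estimator, and the synthetic data are placeholders rather than the GR4J hindcasts.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Standard ensemble CRPS estimator:
    mean|x_i - y| - 0.5 * mean|x_i - x_j| over all member pairs."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

def crpss(forecasts, climatology, observations):
    """Skill relative to climatology: 1 - CRPS_forecast / CRPS_climatology,
    averaged over all forecast dates."""
    crps_f = np.mean([crps_ensemble(f, y) for f, y in zip(forecasts, observations)])
    crps_c = np.mean([crps_ensemble(c, y) for c, y in zip(climatology, observations)])
    return 1.0 - crps_f / crps_c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.gamma(2.0, 5.0, size=50)                       # toy "observed" flows
    esp = [o + rng.normal(0, 2.0, size=51) for o in obs]     # sharper, informed ensemble
    clim = [rng.gamma(2.0, 5.0, size=51) for _ in obs]       # climatology ensemble
    print("CRPSS vs climatology:", round(crpss(esp, clim, obs), 3))
```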
Microsupercapacitors as miniaturized energy-storage components for on-chip electronics
NASA Astrophysics Data System (ADS)
Kyeremateng, Nana Amponsah; Brousse, Thierry; Pech, David
2017-01-01
The push towards miniaturized electronics calls for the development of miniaturized energy-storage components that can enable sustained, autonomous operation of electronic devices for applications such as wearable gadgets and wireless sensor networks. Microsupercapacitors have been targeted as a viable route for this purpose, because, though storing less energy than microbatteries, they can be charged and discharged much more rapidly and have an almost unlimited lifetime. In this Review, we discuss the progress and the prospects of integrated miniaturized supercapacitors. In particular, we discuss their power performances and emphasize the need of a three-dimensional design to boost their energy-storage capacity. This is obtainable, for example, through self-supported nanostructured electrodes. We also critically evaluate the performance metrics currently used in the literature to characterize microsupercapacitors and offer general guidelines to benchmark performances towards prospective applications.
Supercomputer use in orthopaedic biomechanics research: focus on functional adaptation of bone.
Hart, R T; Thongpreda, N; Van Buskirk, W C
1988-01-01
The authors describe two biomechanical analyses carried out using numerical methods. One is an analysis of the stress and strain in a human mandible, and the other analysis involves modeling the adaptive response of a sheep bone to mechanical loading. The computing environment required for the two types of analyses is discussed. It is shown that a simple stress analysis of a geometrically complex mandible can be accomplished using a minicomputer. However, more sophisticated analyses of the same model with dynamic loading or nonlinear materials would require supercomputer capabilities. A supercomputer is also required for modeling the adaptive response of living bone, even when simple geometric and material models are used.
NREL's Building-Integrated Supercomputer Provides Heating and Efficient Computing (Fact Sheet)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2014-09-01
NREL's Energy Systems Integration Facility (ESIF) is meant to investigate new ways to integrate energy sources so they work together efficiently, and one of the key tools to that investigation, a new supercomputer, is itself a prime example of energy systems integration. NREL teamed with Hewlett-Packard (HP) and Intel to develop the innovative warm-water, liquid-cooled Peregrine supercomputer, which not only operates efficiently but also serves as the primary source of building heat for ESIF offices and laboratories. This innovative high-performance computer (HPC) can perform more than a quadrillion calculations per second as part of the world's most energy-efficient HPC data center.
Supercomputer optimizations for stochastic optimal control applications
NASA Technical Reports Server (NTRS)
Chung, Siu-Leung; Hanson, Floyd B.; Xu, Huihuang
1991-01-01
Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and supercomputing hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations.
Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer
NASA Technical Reports Server (NTRS)
Hornfeck, William A.
1988-01-01
A considerable volume of large computational codes was developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of an earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. This opportunity arises primarily from architectural differences between the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A software package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.
NAS Technical Summaries, March 1993 - February 1994
NASA Technical Reports Server (NTRS)
1995-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1993-94 operational year concluded with 448 high-speed processor projects and 95 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
NAS technical summaries. Numerical aerodynamic simulation program, March 1992 - February 1993
NASA Technical Reports Server (NTRS)
1994-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1992-93 operational year concluded with 399 high-speed processor projects and 91 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
Congressional Panel Seeks To Curb Access of Foreign Students to U.S. Supercomputers.
ERIC Educational Resources Information Center
Kiernan, Vincent
1999-01-01
Fearing security problems, a congressional committee on Chinese espionage recommends that foreign students and other foreign nationals be barred from using supercomputers at national laboratories unless they first obtain export licenses from the federal government. University officials dispute the data on which the report is based and find the…
The Age of the Supercomputer Gives Way to the Age of the Super Infrastructure.
ERIC Educational Resources Information Center
Young, Jeffrey R.
1997-01-01
In October 1997, the National Science Foundation will discontinue financial support for two university-based supercomputer facilities to concentrate resources on partnerships led by facilities at the University of California, San Diego and the University of Illinois, Urbana-Champaign. The reconfigured program will develop more user-friendly and…
The ChemViz Project: Using a Supercomputer To Illustrate Abstract Concepts in Chemistry.
ERIC Educational Resources Information Center
Beckwith, E. Kenneth; Nelson, Christopher
1998-01-01
Describes the Chemistry Visualization (ChemViz) Project, a Web venture maintained by the University of Illinois National Center for Supercomputing Applications (NCSA) that enables high school students to use computational chemistry as a technique for understanding abstract concepts. Discusses the evolution of computational chemistry and provides a…
NASA Astrophysics Data System (ADS)
Schulthess, Thomas C.
2013-03-01
The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.
Extracting the Textual and Temporal Structure of Supercomputing Logs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, S; Singh, I; Chandra, A
2009-05-26
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
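As a hedged, much-simplified sketch of the syntactic grouping idea (not the authors' clustering algorithm), one can mask the variable fields of each log message and group messages by the remaining template, then look for groups whose messages occur close together in time; the masking regexes and the 5-second window are arbitrary choices for illustration.

```python
import re
from collections import defaultdict

VARIABLE_PATTERNS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\b\d+(\.\d+)?\b"), "<NUM>"),
    (re.compile(r"\S*/\S+"), "<PATH>"),
]

def template_of(message):
    """Replace numbers, hex values and paths so syntactically similar
    messages collapse onto the same template string."""
    for pattern, token in VARIABLE_PATTERNS:
        message = pattern.sub(token, message)
    return message.strip()

def group_logs(records):
    """records: iterable of (timestamp_seconds, message) pairs."""
    groups = defaultdict(list)
    for ts, msg in records:
        groups[template_of(msg)].append(ts)
    return groups

def correlated(groups, window=5.0):
    """Report template pairs that ever fire within `window` seconds of each other."""
    pairs, items = [], list(groups.items())
    for i, (t1, ts1) in enumerate(items):
        for t2, ts2 in items[i + 1:]:
            if any(abs(a - b) <= window for a in ts1 for b in ts2):
                pairs.append((t1, t2))
    return pairs

if __name__ == "__main__":
    logs = [(10.0, "node 12 link error at 0xdeadbeef"),
            (11.5, "fan speed 3200 rpm on node 12"),
            (12.0, "node 47 link error at 0x00ff00ff")]
    print(list(group_logs(logs)))
    print(correlated(group_logs(logs)))
```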
Toward a Proof of Concept Cloud Framework for Physics Applications on Blue Gene Supercomputers
NASA Astrophysics Data System (ADS)
Dreher, Patrick; Scullin, William; Vouk, Mladen
2015-09-01
Traditional high performance supercomputers are capable of delivering large sustained state-of-the-art computational resources to physics applications over extended periods of time using batch processing mode operating environments. However, today there is an increasing demand for more complex workflows that involve large fluctuations in the levels of HPC physics computational requirements during the simulations. Some of the workflow components may also require a richer set of operating system features and schedulers than normally found in a batch oriented HPC environment. This paper reports on progress toward a proof of concept design that implements a cloud framework onto BG/P and BG/Q platforms at the Argonne Leadership Computing Facility. The BG/P implementation utilizes the Kittyhawk utility and the BG/Q platform uses an experimental heterogeneous FusedOS operating system environment. Both platforms use the Virtual Computing Laboratory as the cloud computing system embedded within the supercomputer. This proof of concept design allows a cloud to be configured so that it can capitalize on the specialized infrastructure capabilities of a supercomputer and the flexible cloud configurations without resorting to virtualization. Initial testing of the proof of concept system is done using the lattice QCD MILC code. These types of user reconfigurable environments have the potential to deliver experimental schedulers and operating systems within a working HPC environment for physics computations that may be different from the native OS and schedulers on production HPC supercomputers.
2018-01-01
Selective digestive decontamination (SDD, topical antibiotic regimens applied to the respiratory tract) appears effective for preventing ventilator associated pneumonia (VAP) in intensive care unit (ICU) patients. However, potential contextual effects of SDD on Staphylococcus aureus infections in the ICU remain unclear. The incidences of S. aureus ventilator associated pneumonia (S. aureus VAP), VAP overall, and S. aureus bacteremia within the component (control and intervention) groups of 27 SDD studies were benchmarked against 115 observational groups. Component groups from 66 studies of various interventions other than SDD provided additional points of reference. In 27 SDD study control groups, the mean S. aureus VAP incidence is 9.6% (95% CI; 6.9-13.2) versus a benchmark derived from 115 observational groups of 4.8% (95% CI; 4.2-5.6). In nine SDD study control groups the mean S. aureus bacteremia incidence is 3.8% (95% CI; 2.1-5.7) versus a benchmark derived from 10 observational groups of 2.1% (95% CI; 1.1-4.1). The incidences of S. aureus VAP and S. aureus bacteremia within the control groups of SDD studies are each higher than literature derived benchmarks. Paradoxically, within the SDD intervention groups, the incidences of both S. aureus VAP and VAP overall are more similar to the benchmarks. PMID:29300363
Improved product energy intensity benchmarking metrics for thermally concentrated food products.
Walker, Michael E; Arnold, Craig S; Lettieri, David J; Hutchins, Margot J; Masanet, Eric
2014-10-21
Product energy intensity (PEI) metrics allow industry and policymakers to quantify manufacturing energy requirements on a product-output basis. However, complexities can arise for benchmarking of thermally concentrated products, particularly in the food processing industry, due to differences in outlet composition, feed material composition, and processing technology. This study analyzes tomato paste as a typical, high-volume concentrated product using a thermodynamics-based model. Results show that PEI for tomato pastes and purees varies from 1200 to 9700 kJ/kg over the range of 8%-40% outlet solids concentration for a 3-effect evaporator, and 980-7000 kJ/kg for a 5-effect evaporator. Further, the PEI for producing paste at 31% outlet solids concentration in a 3-effect evaporator varies from 13,000 kJ/kg at 3% feed solids concentration to 5900 kJ/kg at 6%; for a 5-effect evaporator, the variation is from 9200 kJ/kg at 3%, to 4300 kJ/kg at 6%. Methods to compare the PEI of different product concentrations on a standard basis are evaluated. This paper also presents methods to develop PEI benchmark values for multiple plants. These results focus on the case of a tomato paste processing facility, but can be extended to other products and industries that utilize thermal concentration.
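The direction of these trends can be reproduced with a back-of-the-envelope mass and energy balance; the sketch below is a simplification under stated assumptions (steam economy equal to the number of effects, a fixed latent heat, no preheating or mechanical energy), so it illustrates why PEI falls with higher feed solids and with added effects rather than reproducing the paper's model or its numbers.

```python
def evaporation_pei(feed_solids, outlet_solids, n_effects,
                    latent_heat_kj_per_kg=2260.0):
    """Rough product energy intensity (kJ per kg of concentrate) for thermal
    concentration, assuming steam economy ~ number of evaporator effects.

    A solids balance gives the feed needed per kg of product, hence the water
    to evaporate; steam demand is that water divided by the steam economy.
    """
    feed_per_kg_product = outlet_solids / feed_solids       # kg feed / kg product
    water_evaporated = feed_per_kg_product - 1.0             # kg water / kg product
    steam_needed = water_evaporated / n_effects              # kg steam / kg product
    return steam_needed * latent_heat_kj_per_kg              # kJ / kg product

if __name__ == "__main__":
    # Trend check only: PEI drops as feed solids rise or effects are added.
    for feed in (0.03, 0.06):
        for effects in (3, 5):
            pei = evaporation_pei(feed, outlet_solids=0.31, n_effects=effects)
            print(f"feed {feed:.0%}, {effects} effects: ~{pei:,.0f} kJ/kg")
```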
Extreme Magnitude Earthquakes and their Economical Consequences
NASA Astrophysics Data System (ADS)
Chavez, M.; Cabrera, E.; Ashworth, M.; Perea, N.; Emerson, D.; Salazar, A.; Moulinec, C.
2011-12-01
The frequency of occurrence of extreme magnitude earthquakes varies from tens to thousands of years, depending on the seismotectonic region of the world considered. However, the human and economic losses when their hypocenters are located in the neighborhood of heavily populated and/or industrialized regions can be very large, as recently observed for the 1985 Mw 8.01 Michoacan, Mexico and the 2011 Mw 9 Tohoku, Japan, earthquakes. Here, a methodology is proposed to estimate the probability of exceedance of the intensities of extreme magnitude earthquakes (PEI) and of their direct economic consequences (PEDEC). The PEI are obtained by using supercomputing facilities to generate samples of the 3D propagation of plausible extreme earthquake scenarios, and to enlarge those samples by Monte Carlo simulation. The PEDEC are computed by using appropriate vulnerability functions combined with the scenario intensity samples and Monte Carlo simulation. An example of the application of the methodology to the potential occurrence of extreme Mw 8.5 subduction earthquakes affecting Mexico City is presented.
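Once a sample of scenario intensities (or losses) is available, the exceedance probabilities are just empirical tail frequencies; the snippet below shows that final Monte Carlo step in hedged, generic form, with synthetic lognormal "intensities" standing in for the 3D wave-propagation results.

```python
import numpy as np

def exceedance_probability(samples, thresholds):
    """P(X > a) estimated as the fraction of Monte Carlo samples above each a."""
    samples = np.asarray(samples, dtype=float)
    return np.array([(samples > a).mean() for a in thresholds])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    intensities = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)  # synthetic stand-in
    thresholds = [0.5, 1.0, 2.0, 4.0]
    for a, p in zip(thresholds, exceedance_probability(intensities, thresholds)):
        print(f"P(intensity > {a}) ~ {p:.3f}")
```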
NASA Astrophysics Data System (ADS)
McMahon, Matthew; Poole, Patrick; Willis, Christopher; Andereck, David; Schumacher, Douglass
2014-10-01
We recently introduced liquid crystal films as on-demand, variable-thickness (50-5000 nanometers), low-cost targets for intense laser experiments. Here we present the first particle-in-cell (PIC) simulations of short pulse laser excitation of liquid crystal targets treating Scarlet (OSU) class lasers using the PIC code LSP. In order to accurately model the target evolution, a low starting temperature and a field ionization model are employed. This is essential because large starting temperatures, often used to achieve large Debye lengths, lead to expansion of the target and a significant reduction of the target density before the laser pulse can interact with it. We also present an investigation of the modification of laser pulses by very thin targets. This work was supported by the DARPA PULSE program through a grant from AMRDEC, by the US Department of Energy under Contract No. DE-NA0001976, and by allocations of computing time from the Ohio Supercomputing Center.
Synthesis and gas adsorption study of porous metal-organic framework materials
NASA Astrophysics Data System (ADS)
Mu, Bin
Metal-organic frameworks (MOFs) or porous coordination polymers (PCPs) have become the focus of intense study over the past decade due to their potential for advancing a variety of applications including air purification, gas storage, adsorption separations, catalysis, gas sensing, drug delivery, and so on. These materials have some distinct advantages over traditional porous materials, such as well-defined structures, uniform pore sizes, chemically functionalized sorption sites, and potential for postsynthetic modification. Thus, synthesis and adsorption studies of porous MOFs have increased substantially in recent years. Among various prospective applications, air purification is one of the most immediate concerns, with urgent requirements to improve current nuclear, biological, and chemical (NBC) filters for commercial and military purposes. Thus, the major goal of this funded project is to search for, synthesize, and test these novel hybrid porous materials for the adsorptive removal of toxic industrial chemicals (TICs) and chemical warfare agents (CWAs), and to establish the benchmark for new-generation NBC filters. The objective of this study is three-fold: (i) advance our understanding of coordination chemistry by synthesizing novel MOFs and characterizing these porous coordination polymers; (ii) evaluate porous MOF materials for gas adsorption applications including CO2 capture, CH4 storage, and other light gas adsorption and separations, and examine the chemical and physical properties of these solid adsorbents, including the thermal stability and heat capacity of MOFs; (iii) evaluate porous MOF materials for next-generation NBC filter media by adsorption breakthrough measurements of TICs on MOFs, and advance our understanding of structure-property relationships of these novel adsorbents.
NAFFS: network attached flash file system for cloud storage on portable consumer electronics
NASA Astrophysics Data System (ADS)
Han, Lin; Huang, Hao; Xie, Changsheng
Cloud storage technology has become a research hotspot in recent years, but existing cloud storage services are mainly designed for data storage needs with stable, high-speed Internet connections. Mobile Internet connections are often unstable and relatively slow. These native features of the mobile Internet limit the use of cloud storage on portable consumer electronics. The Network Attached Flash File System (NAFFS) presents the idea of using a portable device's built-in NAND flash memory as the front-end cache of a virtualized cloud storage device. Modern portable devices with an Internet connection have more than 1 GB of built-in NAND flash, which is quite enough for daily data storage. The data transfer rate of a NAND flash device is much higher than that of mobile Internet connections [1], and its non-volatile nature makes it very suitable as the cache device for Internet cloud storage on portable devices, which often have unstable power supplies and intermittent Internet connections. In the present work, NAFFS is evaluated with several benchmarks, and its performance is compared with traditional network attached file systems, such as NFS. Our evaluation results indicate that NAFFS achieves an average access speed of 3.38 MB/s, which is about 3 times faster than directly accessing cloud storage over a mobile Internet connection, and offers a more stable interface than directly using a cloud storage API. Unstable Internet connections and sudden power-off conditions are tolerated, and no data in the cache is lost in such situations.
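The caching idea at the core of NAFFS can be sketched, in very reduced form, as a write-back cache kept on local flash with deferred flushes to the cloud; the classes below are a generic illustration under assumed interfaces (an in-memory dict stands in for both the flash store and the cloud object store), not the file-system implementation described in the paper.

```python
class CloudStore:
    """Stand-in for a remote object store reachable over a slow, flaky link."""
    def __init__(self):
        self._objects = {}
    def put(self, key, data):
        self._objects[key] = data
    def get(self, key):
        return self._objects[key]

class FlashWriteBackCache:
    """Serve reads/writes from local 'flash'; push dirty entries on flush()."""
    def __init__(self, cloud):
        self.cloud = cloud
        self.local = {}          # key -> data held on local flash
        self.dirty = set()       # keys written locally but not yet uploaded

    def write(self, key, data):
        self.local[key] = data   # fast local write; upload is deferred
        self.dirty.add(key)

    def read(self, key):
        if key not in self.local:            # cache miss: fetch once, keep locally
            self.local[key] = self.cloud.get(key)
        return self.local[key]

    def flush(self):
        """Call when connectivity is good; uploads all deferred writes."""
        for key in list(self.dirty):
            self.cloud.put(key, self.local[key])
            self.dirty.discard(key)

if __name__ == "__main__":
    cache = FlashWriteBackCache(CloudStore())
    cache.write("photo.jpg", b"\xff\xd8...")   # fast even with no connectivity
    cache.flush()                              # later, when the link is usable
    print(cache.read("photo.jpg")[:4])
```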
The impact of the U.S. supercomputing initiative will be global
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crawford, Dona
2016-01-15
Last July, President Obama issued an executive order that created a coordinated federal strategy for HPC research, development, and deployment called the U.S. National Strategic Computing Initiative (NSCI). This bold, necessary step toward building the next generation of supercomputers has inaugurated a new era for U.S. high performance computing (HPC).
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
Predicting Hurricanes with Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2010-01-01
Hurricane Emily, formed in the Atlantic Ocean on July 10, 2005, was the strongest hurricane ever to form before August. By checking computer models against the actual path of the storm, researchers can improve hurricane prediction. In 2010, NOAA researchers were awarded 25 million processor-hours on Argonne's BlueGene/P supercomputer for the project. Read more at http://go.usa.gov/OLh
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron
1992-01-01
Report evaluates supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. Predicts these fields will require computer speeds greater than 10^18 floating-point operations per second (FLOPS) and memory capacities greater than 10^15 words. Also, new parallel computer architectures and new structured numerical methods will make the necessary speed and capacity available.
Advances in petascale kinetic plasma simulation with VPIC and Roadrunner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Kevin J; Albright, Brian J; Yin, Lin
2009-01-01
VPIC, a first-principles 3D electromagnetic charge-conserving relativistic kinetic particle-in-cell (PIC) code, was recently adapted to run on Los Alamos's Roadrunner, the first supercomputer to break a petaflop (10^15 floating point operations per second) in the TOP500 supercomputer performance rankings. They give a brief overview of the modeling capabilities and optimization techniques used in VPIC and the computational characteristics of petascale supercomputers like Roadrunner. They then discuss three applications enabled by VPIC's unprecedented performance on Roadrunner: modeling laser plasma interaction in upcoming inertial confinement fusion experiments at the National Ignition Facility (NIF), modeling short pulse laser GeV ion acceleration, and modeling reconnection in magnetic confinement fusion experiments.
Supercomputing Sheds Light on the Dark Universe
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Heitmann, Katrin
2012-11-15
At Argonne National Laboratory, scientists are using supercomputers to shed light on one of the great mysteries in science today, the Dark Universe. With Mira, a petascale supercomputer at the Argonne Leadership Computing Facility, a team led by physicists Salman Habib and Katrin Heitmann will run the largest, most complex simulation of the universe ever attempted. By contrasting the results from Mira with state-of-the-art telescope surveys, the scientists hope to gain new insights into the distribution of matter in the universe, advancing future investigations of dark energy and dark matter into a new realm. The team's research was named a finalist for the 2012 Gordon Bell Prize, an award recognizing outstanding achievement in high-performance computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curran, L.
1988-03-03
Interest has been building in recent months over the imminent arrival of a new class of supercomputer, called the 'supercomputer on a desk' or the single-user model. Most observers expected the first such product to come from either of two startups, Ardent Computer Corp. or Stellar Computer Inc. But a surprise entry has shown up. Apollo Computer Inc. is launching a new workstation this week that racks up an impressive list of industry firsts as it puts supercomputer power at the disposal of a single user. The new Series 10000 from the Chelmsford, Mass., company is built around a reduced-instruction-set architecture that the company calls Prism, for parallel reduced-instruction-set multiprocessor. This article describes the 10000 and Prism.
NASA Technical Reports Server (NTRS)
Murman, E. M. (Editor); Abarbanel, S. S. (Editor)
1985-01-01
Current developments and future trends in the application of supercomputers to computational fluid dynamics are discussed in reviews and reports. Topics examined include algorithm development for personal-size supercomputers, a multiblock three-dimensional Euler code for out-of-core and multiprocessor calculations, simulation of compressible inviscid and viscous flow, high-resolution solutions of the Euler equations for vortex flows, algorithms for the Navier-Stokes equations, and viscous-flow simulation by FEM and related techniques. Consideration is given to marching iterative methods for the parabolized and thin-layer Navier-Stokes equations, multigrid solutions to quasi-elliptic schemes, secondary instability of free shear flows, simulation of turbulent flow, and problems connected with weather prediction.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
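As a small serial illustration of the block cyclic reduction idea (plain NumPy with explicit inverses for clarity; not the BCYCLIC implementation), the recursion below eliminates the odd-indexed block rows at each level and back-substitutes them afterwards; it assumes square blocks and zero blocks supplied for L[0] and U[-1].

```python
import numpy as np

def solve_block_tridiag_cr(L, D, U, b):
    """Block cyclic reduction for L[i] x[i-1] + D[i] x[i] + U[i] x[i+1] = b[i].

    L, D, U, b are lists of square blocks / block vectors; L[0] and U[-1]
    must be zero blocks. Serial illustration only (explicit inverses).
    """
    n = len(D)
    if n == 1:
        return [np.linalg.solve(D[0], b[0])]
    L2, D2, U2, b2 = [], [], [], []
    for i in range(0, n, 2):                    # even-indexed equations survive
        Di, bi = D[i].copy(), b[i].copy()
        Li, Ui = np.zeros_like(D[i]), np.zeros_like(D[i])
        if i - 1 >= 0:                          # eliminate x[i-1] using equation i-1
            f = L[i] @ np.linalg.inv(D[i - 1])
            Di -= f @ U[i - 1]
            Li = -f @ L[i - 1]
            bi -= f @ b[i - 1]
        if i + 1 < n:                           # eliminate x[i+1] using equation i+1
            g = U[i] @ np.linalg.inv(D[i + 1])
            Di -= g @ L[i + 1]
            Ui = -g @ U[i + 1]
            bi -= g @ b[i + 1]
        L2.append(Li); D2.append(Di); U2.append(Ui); b2.append(bi)
    x_even = solve_block_tridiag_cr(L2, D2, U2, b2)     # recurse on the halved system
    x = [None] * n
    for k, i in enumerate(range(0, n, 2)):
        x[i] = x_even[k]
    for i in range(1, n, 2):                    # back-substitute the odd unknowns
        r = b[i] - L[i] @ x[i - 1]
        if i + 1 < n:
            r -= U[i] @ x[i + 1]
        x[i] = np.linalg.solve(D[i], r)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 2, 5
    D = [np.eye(m) * 10 + rng.random((m, m)) for _ in range(n)]
    L = [np.zeros((m, m))] + [rng.random((m, m)) for _ in range(n - 1)]
    U = [rng.random((m, m)) for _ in range(n - 1)] + [np.zeros((m, m))]
    b = [rng.random(m) for _ in range(n)]
    x = solve_block_tridiag_cr(L, D, U, b)
    # Cross-check against a dense solve of the assembled system.
    A = np.zeros((m * n, m * n))
    for i in range(n):
        A[i*m:(i+1)*m, i*m:(i+1)*m] = D[i]
        if i > 0:
            A[i*m:(i+1)*m, (i-1)*m:i*m] = L[i]
        if i < n - 1:
            A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = U[i]
    assert np.allclose(np.concatenate(x), np.linalg.solve(A, np.concatenate(b)))
```

In the paper's setting, the per-level eliminations are what get distributed across block rows, while each block-level product or solve is handled by threaded BLAS; the serial recursion above only shows the arithmetic structure of one such reduction.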
High End Computing Technologies for Earth Science Applications: Trends, Challenges, and Innovations
NASA Technical Reports Server (NTRS)
Parks, John (Technical Monitor); Biswas, Rupak; Yan, Jerry C.; Brooks, Walter F.; Sterling, Thomas L.
2003-01-01
Earth science applications of the future will stress the capabilities of even the highest performance supercomputers in the areas of raw compute power, mass storage management, and software environments. These NASA mission critical problems demand usable multi-petaflops and exabyte-scale systems to fully realize their science goals. With an exciting vision of the technologies needed, NASA has established a comprehensive program of advanced research in computer architecture, software tools, and device technology to ensure that, in partnership with US industry, it can meet these demanding requirements with reliable, cost effective, and usable ultra-scale systems. NASA will exploit, explore, and influence emerging high end computing architectures and technologies to accelerate the next generation of engineering, operations, and discovery processes for NASA Enterprises. This article captures this vision and describes the concepts, accomplishments, and the potential payoff of the key thrusts that will help meet the computational challenges in Earth science applications.
NASA Astrophysics Data System (ADS)
Noumaru, Junichi; Kawai, Jun A.; Schubert, Kiaina; Yagi, Masafumi; Takata, Tadafumi; Winegar, Tom; Scanlon, Tim; Nishida, Takuhiro; Fox, Camron; Hayasaka, James; Forester, Jason; Uchida, Kenji; Nakamura, Isamu; Tom, Richard; Koura, Norikazu; Yamamoto, Tadahiro; Tanoue, Toshiya; Yamada, Toru
2008-07-01
Subaru Telescope has recently replaced most of the equipment of Subaru Telescope Network II with new equipment that includes a 124 TB RAID system for the data archive. Switching data storage from tape to RAID lets users access the data faster. STN-III dropped some important components of STN-II, such as the supercomputers, the development and testing subsystem for the Subaru Observation Control System, and the data processing subsystem. On the other hand, we invested in more computers for the remote operation system. Thanks to IT innovations, our LAN, as well as the network between Hilo and the summit, was upgraded to a gigabit network at similar or even lower cost than the previous system. As a result of redesigning the computer system with a sharper focus on observatory operations, we greatly reduced the total cost of computer rental, purchase, and maintenance.
Direct Solve of Electrically Large Integral Equations for Problem Sizes to 1M Unknowns
NASA Technical Reports Server (NTRS)
Shaeffer, John
2008-01-01
Matrix methods for solving integral equations via direct LU factorization are presently limited to weeks to months of very expensive supercomputer time for problem sizes of several hundred thousand unknowns. This report presents matrix LU factor solutions for electromagnetic scattering problems for problem sizes up to one million unknowns with thousands of right-hand sides that run in mere days on PC-level hardware. This EM solution is accomplished by exploiting the numerical low-rank nature of spatially blocked unknowns, using the Adaptive Cross Approximation to compress the rank-deficient blocks of the system Z matrix, the L and U factors, the right-hand-side forcing function, and the final current solution. This compressed matrix solution is applied to a frequency-domain EM solution of Maxwell's equations using a standard Method of Moments approach. Compressed matrix storage and operation counts lead to orders-of-magnitude reductions in memory and run time.
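As a sketch of the compression idea, the routine below builds a low-rank U·V approximation of a matrix block from a handful of its rows and columns, in the spirit of Adaptive Cross Approximation with partial pivoting. It is a simplified, real-valued illustration with an invented kernel, tolerance, and stopping rule, not the report's solver, which applies the compression to complex MoM impedance blocks and to the L and U factors themselves.

import numpy as np

def aca_partial_pivot(entry, m, n, tol=1e-6, max_rank=None):
    """Low-rank approximation block ~ U @ V via ACA with partial pivoting.

    entry(i, j) returns one element of the block; only O(k(m+n)) elements are
    ever evaluated, which is where the memory and fill-time savings come from.
    """
    if max_rank is None:
        max_rank = min(m, n)
    U, V, used_rows = [], [], set()
    i, frob2 = 0, 0.0                       # current pivot row, running ||UV||_F^2
    for _ in range(max_rank):
        # Residual of row i = original row minus the current approximation.
        row = np.array([entry(i, j) for j in range(n)], dtype=float)
        for u, v in zip(U, V):
            row -= u[i] * v
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < 1e-14:             # row already well approximated
            break
        v = row / row[j]
        col = np.array([entry(r, j) for r in range(m)], dtype=float)
        for u, vv in zip(U, V):
            col -= vv[j] * u
        U.append(col)
        V.append(v)
        used_rows.add(i)
        frob2 += (col @ col) * (v @ v)
        # Simplified stopping rule: the new cross is small relative to the sum.
        if np.linalg.norm(col) * np.linalg.norm(v) <= tol * np.sqrt(frob2):
            break
        if len(used_rows) == m:
            break
        # Next pivot row: largest residual entry of the new column, unused so far.
        i = max((r for r in range(m) if r not in used_rows), key=lambda r: abs(col[r]))
    return np.array(U).T, np.array(V)

# Toy usage: a smooth kernel block, which is numerically low rank.
x, y = np.linspace(0, 1, 300), np.linspace(5, 6, 200)
U, V = aca_partial_pivot(lambda i, j: 1.0 / (1.0 + abs(x[i] - y[j])), 300, 200)
print(U.shape, V.shape)

The entry-wise interface matters: for a rank-k block only a few rows and columns of the original matrix are ever generated, instead of all m times n entries.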
Status and future perspective of applications of high temperature superconductors
NASA Astrophysics Data System (ADS)
Tanaka, Shoji
The material research on high-temperature superconductivity over the past ten years has given us sufficient information on the new phenomena of these new materials. It seems that new applications in a very wide range of industries are increasing rapidly. In this report three main application topics are given: [a] progress of superconducting bulk materials and their applications to the flywheel electricity storage system and others; [b] progress in the development of superconducting tapes and their applications to power cables, high-field superconducting magnets for SMES, and the pulling system for large silicon single crystals; and [c] development of new superconducting electronic devices (SFQ) and the possibility of their application to next-generation supercomputers. These examples show the great capability of superconductivity technology, and it is expected that a real superconductivity industry will take off around the year 2005.
Benchmarking of Touschek Beam Lifetime Calculations for the Advanced Photon Source
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiao, A.; Yang, B.
2017-06-25
Particle loss from Touschek scattering is one of the most significant issues faced by present and future synchrotron light source storage rings. For example, the predicted, Touschek-dominated beam lifetime for the Advanced Photon Source (APS) Upgrade lattice in 48-bunch, 200-mA timing mode is only ~2 h. In order to understand the reliability of the predicted lifetime, a series of measurements with various beam parameters was performed on the present APS storage ring. This paper first describes the entire process of beam lifetime measurement, then compares measured lifetime with the calculated one by applying the measured beam parameters. The results show very good agreement.
ELSI: A unified software interface for Kohn–Sham electronic structure solvers
Yu, Victor Wen-zhe; Corsetti, Fabiano; Garcia, Alberto; ...
2017-09-15
Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. As a result, comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
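For readers unfamiliar with the underlying problem, the snippet below sets up and solves a small dense generalized eigenproblem HC = SCE of the kind ELSI hands off to ELPA, libOMM, or PEXSI. It runs LAPACK through SciPy on a single node with a made-up 200 x 200 symmetric "Hamiltonian" and overlap matrix, so it only illustrates the interface shape (matrices in, eigenpairs out), not ELSI itself.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 200
H = rng.standard_normal((n, n))
H = 0.5 * (H + H.T)                      # symmetric stand-in for the Kohn-Sham Hamiltonian
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)              # symmetric positive-definite overlap matrix

# Generalized eigenproblem H C = S C diag(eigvals), solved densely via LAPACK.
eigvals, eigvecs = eigh(H, S)
print(eigvals[:5])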
Characterizing quantum supremacy in near-term devices
NASA Astrophysics Data System (ADS)
Boixo, Sergio; Isakov, Sergei V.; Smelyanskiy, Vadim N.; Babbush, Ryan; Ding, Nan; Jiang, Zhang; Bremner, Michael J.; Martinis, John M.; Neven, Hartmut
2018-06-01
A critical question for quantum computing in the near future is whether quantum devices without error correction can perform a well-defined computational task beyond the capabilities of supercomputers. Such a demonstration of what is referred to as quantum supremacy requires a reliable evaluation of the resources required to solve tasks with classical approaches. Here, we propose the task of sampling from the output distribution of random quantum circuits as a demonstration of quantum supremacy. We extend previous results in computational complexity to argue that this sampling task must take exponential time in a classical computer. We introduce cross-entropy benchmarking to obtain the experimental fidelity of complex multiqubit dynamics. This can be estimated and extrapolated to give a success metric for a quantum supremacy demonstration. We study the computational cost of relevant classical algorithms and conclude that quantum supremacy can be achieved with circuits in a two-dimensional lattice of 7 × 7 qubits and around 40 clock cycles. This requires an error rate of around 0.5% for two-qubit gates (0.05% for one-qubit gates), and it would demonstrate the basic building blocks for a fault-tolerant quantum computer.
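As a toy illustration of the benchmarking idea, the snippet below scores samples drawn from a noisy "device" distribution against the ideal output probabilities and recovers the mixing fidelity. It uses a linear variant of the cross-entropy score and a synthetic Porter-Thomas-like distribution in place of a real circuit simulation, so every number in it is an assumption for illustration.

import numpy as np

n_qubits = 10
dim = 2 ** n_qubits
rng = np.random.default_rng(0)

# Stand-in for the ideal circuit output distribution (Porter-Thomas-like).
p_ideal = rng.exponential(1.0 / dim, size=dim)
p_ideal /= p_ideal.sum()

# "Device" samples: a mixture of the ideal distribution and white noise,
# mimicking a circuit with fidelity ~0.6.
fidelity_true = 0.6
p_device = fidelity_true * p_ideal + (1 - fidelity_true) / dim
samples = rng.choice(dim, size=50_000, p=p_device)

# Linear cross-entropy estimator: F ~ dim * <p_ideal(sample)> - 1.
fidelity_est = dim * p_ideal[samples].mean() - 1
print(f"estimated fidelity ~ {fidelity_est:.3f}")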
ELSI: A unified software interface for Kohn-Sham electronic structure solvers
NASA Astrophysics Data System (ADS)
Yu, Victor Wen-zhe; Corsetti, Fabiano; García, Alberto; Huhn, William P.; Jacquelin, Mathias; Jia, Weile; Lange, Björn; Lin, Lin; Lu, Jianfeng; Mi, Wenhui; Seifitokaldani, Ali; Vázquez-Mayagoitia, Álvaro; Yang, Chao; Yang, Haizhao; Blum, Volker
2018-01-01
Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
Optimizing legacy molecular dynamics software with directive-based offload
Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; ...
2015-05-14
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In our paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel (R) Xeon Phi (TM) coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS. (C) 2015 Elsevier B.V. All rights reserved.
ELSI: A unified software interface for Kohn–Sham electronic structure solvers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Victor Wen-zhe; Corsetti, Fabiano; Garcia, Alberto
Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. As a result, comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.
Spacecraft charging analysis with the implicit particle-in-cell code iPic3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deca, J.; Lapenta, G.; Marchand, R.
2013-10-15
We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter creates the possibility to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation results with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.
DelPhiPKa web server: predicting pKa of proteins, RNAs and DNAs.
Wang, Lin; Zhang, Min; Alexov, Emil
2016-02-15
A new pKa prediction web server is released, which implements the DelPhi Gaussian dielectric function to calculate electrostatic potentials generated by the charges of biomolecules. Topology parameters are extended to include atomic information for the nucleotides of RNA and DNA, which extends the capability of pKa calculations beyond proteins. The web server allows the end-user to protonate the biomolecule at a particular pH based on calculated pKa values and provides a downloadable file in PQR format. Several tests are performed to benchmark the accuracy and speed of the protocol. The web server follows a client-server architecture built on PHP and HTML and utilizes the DelPhiPKa program. The computation is performed on the Palmetto supercomputer cluster and results/download links are returned to the end-user via the HTTP protocol. The web server takes advantage of the MPI parallel implementation in DelPhiPKa and can run a single job on up to 24 CPUs. The DelPhiPKa web server is available at http://compbio.clemson.edu/pka_webserver. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Entropy, pricing and macroeconomics of pumped-storage systems
NASA Astrophysics Data System (ADS)
Karakatsanis, Georgios; Mamassis, Nikos; Koutsoyiannis, Demetris; Efstratiadis, Andreas
2014-05-01
We propose a pricing scheme for enhancing the macroeconomic performance of pumped-storage systems, based on the statistical properties of both geophysical and economic variables. The main argument concerns the need for a framework of economic values around the hub energy resource, defined as the resource that serves as the reference energy currency for all involved renewable energy sources (RES) and discounts all related uncertainty. In the case of pumped-storage systems the hub resource is the reservoir's water, which acts as a benchmark for all connected intermittent RES. The uncertainty of all involved natural and economic processes is statistically quantifiable by entropy. It is the relation between the entropies of all involved RES that shapes the macroeconomic state of the integrated pumped-storage system. Consequently, consideration must be given to the entropy of wind, solar and precipitation patterns, as well as to the entropy of economic processes, such as demand preferences between current energy use and storage for future availability. For pumped-storage macroeconomics, a price on the scarcity of the reservoir's capacity should also be imposed in order to shape a pricing field with upper and lower limits, ensuring long-term stability of the pricing range and positive net energy benefits, which is the primary issue for the generalized deployment of pumped-storage technology. Keywords: Entropy, uncertainty, pricing, hub energy resource, RES, energy storage, capacity scarcity, macroeconomics
Forming an ad-hoc nearby storage, based on IKAROS and social networking services
NASA Astrophysics Data System (ADS)
Filippidis, Christos; Cotronis, Yiannis; Markou, Christos
2014-06-01
We present an ad-hoc "nearby" storage, based on IKAROS and social networking services such as Facebook. By design, IKAROS is capable of increasing or decreasing the number of nodes of the I/O system instance on the fly, without bringing everything down or losing data. IKAROS can decide the file partition distribution schema by taking into account requests from the user or an application, as well as a domain or Virtual Organization policy. In this way, it is possible to form multiple instances of smaller-capacity, higher-bandwidth storage utilities capable of responding in an ad-hoc manner. This approach, focusing on flexibility, can scale both up and down and so can provide more cost-effective infrastructures for both large-scale and smaller-size systems. A set of experiments is performed comparing IKAROS with PVFS2, using multiple client requests under the HPC IOR benchmark and MPICH2.
Benchmarking Deep Learning Models on Large Healthcare Datasets.
Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
2018-06-04
Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
Effect of cold storage and packaging material on the major aroma components of sweet cream butter.
Lozano, Patricio R; Miracle, Evan R; Krause, Andrea J; Drake, Maryanne; Cadwallader, Keith R
2007-09-19
The major aroma compounds of commercial sweet cream AA butter quarters were analyzed by GC-olfactometry and GC-MS combined with dynamic headspace analysis (DHA) and solvent-assisted flavor evaporation (SAFE). In addition, the effect of long-term storage (0, 6, and 12 months) and type of wrapping material (wax parchment paper vs foil) on the aroma components and sensory properties of these butters kept under refrigerated (4 degrees C) and frozen (-20 degrees C) storage was evaluated. The most intense compounds in the aroma of pasteurized AA butter were butanoic acid, delta-octalactone, delta-decalactone, 1-octen-3-one, 2-acetyl-1-pyrroline, dimethyl trisulfide, and diacetyl. The intensities of lipid oxidation volatiles and methyl ketones increased as a function of storage time. Refrigerated storage caused greater flavor deterioration compared with frozen storage. The intensity and relative abundance of styrene increased as a function of time of storage at refrigeration temperature. Butter kept frozen for 12 months exhibited lower styrene levels and a flavor profile more similar to that of fresh butter compared to butter refrigerated for 12 months. Foil wrapping material performed better than wax parchment paper in preventing styrene migration into butter and in minimizing the formation of lipid oxidation and hydroxyl acid products that contribute to the loss of fresh butter flavor.
Pre-Hardware Optimization and Implementation Of Fast Optics Closed Control Loop Algorithms
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Lyon, Richard G.; Herman, Jay R.; Abuhassan, Nader
2004-01-01
One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high-performance digital equivalent, the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) within optical systems control. However, the timing constraints of a fast optics closed control loop would require a supercomputer to run the software implementation of the FFT2 and its inverse, as well as other representative image processing algorithms, such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations and is not feasible for a flight project. However, the computationally intensive algorithms still warrant alternative implementation using reconfigurable computing (RC) technologies such as Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA), which provide low-cost, compact supercomputing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image-processing algorithms, such as FFT2, image folding, and phase diversity, for the NASA Solar Viewing Interferometer Prototype (SVIP) using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also assures avoidance of a single point of failure, while using commercially available hardware. This, combined with pre-hardware optimization of the control algorithms, for the first time allows construction of image-based 800 Hertz (Hz) optics closed control loops on board a spacecraft, based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar Occultation Imager (EASI) to study the greenhouse gases CO2, C2H, H2O, O3, O2, and N2O from the Lagrange-2 point in space. This paper provides an advanced insight into a new type of science capability for future space exploration missions based on on-board image processing for control and for robotics missions using vision sensors. It presents a top-level description of the technologies required for the design and construction of SVIP and EASI and to advance the spatial-spectral imaging and large-scale space interferometry science and engineering.
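To give a feel for the FFT2 kernel at the heart of the control loop, here is a minimal NumPy sketch of one processing step: a forward 2-D FFT of a sensor frame, a crude frequency-domain filter, and the inverse transform. The frame size and mask are invented for illustration; the paper's contribution is mapping this kind of kernel onto DSP/FPGA hardware rather than running it in software.

import numpy as np

frame = np.random.rand(256, 256)                    # stand-in for one sensor image
spectrum = np.fft.fftshift(np.fft.fft2(frame))      # FFT2 kernel of the control loop
mask = np.zeros(frame.shape)
c = frame.shape[0] // 2
mask[c - 16:c + 16, c - 16:c + 16] = 1.0            # crude centred low-pass filter
filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
print(filtered.shape, filtered.dtype)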
WWTP dynamic disturbance modelling--an essential module for long-term benchmarking development.
Gernaey, K V; Rosen, C; Jeppsson, U
2006-01-01
Intensive use of the benchmark simulation model No. 1 (BSM1), a protocol for objective comparison of the effectiveness of control strategies in biological nitrogen removal activated sludge plants, has also revealed a number of limitations. Preliminary definitions of the long-term benchmark simulation model No. 1 (BSM1_LT) and the benchmark simulation model No. 2 (BSM2) have been made to extend BSM1 for evaluation of process monitoring methods and plant-wide control strategies, respectively. Influent-related disturbances for BSM1_LT/BSM2 are to be generated with a model, and this paper provides a general overview of the modelling methods used. Typical influent dynamic phenomena generated with the BSM1_LT/BSM2 influent disturbance model, including diurnal, weekend, seasonal and holiday effects, as well as rainfall, are illustrated with simulation results. As a result of the work described in this paper, a proposed influent model/file has been released to the benchmark developers for evaluation purposes. Pending this evaluation, a final BSM1_LT/BSM2 influent disturbance model definition is foreseen. Preliminary simulations with dynamic influent data generated by the influent disturbance model indicate that default BSM1 activated sludge plant control strategies will need extensions for BSM1_LT/BSM2 to efficiently handle 1 year of influent dynamics.
None
2018-05-01
A new Idaho National Laboratory supercomputer is helping scientists create more realistic simulations of nuclear fuel. Dubbed "Ice Storm" this 2048-processor machine allows researchers to model and predict the complex physics behind nuclear reactor behavior. And with a new visualization lab, the team can see the results of its simulations on the big screen. For more information about INL research, visit http://www.facebook.com/idahonationallaboratory.
Open Skies Project Computational Fluid Dynamic Analysis
1994-03-01
[Record text consists of extraction fragments only: table-of-contents entries (Conclusions; List of References; Appendix A: Transition Prediction; VSAERO results on the alternate fairing; centerline Cp comparisons; VSAERO wing-effects study) and an acknowledgment of assistance with a supercomputer account at the Kirtland Supercomputer Center (PL/SCPR).]
Porting Ordinary Applications to Blue Gene/Q Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy
2015-08-31
Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership-class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques: sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.
Canopy storage capacity of xerophytic shrubs in Northwestern China
NASA Astrophysics Data System (ADS)
Wang, Xin-ping; Zhang, Ya-feng; Hu, Rui; Pan, Yan-xia; Berndtsson, Ronny
2012-08-01
The capacity of shrub canopy water storage is a key factor in controlling the rainfall interception. Thus, it affects a variety of hydrological processes in water-limited arid desert ecosystems. Vast areas of revegetated desert ecosystems in Northwestern China are occupied by shrub and dwarf shrub communities. Yet, data are still scarce regarding their rainwater storage capacity. In this study, simulated rainfall tests were conducted in controlled conditions for three dominant xerophytic shrub types in the arid Tengger Desert. Eight rainfall intensities varying from 1.15 to 11.53 mm h-1 were used to determine the canopy water storage capacity. The simulated rainfall intensities were selected according to the long-term rainfall records in the study area. The results indicate that canopy storage capacity (expressed in water storage per leaf area, canopy projection area, biomass, and volume of shrub respectively) increased exponentially with increase in rainfall intensity for the selected shrubs. Linear relationships were found between canopy storage capacity and leaf area (LA) or leaf area index (LAI), although there was a striking difference in correlation between storage capacity and LA or LAI of Artemisia ordosica compared to Caragana korshinskii and Hedysarum scoparium. This is a result of differences in biometric characteristics, especially canopy morphology between the shrub species. Pearson correlation coefficient indicated that LA and dry biomass are better predictors as compared to canopy projection area and volume of samples for precise estimation of canopy water storage capacity. In terms of unit leaf area, mean storage capacity was 0.39 mm (range of 0.24-0.53 mm), 0.43 mm (range of 0.28-0.60 mm), and 0.61 mm (range of 0.29-0.89 mm) for C. korshinskii, H. scoparium, and A. ordosica, respectively. Correspondingly, divided per unit dry biomass, mean storage capacity was 0.51 g g-1 (range of 0.30-0.70 g g-1), 0.41 g g-1 (range of 0.26-0.57 g g-1), and 0.73 g g-1 (range of 0.38-1.05 g g-1) for C. korshinskii, H. scoparium, and A. ordosica, respectively, when the rainfall intensities ranged from 1.15, 2.31, 3.46, 4.61, 6.92, 9.23 to 11.53 mm h-1. The needle-leaved species A. ordosica had a higher canopy water storage capacity than the ovate-leaved species C. korshinskii and H. scoparium at the same magnitude of rainfall intensity, except for C. korshinskii when it was expressed in unit of canopy projection area. Consequently, A. ordosica will generate higher interception losses as compared to C. korshinskii and H. scoparium. This is especially the case as it often forms dense dwarf shrub communities despite its small size.
Site in a box: Improving the Tier 3 experience
NASA Astrophysics Data System (ADS)
Dost, J. M.; Fajardo, E. M.; Jones, T. R.; Martin, T.; Tadel, A.; Tadel, M.; Würthwein, F.
2017-10-01
The Pacific Research Platform is an initiative to interconnect Science DMZs between campuses across the West Coast of the United States over a 100 Gbps network. The LHC @ UC is a proof-of-concept pilot project that focuses on interconnecting 6 University of California campuses. It is spearheaded by computing specialists from the UCSD Tier 2 Center in collaboration with the San Diego Supercomputer Center. A machine has been shipped to each campus, extending the concept of the Data Transfer Node to a cluster in a box that is fully integrated into the local compute, storage, and networking infrastructure. The node contains a full HTCondor batch system, and also an XRootD proxy cache. User jobs routed to the DTN can run on 40 additional slots provided by the machine, and can also flock to a common GlideinWMS pilot pool, which sends jobs out to any of the participating UCs, as well as to Comet, the new supercomputer at SDSC. In addition, a common XRootD federation has been created to interconnect the UCs and give the ability to arbitrarily export data from the home university, to make it available wherever the jobs run. The UC-level federation also statically redirects to either the ATLAS FAX or CMS AAA federation, respectively, to make globally published datasets available, depending on end-user VO membership credentials. XRootD read operations from the federation transfer through the nearest DTN proxy cache located at the site where the jobs run. This reduces wide area network overhead for subsequent accesses, and improves overall read performance. Details on the technical implementation, challenges faced and overcome in setting up the infrastructure, and an analysis of usage patterns and system scalability will be presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, T.; Atac, R.; Cook, A.
1989-03-06
The ACPMAPS multiprocessor is a highly cost-effective, local-memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating-point-intensive, grid-like problems, particularly those with extreme computing requirements. The processing nodes of the system are single-board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL chip set. The system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.
A Communication-Optimal Framework for Contracting Distributed Tensors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; NIkam, Akshay; Lai, Pai-Wei
Tensor contractions are extremely compute-intensive generalized matrix multiplication operations encountered in many computational science fields, such as quantum chemistry and nuclear physics. Unlike distributed matrix multiplication, which has been extensively studied, limited work has been done in understanding distributed tensor contractions. In this paper, we characterize distributed tensor contraction algorithms on torus networks. We develop a framework with three fundamental communication operators to generate communication-efficient contraction algorithms for arbitrary tensor contractions. We show that for a given amount of memory per processor, our framework is communication optimal for all tensor contractions. We demonstrate the performance and scalability of our framework on up to 262,144 cores of a BG/Q supercomputer using five tensor contraction examples.
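A single-node sketch of the kind of contraction being distributed may help: the einsum below evaluates C_abij = sum_pq A_abpq B_pqij and checks it against the matricized form that distributed algorithms effectively operate on. Dimensions are invented, and none of the paper's communication machinery appears here.

import numpy as np

a = b = i = j = p = q = 8
A = np.random.rand(a, b, p, q)
B = np.random.rand(p, q, i, j)

# The contraction as a generalized matrix multiplication over the pq indices.
C = np.einsum("abpq,pqij->abij", A, B)

# Same result via the unfolded (matricized) form: reshape, multiply, reshape back.
C_flat = A.reshape(a * b, p * q) @ B.reshape(p * q, i * j)
assert np.allclose(C, C_flat.reshape(a, b, i, j))
print(C.shape)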
Recent advances in lossy compression of scientific floating-point data
NASA Astrophysics Data System (ADS)
Lindstrom, P.
2017-12-01
With a continuing exponential trend in supercomputer performance, ever larger data sets are being generated through numerical simulation. Bandwidth and storage capacity are, however, not keeping pace with this increase in data size, causing significant data movement bottlenecks in simulation codes and substantial monetary costs associated with archiving vast volumes of data. Worse yet, ever smaller fractions of data generated can be stored for further analysis, where scientists frequently rely on decimating or averaging large data sets in time and/or space. One way to mitigate these problems is to employ data compression to reduce data volumes. However, lossless compression of floating-point data can achieve only very modest size reductions on the order of 10-50%. We present ZFP and FPZIP, two state-of-the-art lossy compressors for structured floating-point data that routinely achieve one to two orders of magnitude reduction with little to no impact on the accuracy of visualization and quantitative data analysis. We provide examples of the use of such lossy compressors in climate and seismic modeling applications to effectively accelerate I/O and reduce storage requirements. We further discuss how the design decisions behind these and other compressors impact error distributions and other statistical and differential properties, including derived quantities of interest relevant to each science application.
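The trade of a bounded error for a large size reduction can be shown with a deliberately naive stand-in (this is not the ZFP or FPZIP algorithm): quantize each value to a fixed absolute tolerance, then let a lossless coder squeeze out the resulting redundancy. The field, tolerance, and zlib back-end are assumptions for illustration; the real compressors work on blocks, exponents, and bit planes and achieve far better ratios at the same error bound.

import numpy as np
import zlib

def lossy_compress(data: np.ndarray, tol: float) -> bytes:
    q = np.round(data / tol).astype(np.int32)        # bounded error: |x - x'| <= tol/2
    return zlib.compress(q.tobytes(), 6)

def lossy_decompress(blob: bytes, tol: float, shape) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    return q.reshape(shape) * tol

# A smooth synthetic field standing in for simulation output.
x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 256), np.linspace(0, 4 * np.pi, 256))
field = np.sin(x) * np.cos(y)

blob = lossy_compress(field, tol=1e-3)
recon = lossy_decompress(blob, tol=1e-3, shape=field.shape)
print(f"ratio {field.nbytes / len(blob):.1f}x, max error {np.abs(field - recon).max():.2e}")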
Yang, Meng; Li, Yong Fu; Li, Yong Chun; Xiao, Yong Heng; Yue, Tian; Jiang, Pei Kun; Zhou, Guo Mo; Liu, Juan
2016-11-18
In order to elucidate the effects of intensive management on the soil carbon pool, nitrogen pool, and enzyme activities in Moso bamboo (Phyllostachys pubescens) plantations, we collected soil samples from the soil surface (0-20 cm) and subsurface (20-40 cm) layers in adjacent Moso bamboo plantations under extensive and intensive management in Sankou Township, Lin'an City, Zhejiang Province. We determined different forms of C and N and soil invertase, urease, catalase and acid phosphatase activities. The results showed that long-term intensive management of Moso bamboo plantations significantly decreased the content and storage of soil organic carbon (SOC), with the SOC storage in the soil surface and subsurface layers decreased by 13.2% and 18.0%, respectively. After 15 years' intensive management of Moso bamboo plantations, the contents of soil water soluble carbon (WSOC), hot water soluble carbon (HWSOC), microbial carbon (MBC) and readily oxidizable carbon (ROC) were significantly decreased in the soil surface and subsurface layers. The soil N storage in the soil surface and subsurface layers in intensively managed Moso bamboo plantations increased by 50.8% and 36.6%, respectively. Intensive management significantly increased the contents of nitrate-N (NO3⁻-N) and ammonium-N (NH4⁺-N), but decreased the contents of water-soluble nitrogen (WSON) and microbial biomass nitrogen (MBN). After 15 years' intensive management of Moso bamboo plantations, the soil invertase, urease, catalase and acid phosphatase activities in the soil surface layer were significantly decreased, the soil acid phosphatase activity in the soil subsurface layer was significantly decreased, and the other enzyme activities in the soil subsurface layer did not change. In conclusion, long-term intensive management led to a significant decline of soil organic carbon storage, soil labile carbon and microbial activity in Moso bamboo plantations. Therefore, we should consider the use of organic fertilizer in the intensive management process for the sustainable management of Moso bamboo plantations in the future.
STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.
Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X
2009-08-01
This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.
Japanese project aims at supercomputer that executes 10 gflops
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burskey, D.
1984-05-03
Dubbed supercom by its multicompany design team, the decade-long project's goal is an engineering supercomputer that can execute 10 billion floating-point operations/s, about 20 times faster than today's supercomputers. The project, guided by Japan's Ministry of International Trade and Industry (MITI) and the Agency of Industrial Science and Technology, encompasses three parallel research programs, all aimed at some aspect of the supercomputer. One program should lead to superfast logic and memory circuits, another to a system architecture that will afford the best performance, and the last to the software that will ultimately control the computer. The work on logic and memory chips is based on GaAs circuits, Josephson junction devices, and high-electron-mobility transistor structures. The architecture will involve parallel processing.
Sirocco Storage Server v. pre-alpha 0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curry, Matthew L.; Danielson, Geoffrey; Ward, H. Lee
Sirocco is a parallel storage system under development, designed for write-intensive workloads on large-scale HPC platforms. It implements a key-value object store on top of a set of loosely federated storage servers that cooperate to ensure data integrity and performance. It includes support for a range of different types of storage transactions. This software release constitutes a conformant storage server, along with the client-side libraries to access the storage over a network.
NASA Astrophysics Data System (ADS)
Wyborn, L. A.; Evans, B. J. K.; Pugh, T.; Lescinsky, D. T.; Foster, C.; Uhlherr, A.
2014-12-01
The National Computational Infrastructure (NCI) at the Australian National University (ANU) is a partnership between CSIRO, ANU, the Bureau of Meteorology (BoM) and Geoscience Australia. Recent investments in a 1.2 PFlop supercomputer (Raijin), ~20 PB of data storage using Lustre filesystems and a 3000-core high-performance cloud have created a hybrid platform for high-performance computing and data-intensive science to enable large-scale earth and climate systems modelling and analysis. There are >3000 users actively logging in and >600 projects on the NCI system. Efficiently scaling and adapting data and software systems to petascale infrastructures requires the collaborative development of an architecture that is designed, programmed and operated to enable users to interactively invoke different forms of in-situ computation over complex and large-scale data collections. NCI makes available major and long-tail data collections from both the government and research sectors based on six themes: 1) weather, climate and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology and 6) astronomy, bio and social. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. Collections are the operational form for data management and access. Similar data types from individual custodians are managed cohesively. Use of international standards for discovery and interoperability allows complex interactions within and between the collections. This design facilitates a transdisciplinary approach to research and enables a shift from small-scale, 'stove-piped' science efforts to large-scale, collaborative systems science. This new and complex infrastructure requires a move to shared, globally trusted software frameworks that can be maintained and updated. Workflow engines become essential and need to integrate provenance, versioning, traceability, repeatability and publication. There are also human resource challenges, as highly skilled HPC/HPD specialists, specialist programmers, and data scientists are required whose skills can support scaling to the new paradigm of effective and efficient data-intensive earth science analytics on petascale, and soon exascale, systems.
NASA Astrophysics Data System (ADS)
Zhizhin, M.; Poyda, A.; Velikhov, V.; Novikov, A.; Polyakov, A.
2016-02-01
Most remote sensing applications rely on daytime visible and infrared images of the Earth's surface. Increases in the number of satellites, their spatial resolution, and the number of simultaneously observed spectral bands ensure a steady growth of data volumes and computational complexity in the remote sensing sciences. Recent advances in night-time remote sensing are related to the enhanced sensitivity of on-board instruments and to the unique opportunity to observe "pure" emitters in the visible and infrared spectra without contamination from solar heat and reflected light. Candidate night-time emitters observable from low-orbiting and geostationary satellites include steady states and temporal changes in city and traffic electric lights, fishing boats, high-temperature industrial objects such as steel mills, oil-cracking refineries and power plants, forest and agricultural fires, gas flares, volcanic eruptions and similar catastrophic events. Current satellite instruments can detect 10 times more such objects at night than in daytime. We will present a new data-intensive workflow of night-time remote sensing algorithms for map-reduce processing of visible and infrared images from the multispectral radiometers flown on the modern NOAA/NASA Suomi NPP and USGS Landsat 8 satellites. Similar radiometers are installed on the new-generation US geostationary GOES-R satellite to be launched in 2016. The new set of algorithms allows us to detect with confidence and track the abrupt changes and long-term trends in the energy of city lights and the number of fishing boats, as well as the size, geometry and temperature of gas flares, and to estimate monthly and yearly flared gas volumes by site or by country. For real-time analysis of night-time multispectral satellite images with global coverage we need a gigabit network, petabyte data storage and a parallel compute cluster with more than 20 nodes. To meet the processing requirements, we have used the supercomputer at the Kurchatov Institute in Moscow.
Alternative industrial carbon emissions benchmark based on input-output analysis
NASA Astrophysics Data System (ADS)
Han, Mengyao; Ji, Xi
2016-12-01
Some problems exist in the current carbon emissions benchmark setting systems. The primary considerations for industrial carbon emissions standards relate mainly to direct carbon emissions (power-related emissions), and only a portion of indirect emissions are considered in the current carbon emissions accounting processes. This practice is insufficient and may cause double counting to some extent due to mixed emission sources. To better integrate and quantify direct and indirect carbon emissions, an embodied industrial carbon emissions benchmark setting method is proposed to guide the establishment of carbon emissions benchmarks based on input-output analysis. This method links direct carbon emissions with inter-industrial economic exchanges and systematically quantifies the carbon emissions embodied in total product delivery chains. The purpose of this study is to design a practical new set of embodied-intensity-based benchmarks for both direct and indirect carbon emissions. Beijing, among the first carbon emissions trading pilot schemes in China, plays a significant role in the establishment of these schemes and is chosen as an example in this study. The newly proposed method relates emissions directly to each responsibility in a practical way through the measurement of complex production and supply chains, so as to reduce carbon emissions at their original sources. This method is expected to be developed under uncertain internal and external contexts and is further expected to be generalized to guide the establishment of industrial benchmarks for carbon emissions trading schemes in China and other countries.
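The core calculation behind an embodied-intensity benchmark fits in a few lines: direct intensities are propagated through the inter-industry structure with the Leontief inverse, so each sector's benchmark covers both its own emissions and those embodied in its purchased inputs. The 3-sector coefficient matrix and intensities below are invented solely to show the arithmetic; they are not Beijing data.

import numpy as np

A = np.array([[0.10, 0.20, 0.05],      # technical coefficient matrix:
              [0.15, 0.10, 0.10],      # input from the row sector needed per
              [0.05, 0.25, 0.15]])     # unit of output of the column sector
f_direct = np.array([2.0, 0.8, 0.3])   # direct CO2 per unit output, by sector

L = np.linalg.inv(np.eye(3) - A)       # Leontief inverse (I - A)^-1
f_embodied = f_direct @ L              # total (direct + indirect) intensity

print("direct  :", f_direct)
print("embodied:", f_embodied)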
Toward a Big Data Science: A challenge of "Science Cloud"
NASA Astrophysics Data System (ADS)
Murata, Ken T.; Watanabe, Hidenobu
2013-04-01
Over the past 50 years, along with the appearance and development of high-performance computers (and supercomputers), numerical simulation has come to be considered a third methodology for science, following the theoretical (first) and experimental/observational (second) approaches. The variety of data yielded by the second approach keeps growing, owing to progress in experimental and observational technologies, and the amount of data generated by the third methodology keeps growing as well, owing to the tremendous development of supercomputers and their programming techniques. Most of the data files created by experiments/observations and numerical simulations are saved in digital formats and analyzed on computers. Researchers (domain experts) are interested not only in how to run experiments, observations, or numerical simulations, but in what information (new findings) can be extracted from the data. However, data do not usually tell us anything about the science by themselves; the science is implicitly hidden in the data, and researchers have to extract information from the data files to find it. This is the basic concept of data-intensive (data-oriented) science for Big Data. As the scales of experiments, observations, and numerical simulations grow, new techniques and facilities are required to extract information from large numbers of data files. This technique, informatics, is a fourth methodology for new sciences. Any methodology must work on its own facilities: in space science, for example, the space environment is observed via spacecraft and numerical simulations are performed on supercomputers. The facility for informatics, which deals with large-scale data, is a computational cloud system for science. This paper proposes a cloud system for informatics that has been developed at NICT (National Institute of Information and Communications Technology), Japan. The NICT science cloud, named OneSpaceNet (OSN), is the first open cloud system for scientists who are going to carry out informatics for their own science. The science cloud is not for simple uses; many functions are expected of it, such as data standardization, data collection and crawling, large and distributed data storage systems, security and reliability, databases and meta-databases, data stewardship, long-term data preservation, data rescue and preservation, data mining, parallel processing, data publication and provision, the semantic web, 3D and 4D visualization, outreach and in-reach, and capacity building. The figure (not shown here) is a schematic picture of the NICT science cloud. Both types of data, from observation and from simulation, are stored in the storage system of the science cloud. It should be noted that there are two types of observational data: one comes from archive sites outside the cloud and is downloaded through the Internet to the cloud; the other comes from equipment directly connected to the science cloud, often called a sensor cloud. In the present talk, we first introduce the NICT science cloud. We then demonstrate its efficiency by showing several scientific results achieved with this cloud system. Through these discussions and demonstrations, the potential of the science cloud for any research field will be revealed.
The storage system of PCM based on random access file system
NASA Astrophysics Data System (ADS)
Han, Wenbing; Chen, Xiaogang; Zhou, Mi; Li, Shunfen; Li, Gezi; Song, Zhitang
2016-10-01
Emerging memory technologies such as phase change memory (PCM) tend to offer fast, random access to persistent storage with better scalability. Establishing PCM in the storage hierarchy to narrow the performance gap is a hot topic of academic and industrial research. However, existing file systems, which access the storage medium via a slow, block-based interface, do not perform well with emerging PCM storage. In this paper, we propose a novel file system, RAFS, built on an embedded platform, to bring out the good performance of PCM. We attach PCM chips to the memory bus and build RAFS on the physical address space. In the proposed file system, we simplify the traditional system architecture to eliminate block-related operations and layers. Furthermore, we adopt memory mapping and bypass the page cache to reduce copy overhead between the process address space and the storage device. XIP mechanisms are also supported in RAFS. To the best of our knowledge, we are among the first to implement a file system on real PCM chips. We have analyzed and evaluated its performance with the IOZONE benchmark tools. Our experimental results show that RAFS on PCM outperforms Ext4fs on SDRAM for small record lengths. Based on DRAM, RAFS is significantly faster than Ext4fs, by 18% to 250%.
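The copy-avoidance argument can be illustrated with ordinary memory mapping: once a file is mapped, loads and stores go straight to the mapped pages rather than through read()/write() buffers, which is the idea RAFS pushes further with XIP on byte-addressable PCM. The path and size below are placeholders, and a normal disk file stands in for a PCM region.

import mmap
import os

path = "/tmp/pcm_region.bin"          # hypothetical file standing in for a PCM region
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)           # back the mapping with one page of storage

fd = os.open(path, os.O_RDWR)
try:
    with mmap.mmap(fd, 4096) as region:
        region[0:5] = b"hello"        # store directly into the mapped pages
        print(bytes(region[0:5]))     # load directly; no intermediate user-space buffer
finally:
    os.close(fd)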
Japanese supercomputer technology.
Buzbee, B L; Ewald, R H; Worlton, W J
1982-12-17
Under the auspices of the Ministry for International Trade and Industry the Japanese have launched a National Superspeed Computer Project intended to produce high-performance computers for scientific computation and a Fifth-Generation Computer Project intended to incorporate and exploit concepts of artificial intelligence. If these projects are successful, which appears likely, advanced economic and military research in the United States may become dependent on access to supercomputers of foreign manufacture.
Supercomputer Simulations Help Develop New Approach to Fight Antibiotic Resistance
Zgurskaya, Helen; Smith, Jeremy
2018-06-13
ORNL leveraged powerful supercomputing to support research led by University of Oklahoma scientists to identify chemicals that seek out and disrupt bacterial proteins called efflux pumps, known to be a major cause of antibiotic resistance. By running simulations on Titan, the team selected molecules most likely to target and potentially disable the assembly of efflux pumps found in E. coli bacteria cells.
Effect of storage time on gene expression data acquired from unfrozen archived newborn blood spots.
Ho, Nhan T; Busik, Julia V; Resau, James H; Paneth, Nigel; Khoo, Sok Kean
2016-11-01
Unfrozen archived newborn blood spots (NBS) have been shown to retain sufficient messenger RNA (mRNA) for gene expression profiling. However, the effect of storage time at ambient temperature for NBS samples in relation to the quality of gene expression data is relatively unknown. Here, we evaluated mRNA expression from quantitative real-time PCR (qRT-PCR) and microarray data obtained from NBS samples stored at ambient temperature to determine the effect of storage time on the quality of gene expression. These data were generated in a previous case-control study examining NBS in 53 children with cerebral palsy (CP) and 53 matched controls. NBS sample storage period ranged from 3 to 16 years at ambient temperature. We found persistently low RNA integrity numbers (RIN=2.3±0.71) and 28S/18S rRNA ratios (~0) across NBS samples for all storage periods. In both qRT-PCR and microarray data, the expression of three common housekeeping genes, beta cytoskeletal actin (ACTB), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), and peptidylprolyl isomerase A (PPIA), decreased with increased storage time. Median values of each microarray probe intensity at log2 scale also decreased over time. After eight years of storage, probe intensity values were largely reduced to background intensity levels. Of 21,500 genes tested, 89% significantly decreased in signal intensity, with 13,551, 10,730, and 9925 genes detected within 5 years, >5 to <10 years, and >10 years of storage, respectively. We also examined the expression of two gender-specific genes (X inactivation-specific transcript, XIST, and lysine-specific demethylase 5D, KDM5D) and seven gene sets representing the inflammatory, hypoxic, coagulative, and thyroidal pathways hypothesized to be related to CP risk to determine the effect of storage time on the detection of these biologically relevant genes. We found the gender-specific genes and CP-related gene sets detectable in all storage periods, but they exhibited differential expression (between male vs. female or CP vs. control) only within the first six years of storage. We concluded that gene expression data quality deteriorates in unfrozen archived NBS over time and that differential gene expression profiling and analysis is recommended for those NBS samples collected and stored within six years at ambient temperature. Copyright © 2016 Elsevier Inc. All rights reserved.
Cloud-Based Numerical Weather Prediction for Near Real-Time Forecasting and Disaster Response
NASA Technical Reports Server (NTRS)
Molthan, Andrew; Case, Jonathan; Venners, Jason; Schroeder, Richard; Checchi, Milton; Zavodsky, Bradley; Limaye, Ashutosh; O'Brien, Raymond
2015-01-01
The use of cloud computing resources continues to grow within the public and private sector components of the weather enterprise as users become more familiar with cloud-computing concepts, and competition among service providers continues to reduce costs and other barriers to entry. Cloud resources can also provide capabilities similar to high-performance computing environments, supporting multi-node systems required for near real-time, regional weather predictions. Referred to as "Infrastructure as a Service", or IaaS, the use of cloud-based computing hardware in an on-demand payment system allows for rapid deployment of a modeling system in environments lacking access to a large, supercomputing infrastructure. Use of IaaS capabilities to support regional weather prediction may be of particular interest to developing countries that have not yet established large supercomputing resources, but would otherwise benefit from a regional weather forecasting capability. Recently, collaborators from NASA Marshall Space Flight Center and Ames Research Center have developed a scripted, on-demand capability for launching the NOAA/NWS Science and Training Resource Center (STRC) Environmental Modeling System (EMS), which includes pre-compiled binaries of the latest version of the Weather Research and Forecasting (WRF) model. The WRF-EMS provides scripting for downloading appropriate initial and boundary conditions from global models, along with higher-resolution vegetation, land surface, and sea surface temperature data sets provided by the NASA Short-term Prediction Research and Transition (SPoRT) Center. This presentation will provide an overview of the modeling system capabilities and benchmarks performed on the Amazon Elastic Compute Cloud (EC2) environment. In addition, the presentation will discuss future opportunities to deploy the system in support of weather prediction in developing countries supported by NASA's SERVIR Project, which provides capacity building activities in environmental monitoring and prediction across a growing number of regional hubs throughout the world. Capacity-building applications that extend numerical weather prediction to developing countries are intended to provide near real-time applications to benefit public health, safety, and economic interests, but may have a greater impact during disaster events by providing a source for local predictions of weather-related hazards, or impacts that local weather events may have during the recovery phase.
Aviation Research and the Internet
NASA Technical Reports Server (NTRS)
Scott, Antoinette M.
1995-01-01
The Internet is a network of networks. It was originally funded by the Defense Advanced Research Projects Agency (DOD/DARPA) and evolved in part from the connection of supercomputer sites across the United States. The National Science Foundation (NSF) made the most of its supercomputers by connecting the sites to each other. This made the supercomputers more efficient and now allows scientists, engineers, and researchers to access the supercomputers from their own labs and offices. The high-speed networks that connect the NSF supercomputers form the backbone of the Internet. The World Wide Web (WWW) is a menu system. It gathers Internet resources from all over the world into a series of screens that appear on your computer. The WWW is also a distributed system: it stores data and information on many computers (servers), and these servers can go out and get data when you ask for it. Hypermedia is the basis of the WWW; one can 'click' on a section and visit other hypermedia (pages). Our approach to demonstrating the importance of aviation research through the Internet began with learning how to put pages on the Internet (on-line) ourselves. We were assigned two aviation companies: Vision Micro Systems Inc. and Innovative Aerodynamic Technologies (IAT). We developed home pages for these SBIR companies. The equipment used to create the pages included UNIX and Macintosh machines; HTML Supertext software was used to write the pages, and a Sharp JX600S scanner was used to scan the images. As a result, with the use of UNIX, Macintosh, Sun, PC, and AXIL machines, we were able to present our home pages to over 800,000 visitors.
Next Generation Security for the 10,240 Processor Columbia System
NASA Technical Reports Server (NTRS)
Hinke, Thomas; Kolano, Paul; Shaw, Derek; Keller, Chris; Tweton, Dave; Welch, Todd; Liu, Wen (Betty)
2005-01-01
This presentation includes a discussion of the Columbia 10,240-processor system located at the NASA Advanced Supercomputing (NAS) division at the NASA Ames Research Center, which supports each of NASA's four missions: science, exploration systems, aeronautics, and space operations. The system comprises 20 Silicon Graphics nodes, each consisting of 512 Itanium II processors. A 64-processor Columbia front-end system supports users as they prepare their jobs and then submit them to the PBS batch system. The Columbia nodes and front-end systems use the Linux OS. Prior to SC04, the Columbia system was used to attain a processing speed of 51.87 teraflops, which made it number two on the Top 500 list of the world's supercomputers and the world's fastest "operational" supercomputer, since it was fully engaged in supporting NASA users.
Peng, Bo; Kowalski, Karol
2017-09-12
The representation and storage of two-electron integral tensors are vital in large-scale applications of accurate electronic structure methods. Low-rank representation and efficient storage strategies for integral tensors can significantly reduce the numerical overhead and, consequently, the time-to-solution of these methods. In this work, by combining pivoted incomplete Cholesky decomposition (CD) with a follow-up truncated singular value decomposition (SVD), we develop a decomposition strategy to approximately represent the two-electron integral tensor in terms of low-rank vectors. A systematic benchmark test on a series of 1-D, 2-D, and 3-D carbon-hydrogen systems demonstrates the high efficiency and scalability of the compound two-step decomposition of the two-electron integral tensor in our implementation. For the size of the atomic basis set, N_b, ranging from ~100 up to ~2,000, the observed numerical scaling of our implementation shows [Formula: see text] versus the [Formula: see text] cost of performing a single CD on the two-electron integral tensor in most other implementations. More importantly, this decomposition strategy can significantly reduce the storage requirement of the atomic orbital (AO) two-electron integral tensor from [Formula: see text] to [Formula: see text] with moderate decomposition thresholds. Accuracy tests have been performed using ground- and excited-state formulations of the coupled cluster formalism employing single and double excitations (CCSD) on several benchmark systems, including the C60 molecule described by nearly 1,400 basis functions. The results show that the decomposition thresholds can generally be set to 10^-4 to 10^-3 to give an acceptable compromise between efficiency and accuracy.
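As a rough illustration of the two-step compression idea (and not the authors' tensor implementation), the sketch below applies a pivoted incomplete Cholesky decomposition to a symmetric positive semidefinite matrix standing in for an unfolded integral tensor, then compresses the resulting Cholesky vectors with a truncated SVD. The tolerances, matrix sizes, and function names are assumptions chosen for the toy example.

# Minimal sketch: pivoted incomplete Cholesky followed by a truncated SVD.
import numpy as np

def pivoted_cholesky(V, tol=1e-6, max_rank=None):
    """Return L (n x k) with V ~= L @ L.T, built by diagonal pivoting."""
    n = V.shape[0]
    max_rank = max_rank or n
    d = np.diag(V).astype(float).copy()        # residual diagonal
    L = np.zeros((n, max_rank))
    for k in range(max_rank):
        p = int(np.argmax(d))
        if d[p] <= tol:                        # residual below threshold: stop
            return L[:, :k]
        L[:, k] = (V[:, p] - L @ L[p, :]) / np.sqrt(d[p])
        d -= L[:, k] ** 2
    return L

def truncate_by_svd(L, tol=1e-4):
    """Second step: compress the Cholesky vectors with a truncated SVD."""
    U, s, _ = np.linalg.svd(L, full_matrices=False)
    keep = s > tol * s[0]
    return U[:, keep] * s[keep]                # new, smaller low-rank factor

# Toy example: a random PSD matrix standing in for the unfolded integral tensor.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30))
V = A @ A.T
L = pivoted_cholesky(V, tol=1e-8)
B = truncate_by_svd(L, tol=1e-6)
print(L.shape, B.shape, np.linalg.norm(V - B @ B.T) / np.linalg.norm(V))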
NASA Astrophysics Data System (ADS)
Trindade, B. C.; Reed, P. M.
2017-12-01
Growing access to computing power and its reduced cost in recent years have promoted rapid development and application of multi-objective water supply portfolio planning. As this trend continues, there is a pressing need for flexible risk-based simulation frameworks and improved algorithm benchmarking for emerging classes of water supply planning and management problems. This work contributes the Water Utilities Management and Planning (WUMP) model: a generalizable and open-source simulation framework designed to capture how water utilities can minimize operational and financial risks by regionally coordinating planning and management choices, i.e., making more efficient and coordinated use of restrictions, water transfers, and financial hedging combined with the possible construction of new infrastructure. We introduce the WUMP simulation framework as part of a new multi-objective benchmark problem for the planning and management of regionally integrated water utility companies. In this problem, a group of fictitious water utilities seeks to balance the use of these reliability-driven actions (e.g., restrictions, water transfers, and infrastructure pathways) against their inherent financial risks. Several traits make this problem well suited as a benchmark, namely the presence of (1) strong non-linearities and discontinuities in the Pareto front caused by the step-wise nature of the decision-making formulation and by the abrupt addition of storage through infrastructure construction, (2) noise due to the stochastic nature of the streamflows and water demands, and (3) non-separability resulting from the cooperative formulation of the problem, in which decisions made by one stakeholder may substantially impact the others. Both the open-source WUMP simulation framework and its demonstration in a challenging benchmarking example hold value for promoting broader advances in urban water supply portfolio planning for regions confronting change.
CFD applications: The Lockheed perspective
NASA Technical Reports Server (NTRS)
Miranda, Luis R.
1987-01-01
The Numerical Aerodynamic Simulator (NAS) epitomizes the coming of age of supercomputing and opens exciting horizons in the world of numerical simulation. An overview of supercomputing at Lockheed Corporation in the area of Computational Fluid Dynamics (CFD) is presented. This overview focuses on developments and applications of CFD as an aircraft design tool and attempts to present an assessment, within this context, of the state of the art in CFD methodology.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
A Heterogeneous High-Performance System for Computational and Computer Science
2016-11-15
...team of research faculty from the departments of computer science and natural science at Bowie State University. The supercomputer is not only to...accelerated HPC systems. The supercomputer is also ideal for the research conducted in the Department of Natural Science, as research faculty work on
LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments
2015-11-20
1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community...The map-reduce parallel programming model has become extremely popular in the big data community. Many big data ...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
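For readers unfamiliar with the programming model being simplified here, the following is a generic map-reduce word count in Python, offered as an illustration of the model itself rather than of the LLMapReduce interface.

# Generic map-reduce sketch: map documents to (word, 1) pairs, shuffle by key,
# then reduce each key by summing its counts.
from collections import defaultdict
from functools import reduce

def map_phase(doc):
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

docs = ["big data on a supercomputer", "map reduce for big data"]
pairs = [kv for doc in docs for kv in map_phase(doc)]   # map
counts = reduce_phase(shuffle(pairs))                   # shuffle + reduce
print(counts)

In a real deployment the map and reduce calls run on different nodes and the shuffle is handled by the framework; the user supplies only the two small functions, which is the simplification the abstract refers to.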
Advanced Numerical Techniques of Performance Evaluation. Volume 1
1990-06-01
system scheduling thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself...Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers C...Kuck 1987] C.D. Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Trans. on Comp
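The Guided Self-Scheduling scheme cited in this excerpt assigns each idle processor roughly ceil(R/P) of the R remaining loop iterations, so chunk sizes shrink as the loop drains. The sketch below is an illustrative rendering of that chunking rule, not code from the report.

# Illustrative guided self-scheduling chunking: large chunks early (low
# scheduling overhead), small chunks late (good load balance near the end).
import math

def guided_chunks(n_iterations, n_workers):
    """Yield (start, size) chunks in the order idle workers would claim them."""
    start, remaining = 0, n_iterations
    while remaining > 0:
        size = math.ceil(remaining / n_workers)
        yield start, size
        start += size
        remaining -= size

print(list(guided_chunks(100, 4)))   # first chunks: (0, 25), (25, 19), (44, 14), ...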
Large-scale atomistic calculations of clusters in intense x-ray pulses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ho, Phay J.; Knight, Chris
Here, we present the methodology of our recently developed Monte-Carlo/Molecular-Dynamics method for studying the fundamental ultrafast dynamics induced by high-fluence, high-intensity x-ray free electron laser (XFEL) pulses in clusters. The quantum nature of the initiating ionization process is accounted for by a Monte Carlo method that calculates probabilities of electronic transitions, including photoabsorption, inner-shell relaxation, photon scattering, electron collision, and recombination dynamics, and thus tracks the transient electronic configurations explicitly. The freed electrons and ions are followed by classical particle trajectories using a molecular dynamics algorithm. These calculations reveal the surprising role of electron-ion recombination processes that lead to the development of nonuniform spatial charge density profiles in x-ray excited clusters over femtosecond timescales. In the high-intensity limit, it is important to include the recombination dynamics in the calculated scattering response even for a 2-fs pulse. We also demonstrate that our numerical codes and algorithms can make efficient use of the computational power of massively parallel supercomputers to investigate intense-field dynamics in systems of increasing complexity and size on ultrafast timescales and in non-linear x-ray interaction regimes. In particular, picosecond trajectories of XFEL-excited clusters with attosecond time resolution, containing millions of particles, can be efficiently computed on upwards of 262,144 processes.
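As a heavily simplified, illustrative companion to the method description (not the authors' code), the sketch below shows the hybrid idea in miniature: a Monte Carlo draw decides which atoms photoionize during a timestep, and the freed particles are then advanced with a classical integrator. All cross sections, fluxes, and timestep values are arbitrary placeholders.

# Toy Monte-Carlo/molecular-dynamics step with placeholder parameters.
import numpy as np

rng = np.random.default_rng(1)
n_atoms = 1000
dt = 1e-18                   # s, placeholder timestep
sigma = 1e-22                # m^2, placeholder photoionization cross section
photon_flux = 1e37           # photons / m^2 / s, placeholder XFEL intensity

# Monte Carlo step: probability that a neutral atom photoionizes in one timestep.
p_ion = 1.0 - np.exp(-sigma * photon_flux * dt)
ionized = rng.random(n_atoms) < p_ion

# Classical step for the freed electrons (forces set to zero here; in the real
# method they would come from the Coulomb interactions of all charged particles).
m_e = 9.109e-31                                                    # kg
positions = rng.standard_normal((int(ionized.sum()), 3)) * 1e-9    # m
velocities = np.zeros_like(positions)
forces = np.zeros_like(positions)
positions += velocities * dt + 0.5 * (forces / m_e) * dt ** 2      # velocity-Verlet drift
velocities += (forces / m_e) * dt                                  # velocity update

print(f"{int(ionized.sum())} of {n_atoms} atoms ionized this step (p = {p_ion:.4f})")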
A novel content-based active contour model for brain tumor segmentation.
Sachdeva, Jainy; Kumar, Vinod; Gupta, Indra; Khandelwal, Niranjan; Ahuja, Chirag Kamal
2012-06-01
Brain tumor segmentation is a crucial step in surgical and treatment planning. Intensity-based active contour models such as gradient vector flow (GVF), magnetostatic active contour (MAC), and fluid vector flow (FVF) have been proposed to segment homogeneous objects/tumors in medical images. In this study, extensive experiments are performed to analyze the performance of intensity-based techniques for homogeneous tumors on brain magnetic resonance (MR) images. The analysis shows that the state-of-the-art methods fail to segment homogeneous tumors against a similar background or when these tumors show partial diversity toward the background. They also have a pre-convergence problem in the case of false edges/saddle points. Moreover, the presence of weak edges and diffused edges (due to edema around the tumor) leads to oversegmentation by intensity-based techniques. Therefore, the proposed content-based active contour (CBAC) method uses both intensity and texture information present within the active contour to overcome the above-stated problems while capturing a large range in an image. It also proposes a novel use of the Gray-Level Co-occurrence Matrix to define a texture space for tumor segmentation. The effectiveness of this method is tested on two different real data sets (55 patients, more than 600 images) containing five different types of homogeneous, heterogeneous, and diffused tumors, and on synthetic images (non-MR benchmark images). Remarkable results are obtained in segmenting homogeneous tumors of uniform intensity; complex-content heterogeneous and diffused tumors on MR images (T1-weighted, post-contrast T1-weighted, and T2-weighted); and synthetic images (non-MR benchmark images of varying intensity, texture, noise content, and false edges). Further, tumor volume is efficiently extracted from 2-dimensional slices in what is termed 2.5-dimensional segmentation. Copyright © 2012 Elsevier Inc. All rights reserved.
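To illustrate the texture component of the approach (in a generic way, not as the CBAC implementation), the sketch below builds a gray-level co-occurrence matrix (GLCM) for a small image window and derives a few standard Haralick-style statistics from it; the window size, number of gray levels, and pixel offset are assumed values.

# Generic GLCM texture sketch for an 8-bit grayscale window.
import numpy as np

def glcm(window, levels=8, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix at offset (dy, dx)."""
    q = (window.astype(int) * levels) // 256       # quantize 0..255 to `levels` bins
    P = np.zeros((levels, levels), dtype=float)
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[q[y, x], q[y + dy, x + dx]] += 1
            P[q[y + dy, x + dx], q[y, x]] += 1      # count both directions
    return P / P.sum()

def texture_features(P):
    i, j = np.indices(P.shape)
    return {
        "contrast":    float(np.sum(P * (i - j) ** 2)),
        "energy":      float(np.sum(P ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
    }

window = np.random.default_rng(0).integers(0, 256, size=(16, 16))
print(texture_features(glcm(window)))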
Reactor Pressure Vessel Fracture Analysis Capabilities in Grizzly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spencer, Benjamin; Backman, Marie; Chakraborty, Pritam
2015-03-01
Efforts have been underway to develop fracture mechanics capabilities in the Grizzly code to enable it to be used to perform deterministic fracture assessments of degraded reactor pressure vessels (RPVs). Development in prior years resulted in a capability to calculate J-integrals. For this application, these are used to calculate stress intensity factors for cracks, which are then used in deterministic linear elastic fracture mechanics (LEFM) assessments of fracture in degraded RPVs. The J-integral can only be used to evaluate stress intensity factors for axis-aligned flaws because it can only provide the stress intensity factor for pure Mode I loading; off-axis flaws will be subjected to mixed-mode loading. For this reason, work has continued to expand the set of fracture mechanics capabilities to permit the evaluation of off-axis flaws. This report documents the following work to enhance Grizzly's engineering fracture mechanics capabilities for RPVs: • Interaction integral and T-stress: to obtain mixed-mode stress intensity factors, a capability to evaluate interaction integrals for 2D or 3D flaws has been developed, and a T-stress evaluation capability has been developed to evaluate the constraint at crack tips in 2D or 3D; initial verification testing of these capabilities is documented here. • Benchmarking for axis-aligned flaws: Grizzly's capabilities to evaluate stress intensity factors for axis-aligned flaws have been benchmarked against calculations for the same conditions in FAVOR. • Off-axis flaw demonstration: the newly developed interaction integral capabilities are demonstrated in an application to calculate the mixed-mode stress intensity factors for off-axis flaws. • Other code enhancements: other enhancements to the thermomechanics capabilities that relate to the solution of the engineering RPV fracture problem are documented here.
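For context, the sketch below shows the standard LEFM relation that connects an elastic J-integral to a Mode I stress intensity factor, J = K_I^2 / E', with E' = E in plane stress and E' = E / (1 - nu^2) in plane strain. It is a worked example with assumed material properties and an assumed J value, not output from Grizzly or FAVOR.

# Worked LEFM example with illustrative inputs (not code or data from Grizzly).
import math

def k_from_j(j_integral, youngs_modulus, poisson_ratio, plane_strain=True):
    """Convert an elastic J-integral (J/m^2) to K_I (Pa*sqrt(m))."""
    e_prime = youngs_modulus / (1.0 - poisson_ratio ** 2) if plane_strain else youngs_modulus
    return math.sqrt(j_integral * e_prime)

E = 200e9          # Pa, typical RPV steel elastic modulus (placeholder)
nu = 0.3           # Poisson's ratio (placeholder)
J = 10e3           # J/m^2, assumed elastic J-integral for an axis-aligned flaw
K_I = k_from_j(J, E, nu)
print(f"K_I = {K_I / 1e6:.1f} MPa*sqrt(m)")   # about 46.9 MPa*sqrt(m) for these inputs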
Requirements for benchmarking personal image retrieval systems
NASA Astrophysics Data System (ADS)
Bouguet, Jean-Yves; Dulong, Carole; Kozintsev, Igor; Wu, Yi
2006-01-01
It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures can only be achieved with a robust image retrieval system. This application is of high interest to Intel processor architects: it is highly compute intensive and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many content-based image retrieval systems [1]. For example, a personal image database has a lot of pictures of people, but a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places like home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and the benchmarks need to differ from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for benchmarking such systems, as specified in [2]: data sets, query tasks, ground truth, evaluation measures, and benchmarking events. This paper proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
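As one concrete example of the "evaluation measures" component listed above, the sketch below computes precision and recall at a cutoff k for a ranked result list against a ground-truth set; the image IDs and cutoff are hypothetical, and the metric choice is generic rather than one prescribed by the paper.

# Generic retrieval metrics: precision and recall at cutoff k.
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    top_k = ranked_ids[:k]
    hits = sum(1 for image_id in top_k if image_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Toy query, e.g. "pictures of person A at the beach" (hypothetical image IDs).
ranked = ["img07", "img12", "img03", "img44", "img51"]
relevant = {"img12", "img44", "img90"}
print(precision_recall_at_k(ranked, relevant, k=5))   # (0.4, 0.666...)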
[Quality management in intensive care medicine].
Martin, J; Braun, J-P
2014-02-01
Treatment of critically ill patients in the intensive care unit is tantamount to well-designed risk and quality management. Several tools for quality management and quality assurance have been developed in intensive care medicine. In addition to external quality assurance by benchmarking in intensive care medicine, peer review procedures have been established for external quality assurance in recent years. In the peer review of an intensive care unit (ICU), external physicians and nurses visit the ICU, evaluate on-site processes, and discuss possibilities for optimization with the managing team of the ICU. Furthermore, internal quality management in the ICU is possible based on the 10 quality indicators of the German Interdisciplinary Society for Intensive Care Medicine (DIVI, "Deutschen Interdisziplinären Vereinigung für Intensiv- und Notfallmedizin"). Thereby, every ICU has numerous possibilities to improve its quality management system.
[Quality management in intensive care medicine].
Martin, J; Braun, J-P
2013-09-01
Treatment of critically ill patients in the intensive care unit is tantamount to well-designed risk and quality management. Several tools for quality management and quality assurance have been developed in intensive care medicine. In addition to external quality assurance by benchmarking in intensive care medicine, peer review procedures have been established for external quality assurance in recent years. In the peer review of an intensive care unit (ICU), external physicians and nurses visit the ICU, evaluate on-site processes, and discuss possibilities for optimization with the managing team of the ICU. Furthermore, internal quality management in the ICU is possible based on the 10 quality indicators of the German Interdisciplinary Society for Intensive Care Medicine (DIVI, "Deutschen Interdisziplinären Vereinigung für Intensiv- und Notfallmedizin"). Thereby, every ICU has numerous possibilities to improve its quality management system.
Study of Storage Ring Free-Electron Laser Using Experimental and Simulation Approaches
NASA Astrophysics Data System (ADS)
Jia, Botao
2011-12-01
The Duke electron storage ring, first commissioned in November of 1994, has been developed as a dedicated driver for storage ring free-electron lasers (SRFELs) operating in a wide wavelength range from infrared, to visible, to ultraviolet (UV) and vacuum ultraviolet (VUV). The storage ring has a long straight section for various insertion devices and can be operated in a wide energy range (0.25 GeV to 1.15 GeV). Commissioned in 1995, the first free-electron laser (FEL) on the Duke storage ring was the OK-4 FEL, an optical klystron with two planar undulators sandwiching a buncher magnet. In 2005, the OK-5 FEL with two helical undulators was commissioned. With four undulators operating (two OK-4 and two OK-5), the world's first distributed optical klystron FEL was brought into operation in 2005. Via Compton scattering of FEL photons and electrons in the storage ring, the Duke FEL drives the world's most powerful, nearly monochromatic, and polarized Compton gamma-ray source, the High Intensity Gamma-ray Source (HIγS). Today, a variety of configurations of the storage ring FELs at Duke have been used in a wide range of research areas from nuclear physics to biophysics, from chemical and medical research to industrial applications. The capability of accurately measuring the storage ring electron beam energy spread is crucial for understanding the longitudinal beam dynamics and the dynamics of the storage ring FEL. In this dissertation, we have successfully developed a noninvasive, versatile, and accurate method to measure the energy spread using optical klystron radiation. Novel numerical methods based upon the Gauss-Hermite expansion have been developed to treat both spectral broadening and modulation on an equal footing. Through proper configuration of the optical klystron, this energy spread measurement method has a large dynamic range. In addition, a model-based scheme has been developed for correcting the inhomogeneous spectral broadening effect related to the electron beam emittance, to further enhance the accuracy of measuring the electron beam energy spread. Taking advantage of the direct measurement method of the electron beam energy spread, we have developed another novel technique to simultaneously measure the FEL power, electron beam energy spread, and other beam parameters. This allowed us to study the FEL power in a systematic manner for the first time. Based on the experimental findings and results of the theoretical predictions, we have proposed a compact formula to predict the FEL power using only the knowledge of electron beam current, beam energy, and bunch length. As part of the dissertation work, we have developed a self-consistent numerical model to study the storage ring FEL. The simulation program models the electron beam propagation along the storage ring, multi-turn FEL interaction in the undulators, gradual intra-cavity optical power buildup, etc. This simulation code captures the main features of a storage ring FEL at different time and space scales. The simulated FEL gain has been benchmarked against measured gain and calculated gain with good agreement. The simulation package can provide comprehensive information about the FEL gain, optical pulse growth, electron beam properties, etc. In the near future, we plan to further improve the simulation model by including additional physics effects such as the microwave instability, to make it a more useful tool for FEL research.
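As a generic illustration of the Gauss-Hermite expansion mentioned above (not the dissertation's spectral-analysis code), the sketch below fits a synthetic broadened-and-modulated spectrum with basis functions H_n(x) * exp(-x^2/2) by linear least squares; the expansion order, grid, and synthetic signal are assumptions.

# Generic Gauss-Hermite series fit of a synthetic spectrum-like curve.
import numpy as np
from numpy.polynomial import hermite

def gauss_hermite_design(x, order):
    """Columns are H_n(x) * exp(-x^2/2) for n = 0..order."""
    weight = np.exp(-0.5 * x ** 2)
    cols = [hermite.hermval(x, np.eye(order + 1)[n]) * weight for n in range(order + 1)]
    return np.column_stack(cols)

# Synthetic "spectrum": a broadened line with a weak modulation, plus noise.
rng = np.random.default_rng(3)
x = np.linspace(-4, 4, 400)
y = np.exp(-0.5 * x ** 2) * (1.0 + 0.2 * np.cos(3 * x)) + 0.01 * rng.standard_normal(x.size)

A = gauss_hermite_design(x, order=10)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = np.linalg.norm(A @ coeffs - y) / np.linalg.norm(y)
print(np.round(coeffs, 3), f"relative residual = {residual:.3f}")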
Benchmarking of Improved DPAC Transient Deflagration Analysis Code
Laurinat, James E.; Hensel, Steve J.
2017-09-27
The deflagration pressure analysis code (DPAC) has been upgraded for use in modeling hydrogen deflagration transients. The upgraded code is benchmarked using data from vented hydrogen deflagration tests conducted at the HYDRO-SC Test Facility at the University of Pisa. DPAC originally was written to calculate peak pressures for deflagrations in radioactive waste storage tanks and process facilities at the Savannah River Site. Upgrades include the addition of a laminar flame speed correlation for hydrogen deflagrations and a mechanistic model for turbulent flame propagation, incorporation of inertial effects during venting, and inclusion of the effect of water vapor condensation on vessel walls. In addition, DPAC has been coupled with Chemical Equilibrium with Applications (CEA), a NASA combustion chemistry code. The deflagration tests are modeled as end-to-end deflagrations. As a result, the improved DPAC code successfully predicts both the peak pressures during the deflagration tests and the times at which the pressure peaks.
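For orientation only, a common bounding estimate for a closed-vessel deflagration is the adiabatic isochoric complete combustion (AICC) pressure, P_aicc = P0 * (n_b * T_b) / (n_u * T_u); venting and condensation, which DPAC models, pull the actual peak below this bound. The sketch below evaluates that bound with assumed placeholder temperatures and mole counts, not DPAC inputs or HYDRO-SC test conditions.

# Back-of-the-envelope AICC bound (ideal gas, constant volume), with placeholders.
def aicc_pressure(p0, t_unburned, t_burned, n_unburned=1.0, n_burned=1.0):
    """Ideal-gas, constant-volume upper bound on the deflagration peak pressure."""
    return p0 * (n_burned * t_burned) / (n_unburned * t_unburned)

p0 = 101.3e3          # Pa, initial pressure
t_u = 298.0           # K, initial gas temperature
t_b = 1600.0          # K, assumed burned-gas temperature for a lean H2-air mixture
print(f"AICC peak pressure ~ {aicc_pressure(p0, t_u, t_b) / 1e3:.0f} kPa")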
NASA Technical Reports Server (NTRS)
Tuey, Richard C.; Moore, Fred W.; Ryan, Christine A.
1995-01-01
The report is presented in four sections. The Introduction describes the duplicating configuration under evaluation, and the Background contains a chronological description of the evaluation, segmented by phases 1 and 2; this section includes the evaluation schedule, printing and duplicating requirements, storage and communication requirements, the electronic publishing system configuration, existing and proposed processes, billing rates, cost and productivity analysis, and the return on investment based upon the data gathered to date. The third section contains the phase 1 comparative cost and productivity analysis; this analysis demonstrated that LaRC should proceed with a 90-day evaluation of the DocuTech and follow with a phase 2 cycle to demonstrate that the proposed system would meet the needs of LaRC's printing and duplicating requirements. Benchmark results, cost comparisons, benchmark observations, and recommendations are documented thereafter.
NASA Astrophysics Data System (ADS)
Davis, G. A.; Battistuz, B.; Foley, S.; Vernon, F. L.; Eakins, J. A.
2009-12-01
Since April 2004, the Earthscope USArray Transportable Array (TA) network has grown to over 400 broadband seismic stations that stream multi-channel data in near real-time to the Array Network Facility in San Diego. In total, over 1.7 terabytes per year of 24-bit, 40 samples-per-second seismic and state-of-health data is recorded from the stations. The ANF provides analysts with access to real-time and archived data, as well as state-of-health data, metadata, and interactive tools for station engineers and the public via a website. Additional processing and recovery of missing data from on-site recorders (balers) at the stations is performed before the final data is transmitted to the IRIS Data Management Center (DMC). Assembly of the final data set requires additional storage and processing capabilities to combine the real-time data with baler data. The infrastructure supporting these diverse computational and storage needs currently consists of twelve virtualized Sun Solaris Zones executing on nine physical server systems. The servers are protected against failure by redundant power, storage, and networking connections. Storage needs are met by a hybrid iSCSI and Fiber Channel Storage Area Network (SAN) with access to over 40 terabytes of RAID 5 and 6 storage. Processing tasks are assigned to systems based on parallelization and floating-point calculation needs. On-site buffering at the data-loggers provides protection in case of short-term network or hardware problems, while backup acquisition systems at the San Diego Supercomputer Center and the DMC protect against catastrophic failure of the primary site. Configuration management and monitoring of these systems is accomplished with open-source (Cfengine, Nagios, Solaris Community Software) and commercial tools (Intermapper). In the evolution from a single server to multiple virtualized server instances, Sun Cluster software was evaluated and found to be unstable in our environment. Shared filesystem architectures using PxFS and QFS were found to be incompatible with our software architecture, so sharing of data between systems is accomplished via traditional NFS. Linux was found to be limited in terms of deployment flexibility and consistency between versions. Despite the experimentation with various technologies, our current virtualized architecture is stable to the point of an average daily real-time data return rate of 92.34% over the entire lifetime of the project to date.
Approximate methods in gamma-ray skyshine calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faw, R.E.; Roseberry, M.L.; Shultis, J.K.
1985-11-01
Gamma-ray skyshine, an important component of the radiation field in the environment of a nuclear power plant, has recently been studied in relation to storage of spent fuel and nuclear waste. This paper reviews benchmark skyshine experiments and transport calculations against which computational procedures may be tested. The paper also addresses the applicability of simplified computational methods involving single-scattering approximations. One such method, suitable for microcomputer implementation, is described and results are compared with other work.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
NASA's Pleiades Supercomputer Crunches Data For Groundbreaking Analysis and Visualizations
2016-11-23
The Pleiades supercomputer at NASA's Ames Research Center, recently named the 13th fastest computer in the world, provides scientists and researchers with high-fidelity numerical modeling of complex systems and processes. By using detailed analyses and visualizations of large-scale data, Pleiades is helping to advance human knowledge and technology, from designing the next generation of aircraft and spacecraft to understanding the Earth's climate and the mysteries of our galaxy.
A Long History of Supercomputing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grider, Gary
As part of its national security science mission, Los Alamos National Laboratory and HPC have a long, entwined history dating back to the earliest days of computing. From bringing the first problem to the nation’s first computer to building the first machine to break the petaflop barrier, Los Alamos holds many “firsts” in HPC breakthroughs. Today, supercomputers are integral to stockpile stewardship and the Laboratory continues to work with vendors in developing the future of HPC.
Introducing Argonne’s Theta Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.
NASA Advanced Supercomputing Facility Expansion
NASA Technical Reports Server (NTRS)
Thigpen, William W.
2017-01-01
The NASA Advanced Supercomputing (NAS) Division enables advances in high-end computing technologies and in modeling and simulation methods to tackle some of the toughest science and engineering challenges facing NASA today. The name "NAS" has long been associated with leadership and innovation throughout the high-end computing (HEC) community. We play a significant role in shaping HEC standards and paradigms, and provide leadership in the areas of large-scale InfiniBand fabrics, Lustre open-source filesystems, and hyperwall technologies. We provide an integrated high-end computing environment to accelerate NASA missions and make revolutionary advances in science. Pleiades, a petaflop-scale supercomputer, is used by scientists throughout the U.S. to support NASA missions, and is ranked among the most powerful systems in the world. One of our key focus areas is in modeling and simulation to support NASA's real-world engineering applications and make fundamental advances in modeling and simulation methods.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods to unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition (NER) tasks as a demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other biomedical text mining tasks besides NER.
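The abstract credits a simple load-balancing strategy for much of the gain. The sketch below shows one generic strategy of that flavor (not the paraBTM algorithm): greedily assign documents, longest first, to the currently least-loaded worker, using text length as a cost proxy.

# Generic greedy (longest-processing-time-first) load balancing sketch.
import heapq

def balance(documents, n_workers):
    """Return one document list per worker, balanced by total text length."""
    heap = [(0, worker_id, []) for worker_id in range(n_workers)]   # (load, id, docs)
    heapq.heapify(heap)
    for doc in sorted(documents, key=len, reverse=True):            # longest first
        load, worker_id, assigned = heapq.heappop(heap)             # least-loaded worker
        assigned.append(doc)
        heapq.heappush(heap, (load + len(doc), worker_id, assigned))
    return [assigned for _, _, assigned in sorted(heap, key=lambda t: t[1])]

docs = ["a" * n for n in (5000, 1200, 800, 4000, 300, 2500)]
for i, bucket in enumerate(balance(docs, 3)):
    print(f"worker {i}: {sum(len(d) for d in bucket)} characters")

The longest-first ordering is the classic LPT heuristic: placing the largest jobs first keeps the final per-worker totals close together, which matters when a single straggler document would otherwise dominate the parallel runtime.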
Graphics supercomputer for computational fluid dynamics research
NASA Astrophysics Data System (ADS)
Liaw, Goang S.
1994-11-01
The objective of this project is to purchase a state-of-the-art graphics supercomputer to improve the Computational Fluid Dynamics (CFD) research capability at Alabama A & M University (AAMU) and to support Air Force research projects. A cutting-edge graphics supercomputer system, the Onyx VTX from Silicon Graphics Computer Systems (SGI), was purchased and installed. Other equipment, including a desktop personal computer (a PC-486 DX2 with a built-in 10-BaseT Ethernet card), a 10-BaseT hub, an Apple Laser Printer Select 360, and a notebook computer from Zenith, was also purchased. A reading room was converted into a research computer lab by adding furniture and an air conditioning unit in order to provide an appropriate working environment for researchers and the purchased equipment. All of the purchased equipment was successfully installed and is fully functional. Several research projects, including two existing Air Force projects, are being performed using these facilities.