US Department of Energy High School Student Supercomputing Honors Program: A follow-up assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1987-01-01
The US DOE High School Student Supercomputing Honors Program was designed to recognize high school students with superior skills in mathematics and computer science and to provide them with formal training and experience with advanced computer equipment. This document reports on the participants who attended the first such program, which was held at the National Magnetic Fusion Energy Computer Center at the Lawrence Livermore National Laboratory (LLNL) during August 1985.
The Sky's the Limit When Super Students Meet Supercomputers.
ERIC Educational Resources Information Center
Trotter, Andrew
1991-01-01
In a few select high schools in the U.S., supercomputers are allowing talented students to attempt sophisticated research projects using simultaneous simulations of nature, culture, and technology not achievable by ordinary microcomputers. Schools can get their students online by entering contests and seeking grants and partnerships with…
The ChemViz Project: Using a Supercomputer To Illustrate Abstract Concepts in Chemistry.
ERIC Educational Resources Information Center
Beckwith, E. Kenneth; Nelson, Christopher
1998-01-01
Describes the Chemistry Visualization (ChemViz) Project, a Web venture maintained by the University of Illinois National Center for Supercomputing Applications (NCSA) that enables high school students to use computational chemistry as a technique for understanding abstract concepts. Discusses the evolution of computational chemistry and provides a…
Adventures in supercomputing: An innovative program for high school teachers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oliver, C.E.; Hicks, H.R.; Summers, B.G.
1994-12-31
Within the realm of education, seldom does an innovative program become available with the potential to change an educator`s teaching methodology. Adventures in Supercomputing (AiS), sponsored by the U.S. Department of Energy (DOE), is such a program. It is a program for high school teachers that changes the teacher paradigm from a teacher-directed approach of teaching to a student-centered approach. {open_quotes}A student-centered classroom offers better opportunities for development of internal motivation, planning skills, goal setting and perseverance than does the traditional teacher-directed mode{close_quotes}. Not only is the process of teaching changed, but the cross-curricula integration within the AiS materials ismore » remarkable. Written from a teacher`s perspective, this paper will describe the AiS program and its effects on teachers and students, primarily at Wartburg Central High School, in Wartburg, Tennessee. The AiS program in Tennessee is sponsored by Oak Ridge National Laboratory (ORNL).« less
When Rural Reality Goes Virtual.
ERIC Educational Resources Information Center
Husain, Dilshad D.
1998-01-01
In rural towns where sparse population and few business are barriers, virtual reality may be the only way to bring work-based learning to students. A partnership between a small-town high school, the Ohio Supercomputer Center, and a high-tech business will enable students to explore the workplace using virtual reality. (JOW)
Scientific Visualization in High Speed Network Environments
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kutler, Paul (Technical Monitor)
1997-01-01
In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment have become more critical and continue to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks in providing an important link to support efficient supercomputing is identified. The two technologies are driven by the increasing complexities and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a Computational Fluid Dynamics simulation/visualization environment are given. Current capabilities supported by high speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation Facility (NAS) at NASA Ames Research Center are described. Applied research in providing a supercomputer visualization environment to support future computational requirements are summarized.
ERIC Educational Resources Information Center
General Accounting Office, Washington, DC. Information Management and Technology Div.
This report was prepared in response to a request for information on supercomputers and high-speed networks from the Senate Committee on Commerce, Science, and Transportation, and the House Committee on Science, Space, and Technology. The following information was requested: (1) examples of how various industries are using supercomputers to…
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2000-01-01
The Affordable High Performance Computing (AHPC) project demonstrated that high-performance computing based on a distributed network of computer workstations is a cost-effective alternative to vector supercomputers for running CPU and memory intensive design and analysis tools. The AHPC project created an integrated system called a Network Supercomputer. By connecting computer work-stations through a network and utilizing the workstations when they are idle, the resulting distributed-workstation environment has the same performance and reliability levels as the Cray C90 vector Supercomputer at less than 25 percent of the C90 cost. In fact, the cost comparison between a Cray C90 Supercomputer and Sun workstations showed that the number of distributed networked workstations equivalent to a C90 costs approximately 8 percent of the C90.
The role of graphics super-workstations in a supercomputing environment
NASA Technical Reports Server (NTRS)
Levin, E.
1989-01-01
A new class of very powerful workstations has recently become available which integrate near supercomputer computational performance with very powerful and high quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing of the output of supercomputer calculations, and by distributed or shared processing), scientific visualization (understanding of results, communication of results), and by real time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).
Adventures in supercomputing, a K-12 program in computational science: An assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oliver, C.E.; Hicks, H.R.; Iles-Brechak, K.D.
1994-10-01
In this paper, the authors describe only those elements of the Department of Energy Adventures in Supercomputing (AiS) program for high school teachers, such as school selection, which have a direct bearing on assessment. Schools submit an application to participate in the AiS program. They propose a team of at least two teachers to implement the AiS curriculum. The applications are evaluated by selection committees in each of the five participating states to determine which schools are the most qualified to carry out the program and reach a significant number of women, minorities, and economically disadvantaged students, all of whommore » have historically been underrepresented in the sciences. Typically, selected schools either have a large disadvantaged student population, or the applying teachers propose specific means to attract these segments of their student body into AiS classes. Some areas with AiS schools have significant numbers of minority students, some have economically disadvantaged, usually rural, students, and all areas have the potential to reach a higher proportion of women than technical classes usually attract. This report presents preliminary findings based on three types of data: demographic, student journals, and contextual. Demographic information is obtained for both students and teachers. Students have been asked to maintain journals which include replies to specific questions that are posed each month. An analysis of the answers to these questions helps to form a picture of how students progress through the course of the school year. Onsite visits by assessment professionals conducting student and teacher interviews, provide a more in depth, qualitative basis for understanding student motivations.« less
TOP500 Supercomputers for June 2004
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2004-06-23
23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, A.
1986-03-10
Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program tomore » run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.« less
NASA's supercomputing experience
NASA Technical Reports Server (NTRS)
Bailey, F. Ron
1990-01-01
A brief overview of NASA's recent experience in supercomputing is presented from two perspectives: early systems development and advanced supercomputing applications. NASA's role in supercomputing systems development is illustrated by discussion of activities carried out by the Numerical Aerodynamical Simulation Program. Current capabilities in advanced technology applications are illustrated with examples in turbulence physics, aerodynamics, aerothermodynamics, chemistry, and structural mechanics. Capabilities in science applications are illustrated by examples in astrophysics and atmospheric modeling. Future directions and NASA's new High Performance Computing Program are briefly discussed.
Will Moores law be sufficient?
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeBenedictis, Erik P.
2004-07-01
It seems well understood that supercomputer simulation is an enabler for scientific discoveries, weapons, and other activities of value to society. It also seems widely believed that Moore's Law will make progressively more powerful supercomputers over time and thus enable more of these contributions. This paper seeks to add detail to these arguments, revealing them to be generally correct but not a smooth and effortless progression. This paper will review some key problems that can be solved with supercomputer simulation, showing that more powerful supercomputers will be useful up to a very high yet finite limit of around 1021 FLOPSmore » (1 Zettaflops) . The review will also show the basic nature of these extreme problems. This paper will review work by others showing that the theoretical maximum supercomputer power is very high indeed, but will explain how a straightforward extrapolation of Moore's Law will lead to technological maturity in a few decades. The power of a supercomputer at the maturity of Moore's Law will be very high by today's standards at 1016-1019 FLOPS (100 Petaflops to 10 Exaflops), depending on architecture, but distinctly below the level required for the most ambitious applications. Having established that Moore's Law will not be that last word in supercomputing, this paper will explore the nearer term issue of what a supercomputer will look like at maturity of Moore's Law. Our approach will quantify the maximum performance as permitted by the laws of physics for extension of current technology and then find a design that approaches this limit closely. We study a 'multi-architecture' for supercomputers that combines a microprocessor with other 'advanced' concepts and find it can reach the limits as well. This approach should be quite viable in the future because the microprocessor would provide compatibility with existing codes and programming styles while the 'advanced' features would provide a boost to the limits of performance.« less
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Supercomputer Provides Molecular Insight into Cellulose (Fact Sheet)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2011-02-01
Groundbreaking research at the National Renewable Energy Laboratory (NREL) has used supercomputing simulations to calculate the work that enzymes must do to deconstruct cellulose, which is a fundamental step in biomass conversion technologies for biofuels production. NREL used the new high-performance supercomputer Red Mesa to conduct several million central processing unit (CPU) hours of simulation.
A high level language for a high performance computer
NASA Technical Reports Server (NTRS)
Perrott, R. H.
1978-01-01
The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers have been modifications of programming languages which were designed many years ago for sequential machines. A new programming language should be developed based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.
TOP500 Supercomputers for November 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2003-11-16
22nd Edition of TOP500 List of World s Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 22nd edition of the TOP500 list of the worlds fastest supercomputers was released today (November 16, 2003). The Earth Simulator supercomputer retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s (''teraflops'' or trillions of calculations per second). It was built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan.
Desktop supercomputer: what can it do?
NASA Astrophysics Data System (ADS)
Bogdanov, A.; Degtyarev, A.; Korkhov, V.
2017-12-01
The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yilk, Todd
The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL
Yilk, Todd
2018-02-17
The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
Characterizing output bottlenecks in a supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Bing; Chase, Jeffrey; Dillow, David A
2012-01-01
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic,more » contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.« less
Technology advances and market forces: Their impact on high performance architectures
NASA Technical Reports Server (NTRS)
Best, D. R.
1978-01-01
Reasonable projections into future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture and application with greater attention to testability, maintainability, reliability, and usability than supercomputer development programs of the past.
NASA Astrophysics Data System (ADS)
Watari, S.; Morikawa, Y.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Kato, H.; Shimojo, S.; Murata, K. T.
2010-12-01
In the Solar-Terrestrial Physics (STP) field, spatio-temporal resolution of computer simulations is getting higher and higher because of tremendous advancement of supercomputers. A more advanced technology is Grid Computing that integrates distributed computational resources to provide scalable computing resources. In the simulation research, it is effective that a researcher oneself designs his physical model, performs calculations with a supercomputer, and analyzes and visualizes for consideration by a familiar method. A supercomputer is far from an analysis and visualization environment. In general, a researcher analyzes and visualizes in the workstation (WS) managed at hand because the installation and the operation of software in the WS are easy. Therefore, it is necessary to copy the data from the supercomputer to WS manually. Time necessary for the data transfer through long delay network disturbs high-accuracy simulations actually. In terms of usefulness, integrating a supercomputer and an analysis and visualization environment seamlessly with a researcher's familiar method is important. NICT has been developing a cloud computing environment (NICT Space Weather Cloud). In the NICT Space Weather Cloud, disk servers are located near its supercomputer and WSs for data analysis and visualization. They are connected to JGN2plus that is high-speed network for research and development. Distributed virtual high-capacity storage is also constructed by Grid Datafarm (Gfarm v2). Huge-size data output from the supercomputer is transferred to the virtual storage through JGN2plus. A researcher can concentrate on the research by a familiar method without regard to distance between a supercomputer and an analysis and visualization environment. Now, total 16 disk servers are setup in NICT headquarters (at Koganei, Tokyo), JGN2plus NOC (at Otemachi, Tokyo), Okinawa Subtropical Environment Remote-Sensing Center, and Cybermedia Center, Osaka University. They are connected on JGN2plus, and they constitute 1PB (physical size) virtual storage by Gfarm v2. These disk servers are connected with supercomputers of NICT and Osaka University. A system that data output from the supercomputers are automatically transferred to the virtual storage had been built up. Transfer rate is about 50 GB/hrs by actual measurement. It is estimated that the performance is reasonable for a certain simulation and analysis for reconstruction of coronal magnetic field. This research is assumed an experiment of the system, and the verification of practicality is advanced at the same time. Herein we introduce an overview of the space weather cloud system so far we have developed. We also demonstrate several scientific results using the space weather cloud system. We also introduce several web applications of the cloud as a service of the space weather cloud, which is named as "e-SpaceWeather" (e-SW). The e-SW provides with a variety of space weather online services from many aspects.
Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarje, Abhinav; Jacobsen, Douglas W.; Williams, Samuel W.
The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.
A mass storage system for supercomputers based on Unix
NASA Technical Reports Server (NTRS)
Richards, J.; Kummell, T.; Zarlengo, D. G.
1988-01-01
The authors present the design, implementation, and utilization of a large mass storage subsystem (MSS) for the numerical aerodynamics simulation. The MSS supports a large networked, multivendor Unix-based supercomputing facility. The MSS at Ames Research Center provides all processors on the numerical aerodynamics system processing network, from workstations to supercomputers, the ability to store large amounts of data in a highly accessible, long-term repository. The MSS uses Unix System V and is capable of storing hundreds of thousands of files ranging from a few bytes to 2 Gb in size.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
GREEN SUPERCOMPUTING IN A DESKTOP BOX
DOE Office of Scientific and Technical Information (OSTI.GOV)
HSU, CHUNG-HSING; FENG, WU-CHUN; CHING, AVERY
2007-01-17
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicatedmore » computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, they present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than their reference SMP platform.« less
NASA Astrophysics Data System (ADS)
Schulthess, Thomas C.
2013-03-01
The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.
NASA Astrophysics Data System (ADS)
Voronin, A. A.; Panchenko, V. Ya; Zheltikov, A. M.
2016-06-01
High-intensity ultrashort laser pulses propagating in gas media or in condensed matter undergo complex nonlinear spatiotemporal evolution where temporal transformations of optical field waveforms are strongly coupled to an intricate beam dynamics and ultrafast field-induced ionization processes. At the level of laser peak powers orders of magnitude above the critical power of self-focusing, the beam exhibits modulation instabilities, producing random field hot spots and breaking up into multiple noise-seeded filaments. This problem is described by a (3 + 1)-dimensional nonlinear field evolution equation, which needs to be solved jointly with the equation for ultrafast ionization of a medium. Analysis of this problem, which is equivalent to solving a billion-dimensional evolution problem, is only possible by means of supercomputer simulations augmented with coordinated big-data processing of large volumes of information acquired through theory-guiding experiments and supercomputations. Here, we review the main challenges of supercomputations and big-data processing encountered in strong-field ultrafast optical physics and discuss strategies to confront these challenges.
Flux-Level Transit Injection Experiments with NASA Pleiades Supercomputer
NASA Astrophysics Data System (ADS)
Li, Jie; Burke, Christopher J.; Catanzarite, Joseph; Seader, Shawn; Haas, Michael R.; Batalha, Natalie; Henze, Christopher; Christiansen, Jessie; Kepler Project, NASA Advanced Supercomputing Division
2016-06-01
Flux-Level Transit Injection (FLTI) experiments are executed with NASA's Pleiades supercomputer for the Kepler Mission. The latest release (9.3, January 2016) of the Kepler Science Operations Center Pipeline is used in the FLTI experiments. Their purpose is to validate the Analytic Completeness Model (ACM), which can be computed for all Kepler target stars, thereby enabling exoplanet occurrence rate studies. Pleiades, a facility of NASA's Advanced Supercomputing Division, is one of the world's most powerful supercomputers and represents NASA's state-of-the-art technology. We discuss the details of implementing the FLTI experiments on the Pleiades supercomputer. For example, taking into account that ~16 injections are generated by one core of the Pleiades processors in an hour, the “shallow” FLTI experiment, in which ~2000 injections are required per target star, can be done for 16% of all Kepler target stars in about 200 hours. Stripping down the transit search to bare bones, i.e. only searching adjacent high/low periods at high/low pulse durations, makes the computationally intensive FLTI experiments affordable. The design of the FLTI experiments and the analysis of the resulting data are presented in “Validating an Analytic Completeness Model for Kepler Target Stars Based on Flux-level Transit Injection Experiments” by Catanzarite et al. (#2494058).Kepler was selected as the 10th mission of the Discovery Program. Funding for the Kepler Mission has been provided by the NASA Science Mission Directorate.
NASA Advanced Supercomputing Facility Expansion
NASA Technical Reports Server (NTRS)
Thigpen, William W.
2017-01-01
The NASA Advanced Supercomputing (NAS) Division enables advances in high-end computing technologies and in modeling and simulation methods to tackle some of the toughest science and engineering challenges facing NASA today. The name "NAS" has long been associated with leadership and innovation throughout the high-end computing (HEC) community. We play a significant role in shaping HEC standards and paradigms, and provide leadership in the areas of large-scale InfiniBand fabrics, Lustre open-source filesystems, and hyperwall technologies. We provide an integrated high-end computing environment to accelerate NASA missions and make revolutionary advances in science. Pleiades, a petaflop-scale supercomputer, is used by scientists throughout the U.S. to support NASA missions, and is ranked among the most powerful systems in the world. One of our key focus areas is in modeling and simulation to support NASA's real-world engineering applications and make fundamental advances in modeling and simulation methods.
TOP500 Supercomputers for June 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2003-06-23
21st Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 21st edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2003). The Earth Simulator supercomputer built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (teraflops or trillions of calculations per second), retains the number one position. The number 2 position is held by the re-measured ASCI Q system at Los Alamosmore » National Laboratory. With 13.88 Tflop/s, it is the second system ever to exceed the 10 Tflop/smark. ASCIQ was built by Hewlett-Packard and is based on the AlphaServerSC computer system.« less
The impact of the U.S. supercomputing initiative will be global
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crawford, Dona
2016-01-15
Last July, President Obama issued an executive order that created a coordinated federal strategy for HPC research, development, and deployment called the U.S. National Strategic Computing Initiative (NSCI). However, this bold, necessary step toward building the next generation of supercomputers has inaugurated a new era for U.S. high performance computing (HPC).
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
Porting Ordinary Applications to Blue Gene/Q Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy
2015-08-31
Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques-sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt'smore » sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.« less
Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed com- puting models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging soft- ware ecosystems. In thismore » paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifi- cally, we have deployed the KVM hypervisor within Cray's Compute Node Linux on a XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead our solution using HPC benchmarks, both evaluating single-node performance as well as weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, ef- fectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.« less
2014-09-01
simulation time frame from 30 days to one year. This was enabled by porting the simulation to the Pleiades supercomputer at NASA Ames Research Center, a...including the motivation for changes to our past approach. We then present the software implementation (3) on the NASA Ames Pleiades supercomputer...significantly updated since last year’s paper [25]. The main incentive for that was the shift to a highly parallel approach in order to utilize the Pleiades
Role of HPC in Advancing Computational Aeroelasticity
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2004-01-01
On behalf of the High Performance Computing and Modernization Program (HPCMP) and NASA Advanced Supercomputing Division (NAS) a study is conducted to assess the role of supercomputers on computational aeroelasticity of aerospace vehicles. The study is mostly based on the responses to a web based questionnaire that was designed to capture the nuances of high performance computational aeroelasticity, particularly on parallel computers. A procedure is presented to assign a fidelity-complexity index to each application. Case studies based on major applications using HPCMP resources are presented.
PerSEUS: Ultra-Low-Power High Performance Computing for Plasma Simulations
NASA Astrophysics Data System (ADS)
Doxas, I.; Andreou, A.; Lyon, J.; Angelopoulos, V.; Lu, S.; Pritchett, P. L.
2017-12-01
Peta-op SupErcomputing Unconventional System (PerSEUS) aims to explore the use for High Performance Scientific Computing (HPC) of ultra-low-power mixed signal unconventional computational elements developed by Johns Hopkins University (JHU), and demonstrate that capability on both fluid and particle Plasma codes. We will describe the JHU Mixed-signal Unconventional Supercomputing Elements (MUSE), and report initial results for the Lyon-Fedder-Mobarry (LFM) global magnetospheric MHD code, and a UCLA general purpose relativistic Particle-In-Cell (PIC) code.
NREL's Building-Integrated Supercomputer Provides Heating and Efficient Computing (Fact Sheet)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2014-09-01
NREL's Energy Systems Integration Facility (ESIF) is meant to investigate new ways to integrate energy sources so they work together efficiently, and one of the key tools to that investigation, a new supercomputer, is itself a prime example of energy systems integration. NREL teamed with Hewlett-Packard (HP) and Intel to develop the innovative warm-water, liquid-cooled Peregrine supercomputer, which not only operates efficiently but also serves as the primary source of building heat for ESIF offices and laboratories. This innovative high-performance computer (HPC) can perform more than a quadrillion calculations per second as part of the world's most energy-efficient HPC datamore » center.« less
NAS Technical Summaries, March 1993 - February 1994
NASA Technical Reports Server (NTRS)
1995-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1993-94 operational year concluded with 448 high-speed processor projects and 95 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
NAS technical summaries. Numerical aerodynamic simulation program, March 1992 - February 1993
NASA Technical Reports Server (NTRS)
1994-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1992-93 operational year concluded with 399 high-speed processor projects and 91 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
Japanese supercomputer technology.
Buzbee, B L; Ewald, R H; Worlton, W J
1982-12-17
Under the auspices of the Ministry for International Trade and Industry the Japanese have launched a National Superspeed Computer Project intended to produce high-performance computers for scientific computation and a Fifth-Generation Computer Project intended to incorporate and exploit concepts of artificial intelligence. If these projects are successful, which appears likely, advanced economic and military research in the United States may become dependent on access to supercomputers of foreign manufacture.
Comprehensive efficiency analysis of supercomputer resource usage based on system monitoring data
NASA Astrophysics Data System (ADS)
Mamaeva, A. A.; Shaykhislamov, D. I.; Voevodin, Vad V.; Zhumatiy, S. A.
2018-03-01
One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to the significant idle time of computational resources, and, in turn, to the decrease in speed of scientific research. This paper presents three approaches to study the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which allows to identify different typical classes of programs, to explore the structure of the supercomputer job flow and to track overall trends in the supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs – jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow – are being detected. For each approach, the results obtained in practice in the Supercomputer Center of Moscow State University are demonstrated.
Optimization of Supercomputer Use on EADS II System
NASA Technical Reports Server (NTRS)
Ahmed, Ardsher
1998-01-01
The main objective of this research was to optimize supercomputer use to achieve better throughput and utilization of supercomputers and to help facilitate the movement of non-supercomputing (inappropriate for supercomputer) codes to mid-range systems for better use of Government resources at Marshall Space Flight Center (MSFC). This work involved the survey of architectures available on EADS II and monitoring customer (user) applications running on a CRAY T90 system.
Supercomputer applications in molecular modeling.
Gund, T M
1988-01-01
An overview of the functions performed by molecular modeling is given. Molecular modeling techniques benefiting from supercomputing are described, namely, conformation, search, deriving bioactive conformations, pharmacophoric pattern searching, receptor mapping, and electrostatic properties. The use of supercomputers for problems that are computationally intensive, such as protein structure prediction, protein dynamics and reactivity, protein conformations, and energetics of binding is also examined. The current status of supercomputing and supercomputer resources are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Klimentov, A
2016-01-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), MIRA supercomputer at Argonne Leadership Computing Facilities (ALCF), Supercomputer at the National Research Center Kurchatov Institute , IT4 in Ostrava and others). Current approach utilizes modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on LCFs multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for ALICE and ATLAS experiments and it is in full production for the ATLAS experiment since September 2015. We will present our current accomplishments with running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.« less
A Heterogeneous High-Performance System for Computational and Computer Science
2016-11-15
Patents Submitted Patents Awarded Awards Graduate Students Names of Post Doctorates Names of Faculty Supported Names of Under Graduate students supported...team of research faculty from the departments of computer science and natural science at Bowie State University. The supercomputer is not only to...accelerated HPC systems. The supercomputer is also ideal for the research conducted in the Department of Natural Science, as research faculty work on
Supercomputing Sheds Light on the Dark Universe
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Heitmann, Katrin
2012-11-15
At Argonne National Laboratory, scientists are using supercomputers to shed light on one of the great mysteries in science today, the Dark Universe. With Mira, a petascale supercomputer at the Argonne Leadership Computing Facility, a team led by physicists Salman Habib and Katrin Heitmann will run the largest, most complex simulation of the universe ever attempted. By contrasting the results from Mira with state-of-the-art telescope surveys, the scientists hope to gain new insights into the distribution of matter in the universe, advancing future investigations of dark energy and dark matter into a new realm. The team's research was named amore » finalist for the 2012 Gordon Bell Prize, an award recognizing outstanding achievement in high-performance computing.« less
NASA Technical Reports Server (NTRS)
Murman, E. M. (Editor); Abarbanel, S. S. (Editor)
1985-01-01
Current developments and future trends in the application of supercomputers to computational fluid dynamics are discussed in reviews and reports. Topics examined include algorithm development for personal-size supercomputers, a multiblock three-dimensional Euler code for out-of-core and multiprocessor calculations, simulation of compressible inviscid and viscous flow, high-resolution solutions of the Euler equations for vortex flows, algorithms for the Navier-Stokes equations, and viscous-flow simulation by FEM and related techniques. Consideration is given to marching iterative methods for the parabolized and thin-layer Navier-Stokes equations, multigrid solutions to quasi-elliptic schemes, secondary instability of free shear flows, simulation of turbulent flow, and problems connected with weather prediction.
Integration of Panda Workload Management System with supercomputers
NASA Astrophysics Data System (ADS)
De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.
2016-09-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run singlethreaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accomplishments in running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility's infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2010 CFR
2010-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2012 CFR
2012-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2013 CFR
2013-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2011 CFR
2011-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
Data-intensive computing on numerically-insensitive supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Fasel, Patricia K; Habib, Salman
2010-12-03
With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.
Computer Electromagnetics and Supercomputer Architecture
NASA Technical Reports Server (NTRS)
Cwik, Tom
1993-01-01
The dramatic increase in performance over the last decade for microporcessor computations is compared with that for the supercomputer computations. This performance, the projected performance, and a number of other issues such as cost and the inherent pysical limitations in curent supercomputer technology have naturally led to parallel supercomputers and ensemble of interconnected microprocessors.
NASA's Participation in the National Computational Grid
NASA Technical Reports Server (NTRS)
Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)
1998-01-01
Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases and on line experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several events in recent times are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI) which is built around the idea of distributed high performance computing. The Alliance lead, by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), lead by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.
NASA's Pleiades Supercomputer Crunches Data For Groundbreaking Analysis and Visualizations
2016-11-23
The Pleiades supercomputer at NASA's Ames Research Center, recently named the 13th fastest computer in the world, provides scientists and researchers high-fidelity numerical modeling of complex systems and processes. By using detailed analyses and visualizations of large-scale data, Pleiades is helping to advance human knowledge and technology, from designing the next generation of aircraft and spacecraft to understanding the Earth's climate and the mysteries of our galaxy.
Edison - A New Cray Supercomputer Advances Discovery at NERSC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy
2014-02-06
When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.
Edison - A New Cray Supercomputer Advances Discovery at NERSC
Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy; Trebotich, David; Broughton, Jeff; Antypas, Katie; Lukic, Zarija, Borrill, Julian; Draney, Brent; Chen, Jackie
2018-01-16
When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.
Ellingson, Sally R; Dakshanamurthy, Sivanesan; Brown, Milton; Smith, Jeremy C; Baudry, Jerome
2014-04-25
In this paper we give the current state of high-throughput virtual screening. We describe a case study of using a task-parallel MPI (Message Passing Interface) version of Autodock4 [1], [2] to run a virtual high-throughput screen of one-million compounds on the Jaguar Cray XK6 Supercomputer at Oak Ridge National Laboratory. We include a description of scripts developed to increase the efficiency of the predocking file preparation and postdocking analysis. A detailed tutorial, scripts, and source code for this MPI version of Autodock4 are available online at http://www.bio.utk.edu/baudrylab/autodockmpi.htm.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
NASA Astrophysics Data System (ADS)
Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.
2015-12-01
Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
20th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 20th edition of the TOP500 list of the world's fastest supercomputers was released today (November 15, 2002). The Earth Simulator supercomputer installed earlier this year at the Earth Simulator Center in Yokohama, Japan, is with its Linpack benchmark performance of 35.86 Tflop/s (trillions of calculations per second) retains the number one position. The No.2 and No.3 positions are held by two new, identical ASCI Q systems at Los Alamos National Laboratorymore » (7.73Tflop/s each). These systems are built by Hewlett-Packard and based on the Alpha Server SC computer system.« less
Japanese project aims at supercomputer that executes 10 gflops
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burskey, D.
1984-05-03
Dubbed supercom by its multicompany design team, the decade-long project's goal is an engineering supercomputer that can execute 10 billion floating-point operations/s-about 20 times faster than today's supercomputers. The project, guided by Japan's Ministry of International Trade and Industry (MITI) and the Agency of Industrial Science and Technology encompasses three parallel research programs, all aimed at some angle of the superconductor. One program should lead to superfast logic and memory circuits, another to a system architecture that will afford the best performance, and the last to the software that will ultimately control the computer. The work on logic and memorymore » chips is based on: GAAS circuit; Josephson junction devices; and high electron mobility transistor structures. The architecture will involve parallel processing.« less
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Compute Server Performance Results
NASA Technical Reports Server (NTRS)
Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)
1994-01-01
Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single processor rate of any vendor. However, if the price-performance ration (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPR's of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite,
Opportunities for leveraging OS virtualization in high-end supercomputing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bridges, Patrick G.; Pedretti, Kevin Thomas Tauke
2010-11-01
This paper examines potential motivations for incorporating virtualization support in the system software stacks of high-end capability supercomputers. We advocate that this will increase the flexibility of these platforms significantly and enable new capabilities that are not possible with current fixed software stacks. Our results indicate that compute, virtual memory, and I/O virtualization overheads are low and can be further mitigated by utilizing well-known techniques such as large paging and VMM bypass. Furthermore, since the addition of virtualization support does not affect the performance of applications using the traditional native environment, there is essentially no disadvantage to its addition.
Unified, Cross-Platform, Open-Source Library Package for High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozacik, Stephen
Compute power is continually increasing, but this increased performance is largely found in sophisticated computing devices and supercomputer resources that are difficult to use, resulting in under-utilization. We developed a unified set of programming tools that will allow users to take full advantage of the new technology by allowing them to work at a level abstracted away from the platform specifics, encouraging the use of modern computing systems, including government-funded supercomputer facilities.
NASA Astrophysics Data System (ADS)
Tripathi, Vijay S.; Yeh, G. T.
1993-06-01
Sophisticated and highly computation-intensive models of transport of reactive contaminants in groundwater have been developed in recent years. Application of such models to real-world contaminant transport problems, e.g., simulation of groundwater transport of 10-15 chemically reactive elements (e.g., toxic metals) and relevant complexes and minerals in two and three dimensions over a distance of several hundred meters, requires high-performance computers including supercomputers. Although not widely recognized as such, the computational complexity and demand of these models compare with well-known computation-intensive applications including weather forecasting and quantum chemical calculations. A survey of the performance of a variety of available hardware, as measured by the run times for a reactive transport model HYDROGEOCHEM, showed that while supercomputers provide the fastest execution times for such problems, relatively low-cost reduced instruction set computer (RISC) based scalar computers provide the best performance-to-price ratio. Because supercomputers like the Cray X-MP are inherently multiuser resources, often the RISC computers also provide much better turnaround times. Furthermore, RISC-based workstations provide the best platforms for "visualization" of groundwater flow and contaminant plumes. The most notable result, however, is that current workstations costing less than $10,000 provide performance within a factor of 5 of a Cray X-MP.
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Restriction on supercomputers. 225.7012 Section 225.7012 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS... supercomputers. ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, althoughmore » the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.« less
INTEGRATION OF PANDA WORKLOAD MANAGEMENT SYSTEM WITH SUPERCOMPUTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Maeno, T
Abstract The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the funda- mental nature of matter and the basic forces that shape our universe, and were recently credited for the dis- covery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Datamore » Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data cen- ters are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Com- puting Facility (OLCF), Supercomputer at the National Research Center Kurchatov Institute , IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single- threaded workloads in parallel on Titan s multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accom- plishments in running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility s infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.« less
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-08-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-01-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089
Affordable and accurate large-scale hybrid-functional calculations on GPU-accelerated supercomputers
NASA Astrophysics Data System (ADS)
Ratcliff, Laura E.; Degomme, A.; Flores-Livas, José A.; Goedecker, Stefan; Genovese, Luigi
2018-03-01
Performing high accuracy hybrid functional calculations for condensed matter systems containing a large number of atoms is at present computationally very demanding or even out of reach if high quality basis sets are used. We present a highly optimized multiple graphics processing unit implementation of the exact exchange operator which allows one to perform fast hybrid functional density-functional theory (DFT) calculations with systematic basis sets without additional approximations for up to a thousand atoms. With this method hybrid DFT calculations of high quality become accessible on state-of-the-art supercomputers within a time-to-solution that is of the same order of magnitude as traditional semilocal-GGA functionals. The method is implemented in a portable open-source library.
Automatic discovery of the communication network topology for building a supercomputer model
NASA Astrophysics Data System (ADS)
Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim
2016-10-01
The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.
Parallel-Vector Algorithm For Rapid Structural Anlysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Toward a Proof of Concept Cloud Framework for Physics Applications on Blue Gene Supercomputers
NASA Astrophysics Data System (ADS)
Dreher, Patrick; Scullin, William; Vouk, Mladen
2015-09-01
Traditional high performance supercomputers are capable of delivering large sustained state-of-the-art computational resources to physics applications over extended periods of time using batch processing mode operating environments. However, today there is an increasing demand for more complex workflows that involve large fluctuations in the levels of HPC physics computational requirements during the simulations. Some of the workflow components may also require a richer set of operating system features and schedulers than normally found in a batch oriented HPC environment. This paper reports on progress toward a proof of concept design that implements a cloud framework onto BG/P and BG/Q platforms at the Argonne Leadership Computing Facility. The BG/P implementation utilizes the Kittyhawk utility and the BG/Q platform uses an experimental heterogeneous FusedOS operating system environment. Both platforms use the Virtual Computing Laboratory as the cloud computing system embedded within the supercomputer. This proof of concept design allows a cloud to be configured so that it can capitalize on the specialized infrastructure capabilities of a supercomputer and the flexible cloud configurations without resorting to virtualization. Initial testing of the proof of concept system is done using the lattice QCD MILC code. These types of user reconfigurable environments have the potential to deliver experimental schedulers and operating systems within a working HPC environment for physics computations that may be different from the native OS and schedulers on production HPC supercomputers.
Automotive applications of superconductors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginsberg, M.
1987-01-01
These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
NASA Astrophysics Data System (ADS)
Fukazawa, K.; Walker, R. J.; Kimura, T.; Tsuchiya, F.; Murakami, G.; Kita, H.; Tao, C.; Murata, K. T.
2016-12-01
Planetary magnetospheres are very large, while phenomena within them occur on meso- and micro-scales. These scales range from 10s of planetary radii to kilometers. To understand dynamics in these multi-scale systems, numerical simulations have been performed by using the supercomputer systems. We have studied the magnetospheres of Earth, Jupiter and Saturn by using 3-dimensional magnetohydrodynamic (MHD) simulations for a long time, however, we have not obtained the phenomena near the limits of the MHD approximation. In particular, we have not studied meso-scale phenomena that can be addressed by using MHD.Recently we performed our MHD simulation of Earth's magnetosphere by using the K-computer which is the first 10PFlops supercomputer and obtained multi-scale flow vorticity for the both northward and southward IMF. Furthermore, we have access to supercomputer systems which have Xeon, SPARC64, and vector-type CPUs and can compare simulation results between the different systems. Finally, we have compared the results of our parameter survey of the magnetosphere with observations from the HISAKI spacecraft.We have encountered a number of difficulties effectively using the latest supercomputer systems. First the size of simulation output increases greatly. Now a simulation group produces over 1PB of output. Storage and analysis of this much data is difficult. The traditional way to analyze simulation results is to move the results to the investigator's home computer. This takes over three months using an end-to-end 10Gbps network. In reality, there are problems at some nodes such as firewalls that can increase the transfer time to over one year. Another issue is post-processing. It is hard to treat a few TB of simulation output due to the memory limitations of a post-processing computer. To overcome these issues, we have developed and introduced the parallel network storage, the highly efficient network protocol and the CUI based visualization tools.In this study, we will show the latest simulation results using the petascale supercomputer and problems from the use of these supercomputer systems.
NASA Astrophysics Data System (ADS)
Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.
2016-10-01
The.LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility. Current approach utilizes modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on LCFs multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for ALICE and ATLAS experiments and it is in full pro duction for the ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
Essential issues in multiprocessor systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.D.; Peir, J.K.
1985-06-01
During the past several years, a great number of proposals have been made with the objective to increase supercomputer performance by an order of magnitude on the basis of a utilization of new computer architectures. The present paper is concerned with a suitable classification scheme for comparing these architectures. It is pointed out that there are basically four schools of thought as to the most important factor for an enhancement of computer performance. According to one school, the development of faster circuits will make it possible to retain present architectures, except, possibly, for a mechanism providing synchronization of parallel processes.more » A second school assigns priority to the optimization and vectorization of compilers, which will detect parallelism and help users to write better parallel programs. A third school believes in the predominant importance of new parallel algorithms, while the fourth school supports new models of computation. The merits of the four approaches are critically evaluated. 50 references.« less
Particle simulation on heterogeneous distributed supercomputers
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Dagum, Leonardo
1993-01-01
We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
Improved Access to Supercomputers Boosts Chemical Applications.
ERIC Educational Resources Information Center
Borman, Stu
1989-01-01
Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling are reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)
Sign: large-scale gene network estimation environment for high performance computing.
Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru
2011-01-01
Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .
Optical clock distribution in supercomputers using polyimide-based waveguides
NASA Astrophysics Data System (ADS)
Bihari, Bipin; Gan, Jianhua; Wu, Linghui; Liu, Yujie; Tang, Suning; Chen, Ray T.
1999-04-01
Guided-wave optics is a promising way to deliver high-speed clock-signal in supercomputer with minimized clock-skew. Si- CMOS compatible polymer-based waveguides for optoelectronic interconnects and packaging have been fabricated and characterized. A 1-to-48 fanout optoelectronic interconnection layer (OIL) structure based on Ultradel 9120/9020 for the high-speed massive clock signal distribution for a Cray T-90 supercomputer board has been constructed. The OIL employs multimode polymeric channel waveguides in conjunction with surface-normal waveguide output coupler and 1-to-2 splitters. Surface-normal couplers can couple the optical clock signals into and out from the H-tree polyimide waveguides surface-normally, which facilitates the integration of photodetectors to convert optical-signal to electrical-signal. A 45-degree surface- normal couplers has been integrated at each output end. The measured output coupling efficiency is nearly 100 percent. The output profile from 45-degree surface-normal coupler were calculated using Fresnel approximation. the theoretical result is in good agreement with experimental result. A total insertion loss of 7.98 dB at 850 nm was measured experimentally.
A History of High-Performance Computing
NASA Technical Reports Server (NTRS)
2006-01-01
Faster than most speedy computers. More powerful than its NASA data-processing predecessors. Able to leap large, mission-related computational problems in a single bound. Clearly, it s neither a bird nor a plane, nor does it need to don a red cape, because it s super in its own way. It's Columbia, NASA s newest supercomputer and one of the world s most powerful production/processing units. Named Columbia to honor the STS-107 Space Shuttle Columbia crewmembers, the new supercomputer is making it possible for NASA to achieve breakthroughs in science and engineering, fulfilling the Agency s missions, and, ultimately, the Vision for Space Exploration. Shortly after being built in 2004, Columbia achieved a benchmark rating of 51.9 teraflop/s on 10,240 processors, making it the world s fastest operational computer at the time of completion. Putting this speed into perspective, 20 years ago, the most powerful computer at NASA s Ames Research Center, home of the NASA Advanced Supercomputing Division (NAS), ran at a speed of about 1 gigaflop (one billion calculations per second). The Columbia supercomputer is 50,000 times faster than this computer and offers a tenfold increase in capacity over the prior system housed at Ames. What s more, Columbia is considered the world s largest Linux-based, shared-memory system. The system is offering immeasurable benefits to society and is the zenith of years of NASA/private industry collaboration that has spawned new generations of commercial, high-speed computing systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murphy, Richard C.
2009-09-01
This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential ofmore » PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follow: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.« less
Towards Efficient Supercomputing: Searching for the Right Efficiency Metric
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hsu, Chung-Hsing; Kuehn, Jeffery A; Poole, Stephen W
2012-01-01
The efficiency of supercomputing has traditionally been in the execution time. In early 2000 s, the concept of total cost of ownership was re-introduced, with the introduction of efficiency measure to include aspects such as energy and space. Yet the supercomputing community has never agreed upon a metric that can cover these aspects altogether and also provide a fair basis for comparison. This paper exam- ines the metrics that have been proposed in the past decade, and proposes a vector-valued metric for efficient supercom- puting. Using this metric, the paper presents a study of where the supercomputing industry has beenmore » and how it stands today with respect to efficient supercomputing.« less
OpenMP Performance on the Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Haoqiang, Jin; Hood, Robert
2005-01-01
This presentation discusses Columbia World Class Supercomputer which is one of the world's fastest supercomputers providing 61 TFLOPs (10/20/04). Conceived, designed, built, and deployed in just 120 days. A 20-node supercomputer built on proven 512-processor nodes. The largest SGI system in the world with over 10,000 Intel Itanium 2 processors and provides the largest node size incorporating commodity parts (512) and the largest shared-memory environment (2048) with 88% efficiency tops the scalar systems on the Top500 list.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Timothy J.
2016-03-01
While benchmarking software is useful for testing the performance limits and stability of Argonne National Laboratory’s new Theta supercomputer, there is no substitute for running real applications to explore the system’s potential. The Argonne Leadership Computing Facility’s Theta Early Science Program, modeled after its highly successful code migration program for the Mira supercomputer, has one primary aim: to deliver science on day one. Here is a closer look at the type of science problems that will be getting early access to Theta, a next-generation machine being rolled out this year.
Merging the Machines of Modern Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Laura; Collins, Jim
Two recent projects have harnessed supercomputing resources at the US Department of Energy’s Argonne National Laboratory in a novel way to support major fusion science and particle collider experiments. Using leadership computing resources, one team ran fine-grid analysis of real-time data to make near-real-time adjustments to an ongoing experiment, while a second team is working to integrate Argonne’s supercomputers into the Large Hadron Collider/ATLAS workflow. Together these efforts represent a new paradigm of the high-performance computing center as a partner in experimental science.
High performance computing applications in neurobiological research
NASA Technical Reports Server (NTRS)
Ross, Muriel D.; Cheng, Rei; Doshay, David G.; Linton, Samuel W.; Montgomery, Kevin; Parnas, Bruce R.
1994-01-01
The human nervous system is a massively parallel processor of information. The vast numbers of neurons, synapses and circuits is daunting to those seeking to understand the neural basis of consciousness and intellect. Pervading obstacles are lack of knowledge of the detailed, three-dimensional (3-D) organization of even a simple neural system and the paucity of large scale, biologically relevant computer simulations. We use high performance graphics workstations and supercomputers to study the 3-D organization of gravity sensors as a prototype architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scale-up, three-dimensional versions run on the Cray Y-MP and CM5 supercomputers.
Multi-petascale highly efficient parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time andmore » supports DMA functionality allowing for parallel processing message-passing.« less
Close to real life. [solving for transonic flow about lifting airfoils using supercomputers
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Bailey, F. Ron
1988-01-01
NASA's Numerical Aerodynamic Simulation (NAS) facility for CFD modeling of highly complex aerodynamic flows employs as its basic hardware two Cray-2s, an ETA-10 Model Q, an Amdahl 5880 mainframe computer that furnishes both support processing and access to 300 Gbytes of disk storage, several minicomputers and superminicomputers, and a Thinking Machines 16,000-device 'connection machine' processor. NAS, which was the first supercomputer facility to standardize operating-system and communication software on all processors, has done important Space Shuttle aerodynamics simulations and will be critical to the configurational refinement of the National Aerospace Plane and its intergrated powerplant, which will involve complex, high temperature reactive gasdynamic computations.
Role of High-End Computing in Meeting NASA's Science and Engineering Challenges
NASA Technical Reports Server (NTRS)
Biswas, Rupak
2006-01-01
High-End Computing (HEC) has always played a major role in meeting the modeling and simulation needs of various NASA missions. With NASA's newest 62 teraflops Columbia supercomputer, HEC is having an even greater impact within the Agency and beyond. Significant cutting-edge science and engineering simulations in the areas of space exploration, Shuttle operations, Earth sciences, and aeronautics research, are already occurring on Columbia, demonstrating its ability to accelerate NASA s exploration vision. The talk will describe how the integrated supercomputing production environment is being used to reduce design cycle time, accelerate scientific discovery, conduct parametric analysis of multiple scenarios, and enhance safety during the life cycle of NASA missions.
Supercomputer networking for space science applications
NASA Technical Reports Server (NTRS)
Edelson, B. I.
1992-01-01
The initial design of a supercomputer network topology including the design of the communications nodes along with the communications interface hardware and software is covered. Several space science applications that are proposed experiments by GSFC and JPL for a supercomputer network using the NASA ACTS satellite are also reported.
High Performance Computing at NASA
NASA Technical Reports Server (NTRS)
Bailey, David H.; Cooper, D. M. (Technical Monitor)
1994-01-01
The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (National Aerospace Standards) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
Most Social Scientists Shun Free Use of Supercomputers.
ERIC Educational Resources Information Center
Kiernan, Vincent
1998-01-01
Social scientists, who frequently complain that the federal government spends too little on them, are passing up what scholars in the physical and natural sciences see as the government's best give-aways: free access to supercomputers. Some social scientists say the supercomputers are difficult to use; others find desktop computers provide…
A fault tolerant spacecraft supercomputer to enable a new class of scientific discovery
NASA Technical Reports Server (NTRS)
Katz, D. S.; McVittie, T. I.; Silliman, A. G., Jr.
2000-01-01
The goal of the Remote Exploration and Experimentation (REE) Project is to move supercomputeing into space in a coste effective manner and to allow the use of inexpensive, state of the art, commercial-off-the-shelf components and subsystems in these space-based supercomputers.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Pope, Adrian; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukić, Zarija; Sehrish, Saba; Liao, Wei-keng
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Pope, Adrian; Finkel, Hal
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the ‘Dark Universe’, dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers thatmore » enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC’s design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.« less
A Look at the Impact of High-End Computing Technologies on NASA Missions
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Dunbar, Jill; Hardman, John; Bailey, F. Ron; Wheeler, Lorien; Rogers, Stuart
2012-01-01
From its bold start nearly 30 years ago and continuing today, the NASA Advanced Supercomputing (NAS) facility at Ames Research Center has enabled remarkable breakthroughs in the space agency s science and engineering missions. Throughout this time, NAS experts have influenced the state-of-the-art in high-performance computing (HPC) and related technologies such as scientific visualization, system benchmarking, batch scheduling, and grid environments. We highlight the pioneering achievements and innovations originating from and made possible by NAS resources and know-how, from early supercomputing environment design and software development, to long-term simulation and analyses critical to design safe Space Shuttle operations and associated spinoff technologies, to the highly successful Kepler Mission s discovery of new planets now capturing the world s imagination.
An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.
Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei
2017-12-01
Big data, cloud computing, and high-performance computing (HPC) are at the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion-a big data interface on the Tianhe-2 supercomputer-to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.
Distributed user services for supercomputers
NASA Technical Reports Server (NTRS)
Sowizral, Henry A.
1989-01-01
User-service operations at supercomputer facilities are examined. The question is whether a single, possibly distributed, user-services organization could be shared by NASA's supercomputer sites in support of a diverse, geographically dispersed, user community. A possible structure for such an organization is identified as well as some of the technologies needed in operating such an organization.
Aviation Research and the Internet
NASA Technical Reports Server (NTRS)
Scott, Antoinette M.
1995-01-01
The Internet is a network of networks. It was originally funded by the Defense Advanced Research Projects Agency or DOD/DARPA and evolved in part from the connection of supercomputer sites across the United States. The National Science Foundation (NSF) made the most of their supercomputers by connecting the sites to each other. This made the supercomputers more efficient and now allows scientists, engineers and researchers to access the supercomputers from their own labs and offices. The high speed networks that connect the NSF supercomputers form the backbone of the Internet. The World Wide Web (WWW) is a menu system. It gathers Internet resources from all over the world into a series of screens that appear on your computer. The WWW is also a distributed. The distributed system stores data information on many computers (servers). These servers can go out and get data when you ask for it. Hypermedia is the base of the WWW. One can 'click' on a section and visit other hypermedia (pages). Our approach to demonstrating the importance of aviation research through the Internet began with learning how to put pages on the Internet (on-line) ourselves. We were assigned two aviation companies; Vision Micro Systems Inc. and Innovative Aerodynamic Technologies (IAT). We developed home pages for these SBIR companies. The equipment used to create the pages were the UNIX and Macintosh machines. HTML Supertext software was used to write the pages and the Sharp JX600S scanner to scan the images. As a result, with the use of the UNIX, Macintosh, Sun, PC, and AXIL machines, we were able to present our home pages to over 800,000 visitors.
Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-01-01
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture{sup TM} in conjunction with x86 Opteron{sup TM} processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Non-preconditioned conjugate gradient on cell and FPCA-based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-03-10
This work presents a detailed implementation of a double precision, Non-Preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture{trademark} in conjunction with x86 Opteron{trademark} processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Towards future high performance computing: What will change? How can we be efficient?
NASA Astrophysics Data System (ADS)
Düben, Peter
2017-04-01
How can we make the most out of "exascale" supercomputers that will be available soon and enable us to calculate an amazing number of 1,000,000,000,000,000,000 real numbers operations within a single second? How do we need to design applications to use these machines efficiently? What are the limits? We will discuss opportunities and limits of the use of future high performance computers from the perspective of Earth System Modelling. We will provide an overview about future challenges and outline how numerical application will need to be changed to run efficiently on supercomputers in the future. We will also discuss how different disciplines can support each other and talk about data handling and numerical precision of data.
Input/output behavior of supercomputing applications
NASA Technical Reports Server (NTRS)
Miller, Ethan L.
1991-01-01
The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designer to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid stated disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.
Prospects for Boiling of Subcooled Dielectric Liquids for Supercomputer Cooling
NASA Astrophysics Data System (ADS)
Zeigarnik, Yu. A.; Vasil'ev, N. V.; Druzhinin, E. A.; Kalmykov, I. V.; Kosoi, A. S.; Khodakov, K. A.
2018-02-01
It is shown experimentally that using forced-convection boiling of dielectric coolants of the Novec 649 Refrigerant subcooled relative to the saturation temperature makes possible removing heat flow rates up to 100 W/cm2 from modern supercomputer chip interface. This fact creates prerequisites for the application of dielectric liquids in cooling systems of modern supercomputers with increased requirements for their operating reliability.
BigData and computing challenges in high energy and nuclear physics
NASA Astrophysics Data System (ADS)
Klimentov, A.; Grigorieva, M.; Kiryanov, A.; Zarochentsev, A.
2017-06-01
In this contribution we discuss the various aspects of the computing resource needs experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. This will evolve in the future when moving from LHC to HL-LHC in ten years from now, when the already exascale levels of data we are processing could increase by a further order of magnitude. The distributed computing environment has been a great success and the inclusion of new super-computing facilities, cloud computing and volunteering computing for the future is a big challenge, which we are successfully mastering with a considerable contribution from many super-computing centres around the world, academic and commercial cloud providers. We also discuss R&D computing projects started recently in National Research Center ``Kurchatov Institute''
Jungheim, L N; Boyd, D B; Indelicato, J M; Pasini, C E; Preston, D A; Alborn, W E
1991-05-01
Bicyclic tetrahydropyridazinones, such as 13, where X are strongly electron-withdrawing groups, were synthesized to investigate their antibacterial activity. These delta-lactams are homologues of bicyclic pyrazolidinones 15, which were the first non-beta-lactam containing compounds reported to bind to penicillin-binding proteins (PBPs). The delta-lactam compounds exhibit poor antibacterial activity despite having reactivity comparable to the gamma-lactams. Molecular modeling based on semiempirical molecular orbital calculations on a Cray X-MP supercomputer, predicted that the reason for the inactivity is steric bulk hindering high affinity of the compounds to PBPs, as well as high conformational flexibility of the tetrahydropyridazinone ring hampering effective alignment of the molecule in the active site. Subsequent PBP binding experiments confirmed that this class of compound does not bind to PBPs.
National Test Facility civilian agency use of supercomputers not feasible
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1994-12-01
Based on interviews with civilian agencies cited in the House report (DOE, DoEd, HHS, FEMA, NOAA), none would be able to make effective use of NTF`s excess supercomputing capabilities. These agencies stated they could not use the resources primarily because (1) NTF`s supercomputers are older machines whose performance and costs cannot match those of more advanced computers available from other sources and (2) some agencies have not yet developed applications requiring supercomputer capabilities or do not have funding to support such activities. In addition, future support for the hardware and software at NTF is uncertain, making any investment by anmore » outside user risky.« less
Kriging for Spatial-Temporal Data on the Bridges Supercomputer
NASA Astrophysics Data System (ADS)
Hodgess, E. M.
2017-12-01
Currently, kriging of spatial-temporal data is slow and limited to relatively small vector sizes. We have developed a method on the Bridges supercomputer, at the Pittsburgh supercomputer center, which uses a combination of the tools R, Fortran, the Message Passage Interface (MPI), OpenACC, and special R packages for big data. This combination of tools now permits us to complete tasks which could previously not be completed, or takes literally hours to complete. We ran simulation studies from a laptop against the supercomputer. We also look at "real world" data sets, such as the Irish wind data, and some weather data. We compare the timings. We note that the timings are suprising good.
Multiple DNA and protein sequence alignment on a workstation and a supercomputer.
Tajima, K
1988-11-01
This paper describes a multiple alignment method using a workstation and supercomputer. The method is based on the alignment of a set of aligned sequences with the new sequence, and uses a recursive procedure of such alignment. The alignment is executed in a reasonable computation time on diverse levels from a workstation to a supercomputer, from the viewpoint of alignment results and computational speed by parallel processing. The application of the algorithm is illustrated by several examples of multiple alignment of 12 amino acid and DNA sequences of HIV (human immunodeficiency virus) env genes. Colour graphic programs on a workstation and parallel processing on a supercomputer are discussed.
NASA Astrophysics Data System (ADS)
Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.
2015-12-01
The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run2). The need for simulation, data processing and analysis would overwhelm the expected capacity of grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge the integration of the opportunistic resources into LHC computing model is highly important. The Tier-1 facility at Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and it will process, simulate and store up to 10% of total data obtained from ALICE, ATLAS and LHCb experiments. In addition Kurchatov Institute has supercomputers with peak performance 0.12 PFLOPS. The delegation of even a fraction of supercomputing resources to the LHC Computing will notably increase total capacity. In 2014 the development a portal combining a Tier-1 and a supercomputer in Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences like biology with genome sequencing analysis; astrophysics with cosmic rays analysis, antimatter and dark matter search, etc.
Argonne wins four R&D 100 Awards | Argonne National Laboratory
. High-Energy Concentration-Gradient Cathode Material for Plug-in Hybrids and All-Electric Vehicles converting discovery science into innovative, high-impact products, processes and systems." Globus scientific facilities (such as supercomputing centers and high energy physics experiments), cloud storage
Impact of the Columbia Supercomputer on NASA Space and Exploration Mission
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Kwak, Dochan; Kiris, Cetin; Lawrence, Scott
2006-01-01
NASA's 10,240-processor Columbia supercomputer gained worldwide recognition in 2004 for increasing the space agency's computing capability ten-fold, and enabling U.S. scientists and engineers to perform significant, breakthrough simulations. Columbia has amply demonstrated its capability to accelerate NASA's key missions, including space operations, exploration systems, science, and aeronautics. Columbia is part of an integrated high-end computing (HEC) environment comprised of massive storage and archive systems, high-speed networking, high-fidelity modeling and simulation tools, application performance optimization, and advanced data analysis and visualization. In this paper, we illustrate the impact Columbia is having on NASA's numerous space and exploration applications, such as the development of the Crew Exploration and Launch Vehicles (CEV/CLV), effects of long-duration human presence in space, and damage assessment and repair recommendations for remaining shuttle flights. We conclude by discussing HEC challenges that must be overcome to solve space-related science problems in the future.
NASA Technical Reports Server (NTRS)
Kutler, Paul; Yee, Helen
1987-01-01
Topics addressed include: numerical aerodynamic simulation; computational mechanics; supercomputers; aerospace propulsion systems; computational modeling in ballistics; turbulence modeling; computational chemistry; computational fluid dynamics; and computational astrophysics.
NAS technical summaries: Numerical aerodynamic simulation program, March 1991 - February 1992
NASA Technical Reports Server (NTRS)
1992-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefiting other supercomputer centers in Government and industry. This report contains selected scientific results from the 1991-92 NAS Operational Year, March 4, 1991 to March 3, 1992, which is the fifth year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP. The Cray-2, the first generation supercomputer, has four processors, 256 megawords of central memory, and a total sustained speed of 250 million floating point operations per second. The Cray Y-MP, the second generation supercomputer, has eight processors and a total sustained speed of one billion floating point operations per second. Additional memory was installed this year, doubling capacity from 128 to 256 megawords of solid-state storage-device memory. Because of its higher performance, the Cray Y-MP delivered approximately 77 percent of the total number of supercomputer hours used during this year.
Seismic signal processing on heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas
2015-04-01
The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in the recent years provide numerous computing nodes interconnected via high throughput networks, every node containing a mix of processing elements of different architectures, like several sequential processor cores and one or a few graphical processing units (GPU) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC 30 family operated by the Swiss National Supercomputing Center (CSCS), which we used in this research. Heterogeneous supercomputers provide an opportunity for manifold application performance increase and are more energy-efficient, however they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is design of a prototype for such library suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that require dedicated HPC solutions. The chosen application is using a wide range of common signal processing methods, which include various IIR filter designs, amplitude and phase correlation, computing the analytic signal, and discrete Fourier transforms. Furthermore, various processing methods specific for seismology, like rotation of seismic traces, are used. Efficient implementation of all these methods on the GPU-accelerated systems represents several challenges. In particular, it requires a careful distribution of work between the sequential processors and accelerators. Furthermore, since the application is designed to process very large volumes of data, special attention had to be paid to the efficient use of the available memory and networking hardware resources in order to reduce intensity of data input and output. In our contribution we will explain the software architecture as well as principal engineering decisions used to address these challenges. We will also describe the programming model based on C++ and CUDA that we used to develop the software. Finally, we will demonstrate performance improvements achieved by using the heterogeneous computing architecture. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d26.
Challenges in scaling NLO generators to leadership computers
NASA Astrophysics Data System (ADS)
Benjamin, D.; Childers, JT; Hoeche, S.; LeCompte, T.; Uram, T.
2017-10-01
Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was successfully scaled to millions of threads to achieve this level of usage on Mira. Sherpa and MadGraph are next-to-leading order generators used heavily by LHC experiments for simulation. Integration times for high-multiplicity or rare processes can take a week or more on standard Grid machines, even using all 16-cores. We will describe our ongoing work to scale the Sherpa generator to thousands of threads on leadership-class machines and reduce run-times to less than a day. This work allows the experiments to leverage large-scale parallel supercomputers for event generation today, freeing tens of millions of grid hours for other work, and paving the way for future applications (simulation, reconstruction) on these and future supercomputers.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Flow visualization of CFD using graphics workstations
NASA Technical Reports Server (NTRS)
Lasinski, Thomas; Buning, Pieter; Choi, Diana; Rogers, Stuart; Bancroft, Gordon
1987-01-01
High performance graphics workstations are used to visualize the fluid flow dynamics obtained from supercomputer solutions of computational fluid dynamic programs. The visualizations can be done independently on the workstation or while the workstation is connected to the supercomputer in a distributed computing mode. In the distributed mode, the supercomputer interactively performs the computationally intensive graphics rendering tasks while the workstation performs the viewing tasks. A major advantage of the workstations is that the viewers can interactively change their viewing position while watching the dynamics of the flow fields. An overview of the computer hardware and software required to create these displays is presented. For complex scenes the workstation cannot create the displays fast enough for good motion analysis. For these cases, the animation sequences are recorded on video tape or 16 mm film a frame at a time and played back at the desired speed. The additional software and hardware required to create these video tapes or 16 mm movies are also described. Photographs illustrating current visualization techniques are discussed. Examples of the use of the workstations for flow visualization through animation are available on video tape.
Promoting High-Performance Computing and Communications. A CBO Study.
ERIC Educational Resources Information Center
Webre, Philip
In 1991 the Federal Government initiated the multiagency High Performance Computing and Communications program (HPCC) to further the development of U.S. supercomputer technology and high-speed computer network technology. This overview by the Congressional Budget Office (CBO) concentrates on obstacles that might prevent the growth of the…
High-Performance Computing User Facility | Computational Science | NREL
User Facility High-Performance Computing User Facility The High-Performance Computing User Facility technologies. Photo of the Peregrine supercomputer The High Performance Computing (HPC) User Facility provides Gyrfalcon Mass Storage System. Access Our HPC User Facility Learn more about these systems and how to access
Development of seismic tomography software for hybrid supercomputers
NASA Astrophysics Data System (ADS)
Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton
2015-04-01
Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
Supercomputing resources empowering superstack with interactive and integrated systems
NASA Astrophysics Data System (ADS)
Rückemann, Claus-Peter
2012-09-01
This paper presents the results from the development and implementation of Superstack algorithms to be dynamically used with integrated systems and supercomputing resources. Processing of geophysical data, thus named geoprocessing, is an essential part of the analysis of geoscientific data. The theory of Superstack algorithms and the practical application on modern computing architectures was inspired by developments introduced with processing of seismic data on mainframes and within the last years leading to high end scientific computing applications. There are several stacking algorithms known but with low signal to noise ratio in seismic data the use of iterative algorithms like the Superstack can support analysis and interpretation. The new Superstack algorithms are in use with wave theory and optical phenomena on highly performant computing resources for huge data sets as well as for sophisticated application scenarios in geosciences and archaeology.
Tools for 3D scientific visualization in computational aerodynamics at NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
Hardware, software, and techniques used by the Fluid Dynamics Division (NASA) for performing visualization of computational aerodynamics, which can be applied to the visualization of flow fields from computer simulations of fluid dynamics about the Space Shuttle, are discussed. Three visualization techniques applied, post-processing, tracking, and steering, are described, as well as the post-processing software packages used, PLOT3D, SURF (Surface Modeller), GAS (Graphical Animation System), and FAST (Flow Analysis software Toolkit). Using post-processing methods a flow simulation was executed on a supercomputer and, after the simulation was complete, the results were processed for viewing. It is shown that the high-resolution, high-performance three-dimensional workstation combined with specially developed display and animation software provides a good tool for analyzing flow field solutions obtained from supercomputers.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result in an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
Network issues for large mass storage requirements
NASA Technical Reports Server (NTRS)
Perdue, James
1992-01-01
File Servers and Supercomputing environments need high performance networks to balance the I/O requirements seen in today's demanding computing scenarios. UltraNet is one solution which permits both high aggregate transfer rates and high task-to-task transfer rates as demonstrated in actual tests. UltraNet provides this capability as both a Server-to-Server and Server-to-Client access network giving the supercomputing center the following advantages highest performance Transport Level connections (to 40 MBytes/sec effective rates); matches the throughput of the emerging high performance disk technologies, such as RAID, parallel head transfer devices and software striping; supports standard network and file system applications using SOCKET's based application program interface such as FTP, rcp, rdump, etc.; supports access to the Network File System (NFS) and LARGE aggregate bandwidth for large NFS usage; provides access to a distributed, hierarchical data server capability using DISCOS UniTree product; supports file server solutions available from multiple vendors, including Cray, Convex, Alliant, FPS, IBM, and others.
Color graphics, interactive processing, and the supercomputer
NASA Technical Reports Server (NTRS)
Smith-Taylor, Rudeen
1987-01-01
The development of a common graphics environment for the NASA Langley Research Center user community and the integration of a supercomputer into this environment is examined. The initial computer hardware, the software graphics packages, and their configurations are described. The addition of improved computer graphics capability to the supercomputer, and the utilization of the graphic software and hardware are discussed. Consideration is given to the interactive processing system which supports the computer in an interactive debugging, processing, and graphics environment.
Automated Help System For A Supercomputer
NASA Technical Reports Server (NTRS)
Callas, George P.; Schulbach, Catherine H.; Younkin, Michael
1994-01-01
Expert-system software developed to provide automated system of user-helping displays in supercomputer system at Ames Research Center Advanced Computer Facility. Users located at remote computer terminals connected to supercomputer and each other via gateway computers, local-area networks, telephone lines, and satellite links. Automated help system answers routine user inquiries about how to use services of computer system. Available 24 hours per day and reduces burden on human experts, freeing them to concentrate on helping users with complicated problems.
Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka
2014-11-15
The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
NASA Advanced Supercomputing (NAS) User Services Group
NASA Technical Reports Server (NTRS)
Pandori, John; Hamilton, Chris; Niggley, C. E.; Parks, John W. (Technical Monitor)
2002-01-01
This viewgraph presentation provides an overview of NAS (NASA Advanced Supercomputing), its goals, and its mainframe computer assets. Also covered are its functions, including systems monitoring and technical support.
Final Scientific Report: A Scalable Development Environment for Peta-Scale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karbach, Carsten; Frings, Wolfgang
2013-02-22
This document is the final scientific report of the project DE-SC000120 (A scalable Development Environment for Peta-Scale Computing). The objective of this project is the extension of the Parallel Tools Platform (PTP) for applying it to peta-scale systems. PTP is an integrated development environment for parallel applications. It comprises code analysis, performance tuning, parallel debugging and system monitoring. The contribution of the Juelich Supercomputing Centre (JSC) aims to provide a scalable solution for system monitoring of supercomputers. This includes the development of a new communication protocol for exchanging status data between the target remote system and the client running PTP.more » The communication has to work for high latency. PTP needs to be implemented robustly and should hide the complexity of the supercomputer's architecture in order to provide a transparent access to various remote systems via a uniform user interface. This simplifies the porting of applications to different systems, because PTP functions as abstraction layer between parallel application developer and compute resources. The common requirement for all PTP components is that they have to interact with the remote supercomputer. E.g. applications are built remotely and performance tools are attached to job submissions and their output data resides on the remote system. Status data has to be collected by evaluating outputs of the remote job scheduler and the parallel debugger needs to control an application executed on the supercomputer. The challenge is to provide this functionality for peta-scale systems in real-time. The client server architecture of the established monitoring application LLview, developed by the JSC, can be applied to PTP's system monitoring. LLview provides a well-arranged overview of the supercomputer's current status. A set of statistics, a list of running and queued jobs as well as a node display mapping running jobs to their compute resources form the user display of LLview. These monitoring features have to be integrated into the development environment. Besides showing the current status PTP's monitoring also needs to allow for submitting and canceling user jobs. Monitoring peta-scale systems especially deals with presenting the large amount of status data in a useful manner. Users require to select arbitrary levels of detail. The monitoring views have to provide a quick overview of the system state, but also need to allow for zooming into specific parts of the system, into which the user is interested in. At present, the major batch systems running on supercomputers are PBS, TORQUE, ALPS and LoadLeveler, which have to be supported by both the monitoring and the job controlling component. Finally, PTP needs to be designed as generic as possible, so that it can be extended for future batch systems.« less
LANL Studies Earth's Magnetosphere
Daughton, Bill
2018-02-13
A new 3-D supercomputer model presents a new theory of how magnetic reconnection works in high-temperature plasmas. This Los Alamos National Laboratory research supports an upcoming NASA mission to study Earth's magnetosphere in greater detail than ever.
NSF Commits to Supercomputers.
ERIC Educational Resources Information Center
Waldrop, M. Mitchell
1985-01-01
The National Science Foundation (NSF) has allocated at least $200 million over the next five years to support four new supercomputer centers. Issues and trends related to this NSF initiative are examined. (JN)
COOP 3D ARPA Experiment 109 National Center for Atmospheric Research
NASA Technical Reports Server (NTRS)
1998-01-01
Coupled atmospheric and hydrodynamic forecast models were executed on the supercomputing resources of the National Center for Atmospheric Research (NCAR) in Boulder, Colorado and the Ohio Supercomputing Center (OSC)in Columbus, Ohio. respectively. The interoperation of the forecast models on these geographically diverse, high performance Cray platforms required the transfer of large three dimensional data sets at very high information rates. High capacity, terrestrial fiber optic transmission system technologies were integrated with those of an experimental high speed communications satellite in Geosynchronous Earth Orbit (GEO) to test the integration of the two systems. Operation over a spacecraft in GEO orbit required modification of the standard configuration of legacy data communications protocols to facilitate their ability to perform efficiently in the changing environment characteristic of a hybrid network. The success of this performance tuning enabled the use of such an architecture to facilitate high data rate, fiber optic quality data communications between high performance systems not accessible to standard terrestrial fiber transmission systems. Thus obviating the performance degradation often found in contemporary earth/satellite hybrids.
Mira: Argonne's 10-petaflops supercomputer
Papka, Michael; Coghlan, Susan; Isaacs, Eric; Peters, Mark; Messina, Paul
2018-02-13
Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Adventures in Computational Grids
NASA Technical Reports Server (NTRS)
Walatka, Pamela P.; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Sometimes one supercomputer is not enough. Or your local supercomputers are busy, or not configured for your job. Or you don't have any supercomputers. You might be trying to simulate worldwide weather changes in real time, requiring more compute power than you could get from any one machine. Or you might be collecting microbiological samples on an island, and need to examine them with a special microscope located on the other side of the continent. These are the times when you need a computational grid.
Mira: Argonne's 10-petaflops supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Papka, Michael; Coghlan, Susan; Isaacs, Eric
2013-07-03
Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Breakthrough: NETL's Simulation-Based Engineering User Center (SBEUC)
Guenther, Chris
2018-05-23
The National Energy Technology Laboratory relies on supercomputers to develop many novel ideas that become tomorrow's energy solutions. Supercomputers provide a cost-effective, efficient platform for research and usher technologies into widespread use faster to bring benefits to the nation. In 2013, Secretary of Energy Dr. Ernest Moniz dedicated NETL's new supercomputer, the Simulation Based Engineering User Center, or SBEUC. The SBEUC is dedicated to fossil energy research and is a collaborative tool for all of NETL and our regional university partners.
Floating point arithmetic in future supercomputers
NASA Technical Reports Server (NTRS)
Bailey, David H.; Barton, John T.; Simon, Horst D.; Fouts, Martin J.
1989-01-01
Considerations in the floating-point design of a supercomputer are discussed. Particular attention is given to word size, hardware support for extended precision, format, and accuracy characteristics. These issues are discussed from the perspective of the Numerical Aerodynamic Simulation Systems Division at NASA Ames. The features believed to be most important for a future supercomputer floating-point design include: (1) a 64-bit IEEE floating-point format with 11 exponent bits, 52 mantissa bits, and one sign bit and (2) hardware support for reasonably fast double-precision arithmetic.
Breakthrough: NETL's Simulation-Based Engineering User Center (SBEUC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guenther, Chris
The National Energy Technology Laboratory relies on supercomputers to develop many novel ideas that become tomorrow's energy solutions. Supercomputers provide a cost-effective, efficient platform for research and usher technologies into widespread use faster to bring benefits to the nation. In 2013, Secretary of Energy Dr. Ernest Moniz dedicated NETL's new supercomputer, the Simulation Based Engineering User Center, or SBEUC. The SBEUC is dedicated to fossil energy research and is a collaborative tool for all of NETL and our regional university partners.
Computation Directorate 2008 Annual Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crawford, D L
2009-03-25
Whether a computer is simulating the aging and performance of a nuclear weapon, the folding of a protein, or the probability of rainfall over a particular mountain range, the necessary calculations can be enormous. Our computers help researchers answer these and other complex problems, and each new generation of system hardware and software widens the realm of possibilities. Building on Livermore's historical excellence and leadership in high-performance computing, Computation added more than 331 trillion floating-point operations per second (teraFLOPS) of power to LLNL's computer room floors in 2008. In addition, Livermore's next big supercomputer, Sequoia, advanced ever closer to itsmore » 2011-2012 delivery date, as architecture plans and the procurement contract were finalized. Hyperion, an advanced technology cluster test bed that teams Livermore with 10 industry leaders, made a big splash when it was announced during Michael Dell's keynote speech at the 2008 Supercomputing Conference. The Wall Street Journal touted Hyperion as a 'bright spot amid turmoil' in the computer industry. Computation continues to measure and improve the costs of operating LLNL's high-performance computing systems by moving hardware support in-house, by measuring causes of outages to apply resources asymmetrically, and by automating most of the account and access authorization and management processes. These improvements enable more dollars to go toward fielding the best supercomputers for science, while operating them at less cost and greater responsiveness to the customers.« less
Understanding Lustre Internals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Feiyi; Oral, H Sarp; Shipman, Galen M
2009-04-01
Lustre was initiated and funded, almost a decade ago, by the U.S. Department of Energy (DoE) Office of Science and National Nuclear Security Administration laboratories to address the need for an open source, highly-scalable, high-performance parallel filesystem on by then present and future supercomputing platforms. Throughout the last decade, it was deployed over numerous medium-to-large-scale supercomputing platforms and clusters, and it performed and met the expectations of the Lustre user community. As it stands at the time of writing this document, according to the Top500 list, 15 of the top 30 supercomputers in the world use Lustre filesystem. This reportmore » aims to present a streamlined overview on how Lustre works internally at reasonable details including relevant data structures, APIs, protocols and algorithms involved for Lustre version 1.6 source code base. More importantly, it tries to explain how various components interconnect with each other and function as a system. Portions of this report are based on discussions with Oak Ridge National Laboratory Lustre Center of Excellence team members and portions of it are based on our own understanding of how the code works. We, as the authors team bare all responsibilities for all errors and omissions in this document. We can only hope it helps current and future Lustre users and Lustre code developers as much as it helped us understanding the Lustre source code and its internal workings.« less
Tracing Scientific Facilities through the Research Literature Using Persistent Identifiers
NASA Astrophysics Data System (ADS)
Mayernik, M. S.; Maull, K. E.
2016-12-01
Tracing persistent identifiers to their source publications is an easy task when authors use them, since it is a simple matter of matching the persistent identifier to the specific text string of the identifier. However, trying to understand if a publication uses the resource behind an identifier when such identifier is not referenced explicitly is a harder task. In this research, we explore the effectiveness of alternative strategies of associating publications with uses of the resource referenced by an identifier when it may not be explicit. This project is explored within the context of the NCAR supercomputer, where we are broadly interesting in the science that can be traced to the usage of the NCAR supercomputing facility, by way of the peer-reviewed research publications that utilize and reference it. In this project we explore several ways of drawing linkages between publications and the NCAR supercomputing resources. Identifying and compiling peer-reviewed publications related to NCAR supercomputer usage are explored via three sources: 1) User-supplied publications gathered through a community survey, 2) publications that were identified via manual searching of the Google scholar search index, and 3) publications associated with National Science Foundation (NSF) grants extracted from a public NSF database. These three sources represent three styles of collecting information about publications that likely imply usage of the NCAR supercomputing facilities. Each source has strengths and weaknesses, thus our discussion will explore how our publication identification and analysis methods vary in terms of accuracy, reliability, and effort. We will also discuss strategies for enabling more efficient tracing of research impacts of supercomputing facilities going forward through the assignment of a persistent web identifier to the NCAR supercomputer. While this solution has potential to greatly enhance our ability to trace the use of the facility through publications, authors must cite the facility consistently. It is therefore necessary to provide recommendations for citation and attribution behavior, and we will conclude our discussion with how such recommendations have improved tracing the supercomputer facility allowing for more consistent and widespread measurement of its impact.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meneses, Esteban; Ni, Xiang; Jones, Terry R
The unprecedented computational power of cur- rent supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that glooms the future of supercomputing. A system that comprises many components has a high chance to fail, and fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of faultmore » tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan s log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan.« less
High Performance Computing and Networking for Science--Background Paper.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Office of Technology Assessment.
The Office of Technology Assessment is conducting an assessment of the effects of new information technologies--including high performance computing, data networking, and mass data archiving--on research and development. This paper offers a view of the issues and their implications for current discussions about Federal supercomputer initiatives…
Energy Efficient Supercomputing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anypas, Katie
2014-10-17
Katie Anypas, Head of NERSC's Services Department discusses the Lab's research into developing increasingly powerful and energy efficient supercomputers at our '8 Big Ideas' Science at the Theater event on October 8th, 2014, in Oakland, California.
Energy Efficient Supercomputing
Anypas, Katie
2018-05-07
Katie Anypas, Head of NERSC's Services Department discusses the Lab's research into developing increasingly powerful and energy efficient supercomputers at our '8 Big Ideas' Science at the Theater event on October 8th, 2014, in Oakland, California.
Diskless supercomputers: Scalable, reliable I/O for the Tera-Op technology base
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Ousterhout, John K.; Patterson, David A.
1993-01-01
Computing is seeing an unprecedented improvement in performance; over the last five years there has been an order-of-magnitude improvement in the speeds of workstation CPU's. At least another order of magnitude seems likely in the next five years, to machines with 500 MIPS or more. The goal of the ARPA Teraop program is to realize even larger, more powerful machines, executing as many as a trillion operations per second. Unfortunately, we have seen no comparable breakthroughs in I/O performance; the speeds of I/O devices and the hardware and software architectures for managing them have not changed substantially in many years. We have completed a program of research to demonstrate hardware and software I/O architectures capable of supporting the kinds of internetworked 'visualization' workstations and supercomputers that will appear in the mid 1990s. The project had three overall goals: high performance, high reliability, and scalable, multipurpose system.
High Resolution Aerospace Applications using the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha
2005-01-01
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver, and a fully-automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages. are industrial-level codes designed for complex geometry and incorpor.ats. CuStomized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAIink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.
Cyberinfrastructure for high energy physics in Korea
NASA Astrophysics Data System (ADS)
Cho, Kihyeon; Kim, Hyunwoo; Jeung, Minho; High Energy Physics Team
2010-04-01
We introduce the hierarchy of cyberinfrastructure which consists of infrastructure (supercomputing and networks), Grid, e-Science, community and physics from bottom layer to top layer. KISTI is the national headquarter of supercomputer, network, Grid and e-Science in Korea. Therefore, KISTI is the best place to for high energy physicists to use cyberinfrastructure. We explain this concept on the CDF and the ALICE experiments. In the meantime, the goal of e-Science is to study high energy physics anytime and anywhere even if we are not on-site of accelerator laboratories. The components are data production, data processing and data analysis. The data production is to take both on-line and off-line shifts remotely. The data processing is to run jobs anytime, anywhere using Grid farms. The data analysis is to work together to publish papers using collaborative environment such as EVO (Enabling Virtual Organization) system. We also present the global community activities of FKPPL (France-Korea Particle Physics Laboratory) and physics as top layer.
NASA Astrophysics Data System (ADS)
Vincenti, Henri; Vay, Jean-Luc
2018-07-01
The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly-scalable local low-order solvers at the expense of harder-to-scale global very high-order spectral methods. Indeed, FFT-based methods, which were very popular on shared memory computers, have been largely replaced by finite-difference (FD) methods for the solution of many problems, including plasmas simulations with electromagnetic Particle-In-Cell methods. For some problems, such as the modeling of so-called "plasma mirrors" for the generation of high-energy particles and ultra-short radiations, we have shown that the inaccuracies of standard FD-based PIC methods prevent the modeling on present supercomputers at sufficient accuracy. We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus for the first time the accurate modeling of plasma mirrors in 3D.
Design applications for supercomputers
NASA Technical Reports Server (NTRS)
Studerus, C. J.
1987-01-01
The complexity of codes for solutions of real aerodynamic problems has progressed from simple two-dimensional models to three-dimensional inviscid and viscous models. As the algorithms used in the codes increased in accuracy, speed and robustness, the codes were steadily incorporated into standard design processes. The highly sophisticated codes, which provide solutions to the truly complex flows, require computers with large memory and high computational speed. The advent of high-speed supercomputers, such that the solutions of these complex flows become more practical, permits the introduction of the codes into the design system at an earlier stage. The results of several codes which either were already introduced into the design process or are rapidly in the process of becoming so, are presented. The codes fall into the area of turbomachinery aerodynamics and hypersonic propulsion. In the former category, results are presented for three-dimensional inviscid and viscous flows through nozzle and unducted fan bladerows. In the latter category, results are presented for two-dimensional inviscid and viscous flows for hypersonic vehicle forebodies and engine inlets.
Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, E. Wes; Leinweber, David; Ruebel, Oliver
2011-09-16
This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently the key mechanism for preventing catastrophic market action are “circuit breakers.” We believe a more graduated approach, similar to the “yellow light” approach in motorsports tomore » slow down traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a version of volume Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event on May 6 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions.« less
NASA Technical Reports Server (NTRS)
Cohen, Jarrett
1999-01-01
Parallel computers built out of mass-market parts are cost-effectively performing data processing and simulation tasks. The Supercomputing (now known as "SC") series of conferences celebrated its 10th anniversary last November. While vendors have come and gone, the dominant paradigm for tackling big problems still is a shared-resource, commercial supercomputer. Growing numbers of users needing a cheaper or dedicated-access alternative are building their own supercomputers out of mass-market parts. Such machines are generally called Beowulf-class systems after the 11th century epic. This modern-day Beowulf story began in 1994 at NASA's Goddard Space Flight Center. A laboratory for the Earth and space sciences, computing managers there threw down a gauntlet to develop a $50,000 gigaFLOPS workstation for processing satellite data sets. Soon, Thomas Sterling and Don Becker were working on the Beowulf concept at the University Space Research Association (USRA)-run Center of Excellence in Space Data and Information Sciences (CESDIS). Beowulf clusters mix three primary ingredients: commodity personal computers or workstations, low-cost Ethernet networks, and the open-source Linux operating system. One of the larger Beowulfs is Goddard's Highly-parallel Integrated Virtual Environment, or HIVE for short.
1993 Gordon Bell Prize Winners
NASA Technical Reports Server (NTRS)
Karp, Alan H.; Simon, Horst; Heller, Don; Cooper, D. M. (Technical Monitor)
1994-01-01
The Gordon Bell Prize recognizes significant achievements in the application of supercomputers to scientific and engineering problems. In 1993, finalists were named for work in three categories: (1) Performance, which recognizes those who solved a real problem in the quickest elapsed time. (2) Price/performance, which encourages the development of cost-effective supercomputing. (3) Compiler-generated speedup, which measures how well compiler writers are facilitating the programming of parallel processors. The winners were announced November 17 at the Supercomputing 93 conference in Portland, Oregon. Gordon Bell, an independent consultant in Los Altos, California, is sponsoring $2,000 in prizes each year for 10 years to promote practical parallel processing research. This is the sixth year of the prize, which Computer administers. Something unprecedented in Gordon Bell Prize competition occurred this year: A computer manufacturer was singled out for recognition. Nine entries reporting results obtained on the Cray C90 were received, seven of the submissions orchestrated by Cray Research. Although none of these entries showed sufficiently high performance to win outright, the judges were impressed by the breadth of applications that ran well on this machine, all nine running at more than a third of the peak performance of the machine.
Improving Memory Error Handling Using Linux
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlton, Michael Andrew; Blanchard, Sean P.; Debardeleben, Nathan A.
As supercomputers continue to get faster and more powerful in the future, they will also have more nodes. If nothing is done, then the amount of memory in supercomputer clusters will soon grow large enough that memory failures will be unmanageable to deal with by manually replacing memory DIMMs. "Improving Memory Error Handling Using Linux" is a process oriented method to solve this problem by using the Linux kernel to disable (offline) faulty memory pages containing bad addresses, preventing them from being used again by a process. The process of offlining memory pages simplifies error handling and results in reducingmore » both hardware and manpower costs required to run Los Alamos National Laboratory (LANL) clusters. This process will be necessary for the future of supercomputing to allow the development of exascale computers. It will not be feasible without memory error handling to manually replace the number of DIMMs that will fail daily on a machine consisting of 32-128 petabytes of memory. Testing reveals the process of offlining memory pages works and is relatively simple to use. As more and more testing is conducted, the entire process will be automated within the high-performance computing (HPC) monitoring software, Zenoss, at LANL.« less
NASA Astrophysics Data System (ADS)
Schaaf, Kjeld; Overeem, Ruud
2004-06-01
Moore’s law is best exploited by using consumer market hardware. In particular, the gaming industry pushes the limit of processor performance thus reducing the cost per raw flop even faster than Moore’s law predicts. Next to the cost benefits of Common-Of-The-Shelf (COTS) processing resources, there is a rapidly growing experience pool in cluster based processing. The typical Beowulf cluster of PC’s supercomputers are well known. Multiple examples exists of specialised cluster computers based on more advanced server nodes or even gaming stations. All these cluster machines build upon the same knowledge about cluster software management, scheduling, middleware libraries and mathematical libraries. In this study, we have integrated COTS processing resources and cluster nodes into a very high performance processing platform suitable for streaming data applications, in particular to implement a correlator. The required processing power for the correlator in modern radio telescopes is in the range of the larger supercomputers, which motivates the usage of supercomputer technology. Raw processing power is provided by graphical processors and is combined with an Infiniband host bus adapter with integrated data stream handling logic. With this processing platform a scalable correlator can be built with continuously growing processing power at consumer market prices.
KNBD: A Remote Kernel Block Server for Linux
NASA Technical Reports Server (NTRS)
Becker, Jeff
1999-01-01
I am developing a prototype of a Linux remote disk block server whose purpose is to serve as a lower level component of a parallel file system. Parallel file systems are an important component of high performance supercomputers and clusters. Although supercomputer vendors such as SGI and IBM have their own custom solutions, there has been a void and hence a demand for such a system on Beowulf-type PC Clusters. Recently, the Parallel Virtual File System (PVFS) project at Clemson University has begun to address this need (1). Although their system provides much of the functionality of (and indeed was inspired by) the equivalent file systems in the commercial supercomputer market, their system is all in user-space. Migrating their 10 services to the kernel could provide a performance boost, by obviating the need for expensive system calls. Thanks to Pavel Machek, the Linux kernel has provided the network block device (2) with kernels 2.1.101 and later. You can configure this block device to redirect reads and writes to a remote machine's disk. This can be used as a building block for constructing a striped file system across several nodes.
The Q continuum simulation: Harnessing the power of GPU accelerated supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heitmann, Katrin; Frontiere, Nicholas; Sewell, Chris
2015-08-01
Modeling large-scale sky survey observations is a key driver for the continuing development of high-resolution, large-volume, cosmological simulations. We report the first results from the "Q Continuum" cosmological N-body simulation run carried out on the GPU-accelerated supercomputer Titan. The simulation encompasses a volume of (1300 Mpc)(3) and evolves more than half a trillion particles, leading to a particle mass resolution of m(p) similar or equal to 1.5 . 10(8) M-circle dot. At thismass resolution, the Q Continuum run is currently the largest cosmology simulation available. It enables the construction of detailed synthetic sky catalogs, encompassing different modeling methodologies, including semi-analyticmore » modeling and sub-halo abundance matching in a large, cosmological volume. Here we describe the simulation and outputs in detail and present first results for a range of cosmological statistics, such as mass power spectra, halo mass functions, and halo mass-concentration relations for different epochs. We also provide details on challenges connected to running a simulation on almost 90% of Titan, one of the fastest supercomputers in the world, including our usage of Titan's GPU accelerators.« less
Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patchett, John M; Ahrens, James P; Lo, Li - Ta
2010-10-15
Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software based ray tracing, software based rasterization and hardware accelerated rasterization. We presentmore » a case study of strong and weak scaling of rendering extremely large data on both GPU and CPU based parallel supercomputers using Para View, a parallel visualization tool. Wc use three different data sets: two synthetic and one from a scientific application. At an extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find software based ray-tracing offers a viable approach for scalable rendering of the projected future massive data sizes.« less
Supercomputing Drives Innovation - Continuum Magazine | NREL
years, NREL scientists have used supercomputers to simulate 3D models of the primary enzymes and Scientist, discuss a 3D model of wind plant aerodynamics, showing low velocity wakes and impact on
Supercomputer algorithms for efficient linear octree encoding of three-dimensional brain images.
Berger, S B; Reis, D J
1995-02-01
We designed and implemented algorithms for three-dimensional (3-D) reconstruction of brain images from serial sections using two important supercomputer architectures, vector and parallel. These architectures were represented by the Cray YMP and Connection Machine CM-2, respectively. The programs operated on linear octree representations of the brain data sets, and achieved 500-800 times acceleration when compared with a conventional laboratory workstation. As the need for higher resolution data sets increases, supercomputer algorithms may offer a means of performing 3-D reconstruction well above current experimental limits.
Intelligent supercomputers: the Japanese computer sputnik
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walter, G.
1983-11-01
Japan's government-supported fifth-generation computer project has had a pronounced effect on the American computer and information systems industry. The US firms are intensifying their research on and production of intelligent supercomputers, a combination of computer architecture and artificial intelligence software programs. While the present generation of computers is built for the processing of numbers, the new supercomputers will be designed specifically for the solution of symbolic problems and the use of artificial intelligence software. This article discusses new and exciting developments that will increase computer capabilities in the 1990s. 4 references.
Introducing Mira, Argonne's Next-Generation Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2013-03-19
Mira, the new petascale IBM Blue Gene/Q system installed at the ALCF, will usher in a new era of scientific supercomputing. An engineering marvel, the 10-petaflops machine is capable of carrying out 10 quadrillion calculations per second.
Green Supercomputing at Argonne
Pete Beckman
2017-12-09
Pete Beckman, head of Argonne's Leadership Computing Facility (ALCF) talks about Argonne National Laboratory's green supercomputingâeverything from designing algorithms to use fewer kilowatts per operation to using cold Chicago winter air to cool the machine more efficiently.
The New Explorers teacher`s guide: The new language of science
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-09-01
The Chicago Science Explorers Program is designed to make students aware of the many career options that are available to them which involve science. The program also hopes to encourage students to consider a career in science by providing interesting classroom experiences, information on various careers generated from the video tape, and a class field trip. In the videotape The New Language of Science, Dr. Larry Smarr of the University of Illinois illustrates how supercomputers can create visualizations of such complex scientific concepts and events as black holes in space, microbursts, smog, drug interactions in the body, earthquakes, and tornadoes.more » It also illustrates how math and science are integrated and emphasizes the need for students to take as much advanced mathematics as is offered at the junior high and high school level. Another underlying concept of the videotape is teamwork. Often students think of science as being an isolated career and this video tape clearly demonstrates that no one scientist would have enough knowledge to create a visualization alone. This report is the teacher`s guide for this video.« less
Magnetosphere simulations with a high-performance 3D AMR MHD Code
NASA Astrophysics Data System (ADS)
Gombosi, Tamas; Dezeeuw, Darren; Groth, Clinton; Powell, Kenneth; Song, Paul
1998-11-01
BATS-R-US is a high-performance 3D AMR MHD code for space physics applications running on massively parallel supercomputers. In BATS-R-US the electromagnetic and fluid equations are solved with a high-resolution upwind numerical scheme in a tightly coupled manner. The code is very robust and it is capable of spanning a wide range of plasma parameters (such as β, acoustic and Alfvénic Mach numbers). Our code is highly scalable: it achieved a sustained performance of 233 GFLOPS on a Cray T3E-1200 supercomputer with 1024 PEs. This talk reports results from the BATS-R-US code for the GGCM (Geospace General Circularculation Model) Phase 1 Standard Model Suite. This model suite contains 10 different steady-state configurations: 5 IMF clock angles (north, south, and three equally spaced angles in- between) with 2 IMF field strengths for each angle (5 nT and 10 nT). The other parameters are: solar wind speed =400 km/sec; solar wind number density = 5 protons/cc; Hall conductance = 0; Pedersen conductance = 5 S; parallel conductivity = ∞.
Advanced Computing for Manufacturing.
ERIC Educational Resources Information Center
Erisman, Albert M.; Neves, Kenneth W.
1987-01-01
Discusses ways that supercomputers are being used in the manufacturing industry, including the design and production of airplanes and automobiles. Describes problems that need to be solved in the next few years for supercomputers to assume a major role in industry. (TW)
Large Eddy Simulation of a Wind Turbine Airfoil at High Freestream-Flow Angle
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2015-04-13
A simulation of the airflow over a section of a wind turbine blade, run on the supercomputer Mira at the Argonne Leadership Computing Facility. Simulations like these help identify ways to make turbine blades more efficient.
Large Eddy Simulation of a Wind Turbine Airfoil at High Freestream-Flow Angle
None
2018-02-07
A simulation of the airflow over a section of a wind turbine blade, run on the supercomputer Mira at the Argonne Leadership Computing Facility. Simulations like these help identify ways to make turbine blades more efficient.
Virtualizing Super-Computation On-Board Uas
NASA Astrophysics Data System (ADS)
Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.
2015-04-01
Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real-time. However, depending on the volume of data and the cost of the communications, this later option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed-up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes the benchmarking for hardware selection, the software architecture and the communications aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real-time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and consumed watts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hazi, A U
2007-02-06
Setting performance goals is part of the business plan for almost every company. The same is true in the world of supercomputers. Ten years ago, the Department of Energy (DOE) launched the Accelerated Strategic Computing Initiative (ASCI) to help ensure the safety and reliability of the nation's nuclear weapons stockpile without nuclear testing. ASCI, which is now called the Advanced Simulation and Computing (ASC) Program and is managed by DOE's National Nuclear Security Administration (NNSA), set an initial 10-year goal to obtain computers that could process up to 100 trillion floating-point operations per second (teraflops). Many computer experts thought themore » goal was overly ambitious, but the program's results have proved them wrong. Last November, a Livermore-IBM team received the 2005 Gordon Bell Prize for achieving more than 100 teraflops while modeling the pressure-induced solidification of molten metal. The prestigious prize, which is named for a founding father of supercomputing, is awarded each year at the Supercomputing Conference to innovators who advance high-performance computing. Recipients for the 2005 prize included six Livermore scientists--physicists Fred Streitz, James Glosli, and Mehul Patel and computer scientists Bor Chan, Robert Yates, and Bronis de Supinski--as well as IBM researchers James Sexton and John Gunnels. This team produced the first atomic-scale model of metal solidification from the liquid phase with results that were independent of system size. The record-setting calculation used Livermore's domain decomposition molecular-dynamics (ddcMD) code running on BlueGene/L, a supercomputer developed by IBM in partnership with the ASC Program. BlueGene/L reached 280.6 teraflops on the Linpack benchmark, the industry standard used to measure computing speed. As a result, it ranks first on the list of Top500 Supercomputer Sites released in November 2005. To evaluate the performance of nuclear weapons systems, scientists must understand how materials behave under extreme conditions. Because experiments at high pressures and temperatures are often difficult or impossible to conduct, scientists rely on computer models that have been validated with obtainable data. Of particular interest to weapons scientists is the solidification of metals. ''To predict the performance of aging nuclear weapons, we need detailed information on a material's phase transitions'', says Streitz, who leads the Livermore-IBM team. For example, scientists want to know what happens to a metal as it changes from molten liquid to a solid and how that transition affects the material's characteristics, such as its strength.« less
ERIC Educational Resources Information Center
Le, Thanh; Tang, Kam Ki
2015-01-01
This paper empirically examines the impact of academic research on high-tech manufacturing growth of 28 Organisation for Economic Co-operation and Development (OECD) and emerging countries over the 1991-2005 period. A standard research and development (R&D) expenditure based measure is found to be too general to capture the input in high-tech…
Supercomputers Join the Fight against Cancer – U.S. Department of Energy
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
The Department of Energy has some of the best supercomputers in the world. Now, they’re joining the fight against cancer. Learn about our new partnership with the National Cancer Institute and GlaxoSmithKline Pharmaceuticals.
NAS-current status and future plans
NASA Technical Reports Server (NTRS)
Bailey, F. R.
1987-01-01
The Numerical Aerodynamic Simulation (NAS) has met its first major milestone, the NAS Processing System Network (NPSN) Initial Operating Configuration (IOC). The program has met its goal of providing a national supercomputer facility capable of greatly enhancing the Nation's research and development efforts. Furthermore, the program is fulfilling its pathfinder role by defining and implementing a paradigm for supercomputing system environments. The IOC is only the begining and the NAS Program will aggressively continue to develop and implement emerging supercomputer, communications, storage, and software technologies to strengthen computations as a critical element in supporting the Nation's leadership role in aeronautics.
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.
1993-01-01
This document briefly describes the use of the CRAY supercomputers that are an integral part of the Supercomputing Network Subsystem of the Central Scientific Computing Complex at LaRC. Features of the CRAY supercomputers are covered, including: FORTRAN, C, PASCAL, architectures of the CRAY-2 and CRAY Y-MP, the CRAY UNICOS environment, batch job submittal, debugging, performance analysis, parallel processing, utilities unique to CRAY, and documentation. The document is intended for all CRAY users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user.
Scaling of data communications for an advanced supercomputer network
NASA Technical Reports Server (NTRS)
Levin, E.; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of NASA's Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations and by remote communication to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. The implications of a projected 20-fold increase in processing power on the data communications requirements are described.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Roadrunner Supercomputer Breaks the Petaflop Barrier
Los Alamos National Lab - Brian Albright, Charlie McMillan, Lin Yin
2017-12-09
At 3:30 a.m. on May 26, 2008, Memorial Day, the "Roadrunner" supercomputer exceeded a sustained speed of 1 petaflop/s, or 1 million billion calculations per second. The sustained performance makes Roadrunner more than twice as fast as the current number 1
QCD on the BlueGene/L Supercomputer
NASA Astrophysics Data System (ADS)
Bhanot, G.; Chen, D.; Gara, A.; Sexton, J.; Vranas, P.
2005-03-01
In June 2004 QCD was simulated for the first time at sustained speed exceeding 1 TeraFlops in the BlueGene/L supercomputer at the IBM T.J. Watson Research Lab. The implementation and performance of QCD in the BlueGene/L is presented.
Supercomputer Issues from a University Perspective.
ERIC Educational Resources Information Center
Beering, Steven C.
1984-01-01
Discusses issues related to the access of and training of university researchers in using supercomputers, considering National Science Foundation's (NSF) role in this area, microcomputers on campuses, and the limited use of existing telecommunication networks. Includes examples of potential scientific projects (by subject area) utilizing…
NETL - Supercomputing: NETL Simulation Based Engineering User Center (SBEUC)
None
2018-02-07
NETL's Simulation-Based Engineering User Center, or SBEUC, integrates one of the world's largest high-performance computers with an advanced visualization center. The SBEUC offers a collaborative environment among researchers at NETL sites and those working through the NETL-Regional University Alliance.
NETL - Supercomputing: NETL Simulation Based Engineering User Center (SBEUC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2013-09-30
NETL's Simulation-Based Engineering User Center, or SBEUC, integrates one of the world's largest high-performance computers with an advanced visualization center. The SBEUC offers a collaborative environment among researchers at NETL sites and those working through the NETL-Regional University Alliance.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability ofmore » the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.« less
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Bei; Ethier, Stephane; Tang, William
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability ofmore » the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.« less
Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin
2015-07-01
As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.
Use of high performance networks and supercomputers for real-time flight simulation
NASA Technical Reports Server (NTRS)
Cleveland, Jeff I., II
1993-01-01
In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations must be consistent in processing time and be completed in as short a time as possible. These operations include simulation mathematical model computation and data input/output to the simulators. In 1986, in response to increased demands for flight simulation performance, NASA's Langley Research Center (LaRC), working with the contractor, developed extensions to the Computer Automated Measurement and Control (CAMAC) technology which resulted in a factor of ten increase in the effective bandwidth and reduced latency of modules necessary for simulator communication. This technology extension is being used by more than 80 leading technological developers in the United States, Canada, and Europe. Included among the commercial applications are nuclear process control, power grid analysis, process monitoring, real-time simulation, and radar data acquisition. Personnel at LaRC are completing the development of the use of supercomputers for mathematical model computation to support real-time flight simulation. This includes the development of a real-time operating system and development of specialized software and hardware for the simulator network. This paper describes the data acquisition technology and the development of supercomputing for flight simulation.
Finite element methods on supercomputers - The scatter-problem
NASA Technical Reports Server (NTRS)
Loehner, R.; Morgan, K.
1985-01-01
Certain problems arise in connection with the use of supercomputers for the implementation of finite-element methods. These problems are related to the desirability of utilizing the power of the supercomputer as fully as possible for the rapid execution of the required computations, taking into account the gain in speed possible with the aid of pipelining operations. For the finite-element method, the time-consuming operations may be divided into three categories. The first two present no problems, while the third type of operation can be a reason for the inefficient performance of finite-element programs. Two possibilities for overcoming certain difficulties are proposed, giving attention to a scatter-process.
Code IN Exhibits - Supercomputing 2000
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; McCann, Karen M.; Biswas, Rupak; VanderWijngaart, Rob F.; Kwak, Dochan (Technical Monitor)
2000-01-01
The creation of parameter study suites has recently become a more challenging problem as the parameter studies have become multi-tiered and the computational environment has become a supercomputer grid. The parameter spaces are vast, the individual problem sizes are getting larger, and researchers are seeking to combine several successive stages of parameterization and computation. Simultaneously, grid-based computing offers immense resource opportunities but at the expense of great difficulty of use. We present ILab, an advanced graphical user interface approach to this problem. Our novel strategy stresses intuitive visual design tools for parameter study creation and complex process specification, and also offers programming-free access to grid-based supercomputer resources and process automation.
High Productivity Computing Systems and Competitiveness Initiative
2007-07-01
planning committee for the annual, international Supercomputing Conference in 2004 and 2005. This is the leading HPC industry conference in the world. It...sector partnerships. Partnerships will form a key part of discussions at the 2nd High Performance Computing Users Conference, planned for July 13, 2005...other things an interagency roadmap for high-end computing core technologies and an accessibility improvement plan . Improving HPC Education and
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doerfler, Douglas; Austin, Brian; Cook, Brandon
There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi TM core is a fraction of a Xeon®core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL,more » such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.« less
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
NASA Astrophysics Data System (ADS)
Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi
2017-10-01
Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.
Large-Scale NASA Science Applications on the Columbia Supercluster
NASA Technical Reports Server (NTRS)
Brooks, Walter
2005-01-01
Columbia, NASA's newest 61 teraflops supercomputer that became operational late last year, is a highly integrated Altix cluster of 10,240 processors, and was named to honor the crew of the Space Shuttle lost in early 2003. Constructed in just four months, Columbia increased NASA's computing capability ten-fold, and revitalized the Agency's high-end computing efforts. Significant cutting-edge science and engineering simulations in the areas of space and Earth sciences, as well as aeronautics and space operations, are already occurring on this largest operational Linux supercomputer, demonstrating its capacity and capability to accelerate NASA's space exploration vision. The presentation will describe how an integrated environment consisting not only of next-generation systems, but also modeling and simulation, high-speed networking, parallel performance optimization, and advanced data analysis and visualization, is being used to reduce design cycle time, accelerate scientific discovery, conduct parametric analysis of multiple scenarios, and enhance safety during the life cycle of NASA missions. The talk will conclude by discussing how NAS partnered with various NASA centers, other government agencies, computer industry, and academia, to create a national resource in large-scale modeling and simulation.
NSF Establishes First Four National Supercomputer Centers.
ERIC Educational Resources Information Center
Lepkowski, Wil
1985-01-01
The National Science Foundation (NSF) has awarded support for supercomputer centers at Cornell University, Princeton University, University of California (San Diego), and University of Illinois. These centers are to be the nucleus of a national academic network for use by scientists and engineers throughout the United States. (DH)
Library Services in a Supercomputer Center.
ERIC Educational Resources Information Center
Layman, Mary
1991-01-01
Describes library services that are offered at the San Diego Supercomputer Center (SDSC), which is located at the University of California at San Diego. Topics discussed include the user population; online searching; microcomputer use; electronic networks; current awareness programs; library catalogs; and the slide collection. A sidebar outlines…
Probing the cosmic causes of errors in supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Cosmic rays from outer space are causing errors in supercomputers. The neutrons that pass through the CPU may be causing binary data to flip leading to incorrect calculations. Los Alamos National Laboratory has developed detectors to determine how much data is being corrupted by these cosmic particles.
NSF Says It Will Support Supercomputer Centers in California and Illinois.
ERIC Educational Resources Information Center
Strosnider, Kim; Young, Jeffrey R.
1997-01-01
The National Science Foundation will increase support for supercomputer centers at the University of California, San Diego and the University of Illinois, Urbana-Champaign, while leaving unclear the status of the program at Cornell University (New York) and a cooperative Carnegie-Mellon University (Pennsylvania) and University of Pittsburgh…
Access to Supercomputers. Higher Education Panel Report 69.
ERIC Educational Resources Information Center
Holmstrom, Engin Inel
This survey was conducted to provide the National Science Foundation with baseline information on current computer use in the nation's major research universities, including the actual and potential use of supercomputers. Questionnaires were sent to 207 doctorate-granting institutions; after follow-ups, 167 institutions (91% of the institutions…
NOAA announces significant investment in next generation of supercomputers
provide more timely, accurate weather forecasts. (Credit: istockphoto.com) Today, NOAA announced the next phase in the agency's efforts to increase supercomputing capacity to provide more timely, accurate turn will lead to more timely, accurate, and reliable forecasts." Ahead of this upgrade, each of
Developments in the simulation of compressible inviscid and viscous flow on supercomputers
NASA Technical Reports Server (NTRS)
Steger, J. L.; Buning, P. G.
1985-01-01
In anticipation of future supercomputers, finite difference codes are rapidly being extended to simulate three-dimensional compressible flow about complex configurations. Some of these developments are reviewed. The importance of computational flow visualization and diagnostic methods to three-dimensional flow simulation is also briefly discussed.
NASA Technical Reports Server (NTRS)
Smarr, Larry; Press, William; Arnett, David W.; Cameron, Alastair G. W.; Crutcher, Richard M.; Helfand, David J.; Horowitz, Paul; Kleinmann, Susan G.; Linsky, Jeffrey L.; Madore, Barry F.
1991-01-01
The applications of computers and data processing to astronomy are discussed. Among the topics covered are the emerging national information infrastructure, workstations and supercomputers, supertelescopes, digital astronomy, astrophysics in a numerical laboratory, community software, archiving of ground-based observations, dynamical simulations of complex systems, plasma astrophysics, and the remote control of fourth dimension supercomputers.
Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa
2018-03-01
Electron tomography (ET) is an important technique for studying the three-dimensional structures of the biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes by combining with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for a low-signal-to-noise ratio biological ET dataset. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration technology ICON-many integrated core (MIC) on Xeon Phi cards to address the huge computational demand of ICON. During this step, we parallelize the element-wise matrix operations and use the efficient summation of a matrix to reduce the cost of matrix computation. We also developed parallel versions of NUFFT on MIC to achieve a high acceleration of ICON by using more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy in biological specimens under different noise levels and a significant acceleration, up to 13.3 × , compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on Tianhe-2 supercomputer.
Role of High-End Computing in Meeting NASA's Science and Engineering Challenges
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Tu, Eugene L.; Van Dalsem, William R.
2006-01-01
Two years ago, NASA was on the verge of dramatically increasing its HEC capability and capacity. With the 10,240-processor supercomputer, Columbia, now in production for 18 months, HEC has an even greater impact within the Agency and extending to partner institutions. Advanced science and engineering simulations in space exploration, shuttle operations, Earth sciences, and fundamental aeronautics research are occurring on Columbia, demonstrating its ability to accelerate NASA s exploration vision. This talk describes how the integrated production environment fostered at the NASA Advanced Supercomputing (NAS) facility at Ames Research Center is accelerating scientific discovery, achieving parametric analyses of multiple scenarios, and enhancing safety for NASA missions. We focus on Columbia s impact on two key engineering and science disciplines: Aerospace, and Climate. We also discuss future mission challenges and plans for NASA s next-generation HEC environment.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Fast I/O for Massively Parallel Applications
NASA Technical Reports Server (NTRS)
OKeefe, Matthew T.
1996-01-01
The two primary goals for this report were the design, contruction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the IO requirements of highly parallel applications. In addition, further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Tools for 3D scientific visualization in computational aerodynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a super computer with a high speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a CRAY2, and the high speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed as well as descriptions of other hardware for digital video and film recording.
Supercomputer use in orthopaedic biomechanics research: focus on functional adaptation of bone.
Hart, R T; Thongpreda, N; Van Buskirk, W C
1988-01-01
The authors describe two biomechanical analyses carried out using numerical methods. One is an analysis of the stress and strain in a human mandible, and the other analysis involves modeling the adaptive response of a sheep bone to mechanical loading. The computing environment required for the two types of analyses is discussed. It is shown that a simple stress analysis of a geometrically complex mandible can be accomplished using a minicomputer. However, more sophisticated analyses of the same model with dynamic loading or nonlinear materials would require supercomputer capabilities. A supercomputer is also required for modeling the adaptive response of living bone, even when simple geometric and material models are use.
Supercomputer optimizations for stochastic optimal control applications
NASA Technical Reports Server (NTRS)
Chung, Siu-Leung; Hanson, Floyd B.; Xu, Huihuang
1991-01-01
Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and superconducting hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations.
Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer
NASA Technical Reports Server (NTRS)
Hornfeck, William A.
1988-01-01
A considerable volume of large computational computer codes were developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. This result is primarily due to architectural differences in the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A sofware package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis
2016-05-23
LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF. Keywords—LLMapReduce; map-reduce; performance; scheduler; Grid Engine ...SLURM; LSF I. INTRODUCTION Large scale computing is currently dominated by four ecosystems: supercomputing, database, enterprise , and big data [1...interconnects [6]), High performance math libraries (e.g., BLAS [7, 8], LAPACK [9], ScaLAPACK [10]) designed to exploit special processing hardware, High
Congressional Panel Seeks To Curb Access of Foreign Students to U.S. Supercomputers.
ERIC Educational Resources Information Center
Kiernan, Vincent
1999-01-01
Fearing security problems, a congressional committee on Chinese espionage recommends that foreign students and other foreign nationals be barred from using supercomputers at national laboratories unless they first obtain export licenses from the federal government. University officials dispute the data on which the report is based and find the…
The Age of the Supercomputer Gives Way to the Age of the Super Infrastructure.
ERIC Educational Resources Information Center
Young, Jeffrey R.
1997-01-01
In October 1997, the National Science Foundation will discontinue financial support for two university-based supercomputer facilities to concentrate resources on partnerships led by facilities at the University of California, San Diego and the University of Illinois, Urbana-Champaign. The reconfigured program will develop more user-friendly and…
Extracting the Textual and Temporal Structure of Supercomputing Logs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, S; Singh, I; Chandra, A
2009-05-26
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format makes it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an onlinemore » clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.« less
In situ visualization for large-scale combustion simulations.
Yu, Hongfeng; Wang, Chaoli; Grout, Ray W; Chen, Jacqueline H; Ma, Kwan-Liu
2010-01-01
As scientific supercomputing moves toward petascale and exascale levels, in situ visualization stands out as a scalable way for scientists to view the data their simulations generate. This full picture is crucial particularly for capturing and understanding highly intermittent transient phenomena, such as ignition and extinction events in turbulent combustion.
ERIC Educational Resources Information Center
Raths, David
2010-01-01
In the tug-of-war between researchers and IT for supercomputing resources, a centralized approach can help both sides get more bang for their buck. As 2010 began, the University of Washington was preparing to launch its first shared high-performance computing cluster, a 1,500-node system called Hyak, dedicated to research activities. Like other…
Towards Scalable Deep Learning via I/O Analysis and Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pumma, Sarunya; Si, Min; Feng, Wu-Chun
Deep learning systems have been growing in prominence as a way to automatically characterize objects, trends, and anomalies. Given the importance of deep learning systems, researchers have been investigating techniques to optimize such systems. An area of particular interest has been using large supercomputing systems to quickly generate effective deep learning networks: a phase often referred to as “training” of the deep learning neural network. As we scale existing deep learning frameworks—such as Caffe—on these large supercomputing systems, we notice that the parallelism can help improve the computation tremendously, leaving data I/O as the major bottleneck limiting the overall systemmore » scalability. In this paper, we first present a detailed analysis of the performance bottlenecks of Caffe on large supercomputing systems. Our analysis shows that the I/O subsystem of Caffe—LMDB—relies on memory-mapped I/O to access its database, which can be highly inefficient on large-scale systems because of its interaction with the process scheduling system and the network-based parallel filesystem. Based on this analysis, we then present LMDBIO, our optimized I/O plugin for Caffe that takes into account the data access pattern of Caffe in order to vastly improve I/O performance. Our experimental results show that LMDBIO can improve the overall execution time of Caffe by nearly 20-fold in some cases.« less
NASA Astrophysics Data System (ADS)
Wang, Z. P.; Hayhurst, D. R.
1994-07-01
The creep deformation and damage evolution in a pipe weldment has been modeled by using the finite-element continuum damage mechanics (CDM) method. The finite-element CDM computer program DAMAGE XX has been adapted to run with increased speed on a Cray XMP/416 supercomputer. Run times are sufficiently short (20 min) to permit many parametric studies to be carried out on vessel lifetimes for different weld and heat affected zone (HAZ) materials. Finite-element mesh sensitivity was studied first in order to select a mesh capable of correctly predicting experimentally observed results using at least possible computer time. A study was then made of the effect on the lifetime of a butt welded vessel of each of the commomly measured material parameters for the weld and HAZ materials. Forty different ferritic steel welded vessels were analyzed for a constant internal pressure of 45.5 MPa at a temperature of 565 C; each vessel having the same parent pipe material but different weld and HAZ materials. A lifetime improvement has been demonstrated of 30% over that obtained for the initial materials property data. A methodology for weldment design has been established which uses supercomputer-based CDM analysis techniques; it is quick to use, provides accurate results, and is a viable design tool.
P2P Technology for High-Performance Computing: An Overview
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Berry, Jason
2003-01-01
The transition from cluster computing to peer-to-peer (P2P) high-performance computing has recently attracted the attention of the computer science community. It has been recognized that existing local networks and dedicated clusters of headless workstations can serve as inexpensive yet powerful virtual supercomputers. It has also been recognized that the vast number of lower-end computers connected to the Internet stay idle for as long as 90% of the time. The growing speed of Internet connections and the high availability of free CPU time encourage exploration of the possibility to use the whole Internet rather than local clusters as massively parallel yet almost freely available P2P supercomputer. As a part of a larger project on P2P high-performance computing, it has been my goal to compile an overview of the 2P2 paradigm. I have studied various P2P platforms and I have compiled systematic brief descriptions of their most important characteristics. I have also experimented and obtained hands-on experience with selected P2P platforms focusing on those that seem promising with respect to P2P high-performance computing. I have also compiled relevant literature and web references. I have prepared a draft technical report and I have summarized my findings in a poster paper.
High performance computing for advanced modeling and simulation of materials
NASA Astrophysics Data System (ADS)
Wang, Jue; Gao, Fei; Vazquez-Poletti, Jose Luis; Li, Jianjiang
2017-02-01
The First International Workshop on High Performance Computing for Advanced Modeling and Simulation of Materials (HPCMS2015) was held in Austin, Texas, USA, Nov. 18, 2015. HPCMS 2015 was organized by Computer Network Information Center (Chinese Academy of Sciences), University of Michigan, Universidad Complutense de Madrid, University of Science and Technology Beijing, Pittsburgh Supercomputing Center, China Institute of Atomic Energy, and Ames Laboratory.
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1994-01-01
Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
Predicting Hurricanes with Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2010-01-01
Hurricane Emily, formed in the Atlantic Ocean on July 10, 2005, was the strongest hurricane ever to form before August. By checking computer models against the actual path of the storm, researchers can improve hurricane prediction. In 2010, NOAA researchers were awarded 25 million processor-hours on Argonne's BlueGene/P supercomputer for the project. Read more at http://go.usa.gov/OLh
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron
1992-01-01
Report evaluates supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. Predicts these fields will require computer speed greater than 10(Sup 18) floating-point operations per second (FLOP's) and memory capacity greater than 10(Sup 15) words. Also, new parallel computer architectures and new structured numerical methods will make necessary speed and capacity available.
Advances in petascale kinetic plasma simulation with VPIC and Roadrunner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Kevin J; Albright, Brian J; Yin, Lin
2009-01-01
VPIC, a first-principles 3d electromagnetic charge-conserving relativistic kinetic particle-in-cell (PIC) code, was recently adapted to run on Los Alamos's Roadrunner, the first supercomputer to break a petaflop (10{sup 15} floating point operations per second) in the TOP500 supercomputer performance rankings. They give a brief overview of the modeling capabilities and optimization techniques used in VPIC and the computational characteristics of petascale supercomputers like Roadrunner. They then discuss three applications enabled by VPIC's unprecedented performance on Roadrunner: modeling laser plasma interaction in upcoming inertial confinement fusion experiments at the National Ignition Facility (NIF), modeling short pulse laser GeV ion acceleration andmore » modeling reconnection in magnetic confinement fusion experiments.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curran, L.
1988-03-03
Interest has been building in recent months over the imminent arrival of a new class of supercomputer, called the ''supercomputer on a desk'' or the single-user model. Most observers expected the first such product to come from either of two startups, Ardent Computer Corp. or Stellar Computer Inc. But a surprise entry has shown up. Apollo Computer Inc. is launching a new work station this week that racks up an impressive list of industry first as it puts supercomputer power at the disposal of a single user. The new series 10000 from the Chelmsford, Mass., a company is built aroundmore » a reduced-instruction-set architecture that the company calls Prism, for parallel reduced-instruction-set multiprocessor. This article describes the 10000 and Prism.« less
Science & Technology Review June 2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poyneer, L A
2012-04-20
This month's issue has the following articles: (1) A New Era in Climate System Analysis - Commentary by William H. Goldstein; (2) Seeking Clues to Climate Change - By comparing past climate records with results from computer simulations, Livermore scientists can better understand why Earth's climate has changed and how it might change in the future; (3) Finding and Fixing a Supercomputer's Faults - Livermore experts have developed innovative methods to detect hardware faults in supercomputers and help applications recover from errors that do occur; (4) Targeting Ignition - Enhancements to the cryogenic targets for National Ignition Facility experiments aremore » furthering work to achieve fusion ignition with energy gain; (5) Neural Implants Come of Age - A new generation of fully implantable, biocompatible neural prosthetics offers hope to patients with neurological impairment; and (6) Incubator Busy Growing Energy Technologies - Six collaborations with industrial partners are using the Laboratory's high-performance computing resources to find solutions to urgent energy-related problems.« less
Using a Cray Y-MP as an array processor for a RISC Workstation
NASA Technical Reports Server (NTRS)
Lamaster, Hugh; Rogallo, Sarah J.
1992-01-01
As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980's, mainframes and super computers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate to use in a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment which demonstrates that matrix multiplication can be executed remotely on a large system to speed the execution over that experienced on a workstation is described.
NASA Center for Climate Simulation (NCCS) Presentation
NASA Technical Reports Server (NTRS)
Webster, William P.
2012-01-01
The NASA Center for Climate Simulation (NCCS) offers integrated supercomputing, visualization, and data interaction technologies to enhance NASA's weather and climate prediction capabilities. It serves hundreds of users at NASA Goddard Space Flight Center, as well as other NASA centers, laboratories, and universities across the US. Over the past year, NCCS has continued expanding its data-centric computing environment to meet the increasingly data-intensive challenges of climate science. We doubled our Discover supercomputer's peak performance to more than 800 teraflops by adding 7,680 Intel Xeon Sandy Bridge processor-cores and most recently 240 Intel Xeon Phi Many Integrated Core (MIG) co-processors. A supercomputing-class analysis system named Dali gives users rapid access to their data on Discover and high-performance software including the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT), with interfaces from user desktops and a 17- by 6-foot visualization wall. NCCS also is exploring highly efficient climate data services and management with a new MapReduce/Hadoop cluster while augmenting its data distribution to the science community. Using NCCS resources, NASA completed its modeling contributions to the Intergovernmental Panel on Climate Change (IPCG) Fifth Assessment Report this summer as part of the ongoing Coupled Modellntercomparison Project Phase 5 (CMIP5). Ensembles of simulations run on Discover reached back to the year 1000 to test model accuracy and projected climate change through the year 2300 based on four different scenarios of greenhouse gases, aerosols, and land use. The data resulting from several thousand IPCC/CMIP5 simulations, as well as a variety of other simulation, reanalysis, and observationdatasets, are available to scientists and decision makers through an enhanced NCCS Earth System Grid Federation Gateway. Worldwide downloads have totaled over 110 terabytes of data.
On the energy footprint of I/O management in Exascale HPC systems
Dorier, Matthieu; Yildiz, Orcun; Ibrahim, Shadi; ...
2016-03-21
The advent of unprecedentedly scalable yet energy hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. But, the details of how these approaches affect energy consumption have not been studied yet. Therefore, this paper aims to explore how much energy a supercomputer consumes while running scientific simulations when adopting various I/O management approaches. In particular, we closely examine three radically different I/O schemes including time partitioning, dedicated cores, and dedicated nodes. Tomore » accomplish this, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results obtained on the French Grid'5000 platform highlight the differences among these three approaches and illustrate in which way various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of a HPC simulation under different I/O approaches. This proposed model gives hints to pre-select the most energy-efficient I/O approach for a particular simulation on a particular HPC system and therefore provides a step towards energy-efficient HPC simulations in Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.« less
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
Integration of the Chinese HPC Grid in ATLAS Distributed Computing
NASA Astrophysics Data System (ADS)
Filipčič, A.;
2017-10-01
Fifteen Chinese High-Performance Computing sites, many of them on the TOP500 list of most powerful supercomputers, are integrated into a common infrastructure providing coherent access to a user through an interface based on a RESTful interface called SCEAPI. These resources have been integrated into the ATLAS Grid production system using a bridge between ATLAS and SCEAPI which translates the authorization and job submission protocols between the two environments. The ARC Computing Element (ARC-CE) forms the bridge using an extended batch system interface to allow job submission to SCEAPI. The ARC-CE was setup at the Institute for High Energy Physics, Beijing, in order to be as close as possible to the SCEAPI front-end interface at the Computing Network Information Center, also in Beijing. This paper describes the technical details of the integration between ARC-CE and SCEAPI and presents results so far with two supercomputer centers, Tianhe-IA and ERA. These two centers have been the pilots for ATLAS Monte Carlo Simulation in SCEAPI and have been providing CPU power since fall 2015.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling. PMID:28701946
NASA Astrophysics Data System (ADS)
Hara, Tatsuhiko
2004-08-01
We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation with the maximum angular degree 16. We find significant low velocities under south Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long wavelength patterns of Vs and Q in the transition zone such as high Vs and high Q under the western Pacific.
None
2018-05-01
A new Idaho National Laboratory supercomputer is helping scientists create more realistic simulations of nuclear fuel. Dubbed "Ice Storm" this 2048-processor machine allows researchers to model and predict the complex physics behind nuclear reactor behavior. And with a new visualization lab, the team can see the results of its simulations on the big screen. For more information about INL research, visit http://www.facebook.com/idahonationallaboratory.
Open Skies Project Computational Fluid Dynamic Analysis
1994-03-01
109 -. -_ _ 9 . CONCLUSIONSI1 f 10. LIST OF REFERENCES _________ ___________112 APPENDIX A: Transition Prediction __________________116 B...Behind the Open Skies Plate 20 8. VSAERO Results on the Alternate Fairing 21 9 . Centerline Cp Comparisons 22 10. VSAERO Wing Effects Study Centerline C...problems. The assistance Mrs. Mary Ann Mages, at Kirtland Supercomputer Center ( PL /SCPR) gave by setting a precedent for supercomputer account
High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.
2017-12-01
The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.
Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X
2009-08-01
This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.
Supercomputer Simulations Help Develop New Approach to Fight Antibiotic Resistance
Zgurskaya, Helen; Smith, Jeremy
2018-06-13
ORNL leveraged powerful supercomputing to support research led by University of Oklahoma scientists to identify chemicals that seek out and disrupt bacterial proteins called efflux pumps, known to be a major cause of antibiotic resistance. By running simulations on Titan, the team selected molecules most likely to target and potentially disable the assembly of efflux pumps found in E. coli bacteria cells.
Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J
2015-09-22
Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.
Scalability Test of Multiscale Fluid-Platelet Model for Three Top Supercomputers
Zhang, Peng; Zhang, Na; Gao, Chao; Zhang, Li; Gao, Yuxiang; Deng, Yuefan; Bluestein, Danny
2016-01-01
We have tested the scalability of three supercomputers: the Tianhe-2, Stampede and CS-Storm with multiscale fluid-platelet simulations, in which a highly-resolved and efficient numerical model for nanoscale biophysics of platelets in microscale viscous biofluids is considered. Three experiments involving varying problem sizes were performed: Exp-S: 680,718-particle single-platelet; Exp-M: 2,722,872-particle 4-platelet; and Exp-L: 10,891,488-particle 16-platelet. Our implementations of multiple time-stepping (MTS) algorithm improved the performance of single time-stepping (STS) in all experiments. Using MTS, our model achieved the following simulation rates: 12.5, 25.0, 35.5 μs/day for Exp-S and 9.09, 6.25, 14.29 μs/day for Exp-M on Tianhe-2, CS-Storm 16-K80 and Stampede K20. The best rate for Exp-L was 6.25 μs/day for Stampede. Utilizing current advanced HPC resources, the simulation rates achieved by our algorithms bring within reach performing complex multiscale simulations for solving vexing problems at the interface of biology and engineering, such as thrombosis in blood flow which combines millisecond-scale hematology with microscale blood flow at resolutions of micro-to-nanoscale cellular components of platelets. This study of testing the performance characteristics of supercomputers with advanced computational algorithms that offer optimal trade-off to achieve enhanced computational performance serves to demonstrate that such simulations are feasible with currently available HPC resources. PMID:27570250
NREL, Hewlett-Packard Developed Ultra-Efficient, High-Performance Computing
and allows the heat captured from the supercomputer to provide all the heating needs for the Energy Systems Integration Facility. And there's even enough heat left over to melt snow outside on sidewalks during the winter. During the summer, the unused heat can be rejected via cooling towers. R&D
Measurements over distributed high performance computing and storage systems
NASA Technical Reports Server (NTRS)
Williams, Elizabeth; Myers, Tom
1993-01-01
A strawman proposal is given for a framework for presenting a common set of metrics for supercomputers, workstations, file servers, mass storage systems, and the networks that interconnect them. Production control and database systems are also included. Though other applications and third part software systems are not addressed, it is important to measure them as well.
Next Generation Security for the 10,240 Processor Columbia System
NASA Technical Reports Server (NTRS)
Hinke, Thomas; Kolano, Paul; Shaw, Derek; Keller, Chris; Tweton, Dave; Welch, Todd; Liu, Wen (Betty)
2005-01-01
This presentation includes a discussion of the Columbia 10,240-processor system located at the NASA Advanced Supercomputing (NAS) division at the NASA Ames Research Center which supports each of NASA's four missions: science, exploration systems, aeronautics, and space operations. It is comprised of 20 Silicon Graphics nodes, each consisting of 512 Itanium II processors. A 64 processor Columbia front-end system supports users as they prepare their jobs and then submits them to the PBS system. Columbia nodes and front-end systems use the Linux OS. Prior to SC04, the Columbia system was used to attain a processing speed of 51.87 TeraFlops, which made it number two on the Top 500 list of the world's supercomputers and the world's fastest "operational" supercomputer since it was fully engaged in supporting NASA users.
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in the recent years combine conventional multi-core CPU with GPU accelerators and provide an opportunity for manifold increase and computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide the interested external researchers the regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
SISYPHUS: A high performance seismic inversion factory
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas
2016-04-01
In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
CFD applications: The Lockheed perspective
NASA Technical Reports Server (NTRS)
Miranda, Luis R.
1987-01-01
The Numerical Aerodynamic Simulator (NAS) epitomizes the coming of age of supercomputing and opens exciting horizons in the world of numerical simulation. An overview of supercomputing at Lockheed Corporation in the area of Computational Fluid Dynamics (CFD) is presented. This overview will focus on developments and applications of CFD as an aircraft design tool and will attempt to present an assessment, withing this context, of the state-of-the-art in CFD methodology.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
A Layered Solution for Supercomputing Storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grider, Gary
To solve the supercomputing challenge of memory keeping up with processing speed, a team at Los Alamos National Laboratory developed two innovative memory management and storage technologies. Burst buffers peel off data onto flash memory to support the checkpoint/restart paradigm of large simulations. MarFS adds a thin software layer enabling a new tier for campaign storage—based on inexpensive, failure-prone disk drives—between disk drives and tape archives.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muller, U.A.; Baumle, B.; Kohler, P.
1992-10-01
Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments
2015-11-20
1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever- growing big data community...Lexington, MA, U.S.A Abstract— The map-reduce parallel programming model has become extremely popular in the big data community. Many big data ...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
Advanced Numerical Techniques of Performance Evaluation. Volume 1
1990-06-01
system scheduling3thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself...Polychronopoulos and D.J. Kuck. Guided Self- Scheduling : A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers C...Kuck 1987] C.D. Polychronopoulos and D.J. Kuck. Guided Self- Scheduling : A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Trans. on Comp
Hurricane Intensity Forecasts with a Global Mesoscale Model on the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Shen, Bo-Wen; Tao, Wei-Kuo; Atlas, Robert
2006-01-01
It is known that General Circulation Models (GCMs) have insufficient resolution to accurately simulate hurricane near-eye structure and intensity. The increasing capabilities of high-end computers (e.g., the NASA Columbia Supercomputer) have changed this. In 2004, the finite-volume General Circulation Model at a 1/4 degree resolution, doubling the resolution used by most of operational NWP center at that time, was implemented and run to obtain promising landfall predictions for major hurricanes (e.g., Charley, Frances, Ivan, and Jeanne). In 2005, we have successfully implemented the 1/8 degree version, and demonstrated its performance on intensity forecasts with hurricane Katrina (2005). It is found that the 1/8 degree model is capable of simulating the radius of maximum wind and near-eye wind structure, and thereby promising intensity forecasts. In this study, we will further evaluate the model s performance on intensity forecasts of hurricanes Ivan, Jeanne, Karl in 2004. Suggestions for further model development will be made in the end.
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide
NASA Astrophysics Data System (ADS)
Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.
Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel waveoptics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently postprocesses the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
Preparing for in situ processing on upcoming leading-edge supercomputers
Kress, James; Churchill, Randy Michael; Klasky, Scott; ...
2016-10-01
High performance computing applications are producing increasingly large amounts of data and placing enormous stress on current capabilities for traditional post-hoc visualization techniques. Because of the growing compute and I/O imbalance, data reductions, including in situ visualization, are required. These reduced data are used for analysis and visualization in a variety of different ways. Many of he visualization and analysis requirements are known a priori, but when they are not, scientists are dependent on the reduced data to accurately represent the simulation in post hoc analysis. The contributions of this paper is a description of the directions we are pursuingmore » to assist a large scale fusion simulation code succeed on the next generation of supercomputers. Finally, these directions include the role of in situ processing for performing data reductions, as well as the tradeoffs between data size and data integrity within the context of complex operations in a typical scientific workflow.« less
Web-based system for surgical planning and simulation
NASA Astrophysics Data System (ADS)
Eldeib, Ayman M.; Ahmed, Mohamed N.; Farag, Aly A.; Sites, C. B.
1998-10-01
The growing scientific knowledge and rapid progress in medical imaging techniques has led to an increasing demand for better and more efficient methods of remote access to high-performance computer facilities. This paper introduces a web-based telemedicine project that provides interactive tools for surgical simulation and planning. The presented approach makes use of client-server architecture based on new internet technology where clients use an ordinary web browser to view, send, receive and manipulate patients' medical records while the server uses the supercomputer facility to generate online semi-automatic segmentation, 3D visualization, surgical simulation/planning and neuroendoscopic procedures navigation. The supercomputer (SGI ONYX 1000) is located at the Computer Vision and Image Processing Lab, University of Louisville, Kentucky. This system is under development in cooperation with the Department of Neurological Surgery, Alliant Health Systems, Louisville, Kentucky. The server is connected via a network to the Picture Archiving and Communication System at Alliant Health Systems through a DICOM standard interface that enables authorized clients to access patients' images from different medical modalities.
NASA Technical Reports Server (NTRS)
Shen, B.-W.; Atlas, R.; Chern, J.-D.; Reale, O.; Lin, S.-J.; Lee, T.; Chang, J.
2005-01-01
The NASA Columbia supercomputer was ranked second on the TOP500 List in November, 2004. Such a quantum jump in computing power provides unprecedented opportunities to conduct ultra-high resolution simulations with the finite-volume General Circulation Model (fvGCM). During 2004, the model was run in realtime experimentally at 0.25 degree resolution producing remarkable hurricane forecasts [Atlas et al., 2005]. In 2005, the horizontal resolution was further doubled, which makes the fvGCM comparable to the first mesoscale resolving General Circulation Model at the Earth Simulator Center [Ohfuchi et al., 2004]. Nine 5-day 0.125 degree simulations of three hurricanes in 2004 are presented first for model validation. Then it is shown how the model can simulate the formation of the Catalina eddies and Hawaiian lee vortices, which are generated by the interaction of the synoptic-scale flow with surface forcing, and have never been reproduced in a GCM before.)
Remote visual analysis of large turbulence databases at multiple scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less
Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
NASA Astrophysics Data System (ADS)
Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.; Hassan, Amr H.
2011-01-01
General-purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.
Remote visual analysis of large turbulence databases at multiple scales
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin; ...
2018-06-15
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups inmore » the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.« less
Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan
2007-01-01
Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
A Layered Solution for Supercomputing Storage
Grider, Gary
2018-06-13
To solve the supercomputing challenge of memory keeping up with processing speed, a team at Los Alamos National Laboratory developed two innovative memory management and storage technologies. Burst buffers peel off data onto flash memory to support the checkpoint/restart paradigm of large simulations. MarFS adds a thin software layer enabling a new tier for campaign storageâbased on inexpensive, failure-prone disk drivesâbetween disk drives and tape archives.
A Long History of Supercomputing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grider, Gary
As part of its national security science mission, Los Alamos National Laboratory and HPC have a long, entwined history dating back to the earliest days of computing. From bringing the first problem to the nation’s first computer to building the first machine to break the petaflop barrier, Los Alamos holds many “firsts” in HPC breakthroughs. Today, supercomputers are integral to stockpile stewardship and the Laboratory continues to work with vendors in developing the future of HPC.
Introducing Argonne’s Theta Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Graphics supercomputer for computational fluid dynamics research
NASA Astrophysics Data System (ADS)
Liaw, Goang S.
1994-11-01
The objective of this project is to purchase a state-of-the-art graphics supercomputer to improve the Computational Fluid Dynamics (CFD) research capability at Alabama A & M University (AAMU) and to support the Air Force research projects. A cutting-edge graphics supercomputer system, Onyx VTX, from Silicon Graphics Computer Systems (SGI), was purchased and installed. Other equipment including a desktop personal computer, PC-486 DX2 with a built-in 10-BaseT Ethernet card, a 10-BaseT hub, an Apple Laser Printer Select 360, and a notebook computer from Zenith were also purchased. A reading room has been converted to a research computer lab by adding some furniture and an air conditioning unit in order to provide an appropriate working environments for researchers and the purchase equipment. All the purchased equipment were successfully installed and are fully functional. Several research projects, including two existing Air Force projects, are being performed using these facilities.
Modelling sodium cobaltate by mapping onto magnetic Ising model
NASA Astrophysics Data System (ADS)
Gemperline, Patrick; Morris, David Jonathan Pryce
Fast Ion conductors are a class of crystals that are frequently used as battery materials, especially in smart phones, laptops, and other portable devices. Sodium Cobalt Oxide, NaxCoO2, falls into this class of crystals, but is unique because it possesses the ability to act as a thermoelectric material and a superconductor at different concentrations of Na+. The crystal lattice is mapped onto an Ising Magnetic Spin model and a Monte-Carol Simulation is used to find the most energetically favorable configuration of spins. This spin configuration is mapped back to the crystal lattice resulting in the most stable crystal structure of Sodium Cobalt Oxide at various concentrations. Knowing the atomic structures of the crystals will aid in the research of the materials capabilities and the possible uses of the material commercially. Ohio Supercomputer Center. 1987. Ohio Supercomputer Center. Columbus OH: Ohio Supercomputer Center. and the John Hauck Foundation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boris, J.P.; Picone, J.M.; Lambrakos, S.G.
The Surveillance, Correlation, and Tracking (SCAT) problem is the computation-limited kernel of future battle-management systems currently being developed, for example, under the Strategic Defense Initiative (SDI). This report shows how high-performance SCAT can be performed in this decade. Estimates suggest that an increase by a factor of at least one thousand in computational capacity will be necessary to track 10/sup 5/ SDI objects in real time. This large improvement is needed because standard algorithms for data organization in important segments of the SCAT problem scale as N/sup 2/ and N/sup 3/, where N is the number of perceived objects. Itmore » is shown that the required speed-up factor can now be achieved because of two new developments: 1) a heterogeneous element supercomputer system based on available parallel-processing technology can account for over one order of magnitude performance improvement today over existing supercomputers; and 2) algorithmic innovations development recently by the NRL Laboratory for Computational Physics will account for another two orders of magnitude improvement. Based on these advances, a comprehensive, high-performance kernel for a simulator/system to perform the SCAT portion of SDI battle management is described.« less
Hu, Hao; Hong, Xingchen; Terstriep, Jeff; Liu, Yan; Finn, Michael P.; Rush, Johnathan; Wendel, Jeffrey; Wang, Shaowen
2016-01-01
Geospatial data, often embedded with geographic references, are important to many application and science domains, and represent a major type of big data. The increased volume and diversity of geospatial data have caused serious usability issues for researchers in various scientific domains, which call for innovative cyberGIS solutions. To address these issues, this paper describes a cyberGIS community data service framework to facilitate geospatial big data access, processing, and sharing based on a hybrid supercomputer architecture. Through the collaboration between the CyberGIS Center at the University of Illinois at Urbana-Champaign (UIUC) and the U.S. Geological Survey (USGS), a community data service for accessing, customizing, and sharing digital elevation model (DEM) and its derived datasets from the 10-meter national elevation dataset, namely TopoLens, is created to demonstrate the workflow integration of geospatial big data sources, computation, analysis needed for customizing the original dataset for end user needs, and a friendly online user environment. TopoLens provides online access to precomputed and on-demand computed high-resolution elevation data by exploiting the ROGER supercomputer. The usability of this prototype service has been acknowledged in community evaluation.
Computational Nanotechnology at NASA Ames Research Center, 1996
NASA Technical Reports Server (NTRS)
Globus, Al; Bailey, David; Langhoff, Steve; Pohorille, Andrew; Levit, Creon; Chancellor, Marisa K. (Technical Monitor)
1996-01-01
Some forms of nanotechnology appear to have enormous potential to improve aerospace and computer systems; computational nanotechnology, the design and simulation of programmable molecular machines, is crucial to progress. NASA Ames Research Center has begun a computational nanotechnology program including in-house work, external research grants, and grants of supercomputer time. Four goals have been established: (1) Simulate a hypothetical programmable molecular machine replicating itself and building other products. (2) Develop molecular manufacturing CAD (computer aided design) software and use it to design molecular manufacturing systems and products of aerospace interest, including computer components. (3) Characterize nanotechnologically accessible materials of aerospace interest. Such materials may have excellent strength and thermal properties. (4) Collaborate with experimentalists. Current in-house activities include: (1) Development of NanoDesign, software to design and simulate a nanotechnology based on functionalized fullerenes. Early work focuses on gears. (2) A design for high density atomically precise memory. (3) Design of nanotechnology systems based on biology. (4) Characterization of diamonoid mechanosynthetic pathways. (5) Studies of the laplacian of the electronic charge density to understand molecular structure and reactivity. (6) Studies of entropic effects during self-assembly. Characterization of properties of matter for clusters up to sizes exhibiting bulk properties. In addition, the NAS (NASA Advanced Supercomputing) supercomputer division sponsored a workshop on computational molecular nanotechnology on March 4-5, 1996 held at NASA Ames Research Center. Finally, collaborations with Bill Goddard at CalTech, Ralph Merkle at Xerox Parc, Don Brenner at NCSU (North Carolina State University), Tom McKendree at Hughes, and Todd Wipke at UCSC are underway.
The feasibility of an efficient drug design method with high-performance computers.
Yamashita, Takefumi; Ueda, Akihiko; Mitsui, Takashi; Tomonaga, Atsushi; Matsumoto, Shunji; Kodama, Tatsuhiko; Fujitani, Hideaki
2015-01-01
In this study, we propose a supercomputer-assisted drug design approach involving all-atom molecular dynamics (MD)-based binding free energy prediction after the traditional design/selection step. Because this prediction is more accurate than the empirical binding affinity scoring of the traditional approach, the compounds selected by the MD-based prediction should be better drug candidates. In this study, we discuss the applicability of the new approach using two examples. Although the MD-based binding free energy prediction has a huge computational cost, it is feasible with the latest 10 petaflop-scale computer. The supercomputer-assisted drug design approach also involves two important feedback procedures: The first feedback is generated from the MD-based binding free energy prediction step to the drug design step. While the experimental feedback usually provides binding affinities of tens of compounds at one time, the supercomputer allows us to simultaneously obtain the binding free energies of hundreds of compounds. Because the number of calculated binding free energies is sufficiently large, the compounds can be classified into different categories whose properties will aid in the design of the next generation of drug candidates. The second feedback, which occurs from the experiments to the MD simulations, is important to validate the simulation parameters. To demonstrate this, we compare the binding free energies calculated with various force fields to the experimental ones. The results indicate that the prediction will not be very successful, if we use an inaccurate force field. By improving/validating such simulation parameters, the next prediction can be made more accurate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like themore » Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.« less
NASA Astrophysics Data System (ADS)
Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.
2013-12-01
In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high-degree of efficiency in the utilization of e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, but even more so important, when complex coupled component models are to be analysed. Here we want to present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed on JUQUEEN with processor counts on the order of 10,000. The instrumentation is used in weak and strong scaling studies with real data cases and hypothetical idealized numerical experiments for detailed profiling and tracing analysis. The profiling is not only useful in identifying wait states that are due to the MPMD execution model, but also in fine-tuning resource allocation to the component models in search of the most suitable load balancing. This is especially necessary, as with numerical experiments that cover multiple (high resolution) spatial scales, the time stepping, coupling frequencies, and communication overheads are constantly shifting, which makes it necessary to re-determine the model setup with each new experimental design.
Teaching weather and climate science in primary schools - a pilot project from the UK Met Office
NASA Astrophysics Data System (ADS)
Orrell, Richard; Liggins, Felicity; Challenger, Lesley; Lethem, Dom; Campbell, Katy
2017-04-01
Wow Schools is a pilot project from the Met Office with an aim to inspire and educate the next generation of scientists and, uniquely, use the data collected by schools to improve weather forecasts and warnings across the UK. Wow Schools was launched in late 2015 with a competition open to primary schools across the UK. 74 schools entered the draw, all hoping to be picked as one of the ten lucky schools taking part in the pilot scheme. Each winning school received a fully automatic weather station (AWS), enabling them to transmit real-time local weather observations to the Met Office's Weather Observation Website (WOW - wow.metoffice.gov.uk), an award winning web portal for uploading and sharing a range of environmental observations. They were also given a package of materials designed to get students out of the classroom to observe the weather, get hands-on with the science underpinning weather forecasting, and analyse the data they are collecting. The curriculum-relevant materials were designed with the age group 7 to 11 in mind, but could be extended to support other age groups. Each school was offered a visit by a Wow Schools Ambassador (a Met Office employee) to bring the students' learning to life, and access to a dedicated forecast for its location generated by our new supercomputer. These forecasts are improved by the school's onsite AWS reinforcing the link between observations and forecast production. The Wow Schools pilot ran throughout 2016. Here, we present the initial findings of the project, examining the potential benefits and challenges of working with schools across the UK to: enrich students' understanding of the science of weather forecasting; to source an ongoing supply of weather observations and discover how these might be used in the forecasting process; and explore what materials and business model(s) would be most useful and affordable if a wider roll-out of the initiative was undertaken.
Green Supercomputing at Argonne
Beckman, Pete
2018-02-07
Pete Beckman, head of Argonne's Leadership Computing Facility (ALCF) talks about Argonne National Laboratory's green supercomputingâeverything from designing algorithms to use fewer kilowatts per operation to using cold Chicago winter air to cool the machine more efficiently. Argonne was recognized for green computing in the 2009 HPCwire Readers Choice Awards. More at http://www.anl.gov/Media_Center/News/2009/news091117.html Read more about the Argonne Leadership Computing Facility at http://www.alcf.anl.gov/
NASA Astrophysics Data System (ADS)
Buaria, D.; Yeung, P. K.
2017-12-01
A new parallel algorithm utilizing a partitioned global address space (PGAS) programming model to achieve high scalability is reported for particle tracking in direct numerical simulations of turbulent fluid flow. The work is motivated by the desire to obtain Lagrangian information necessary for the study of turbulent dispersion at the largest problem sizes feasible on current and next-generation multi-petaflop supercomputers. A large population of fluid particles is distributed among parallel processes dynamically, based on instantaneous particle positions such that all of the interpolation information needed for each particle is available either locally on its host process or neighboring processes holding adjacent sub-domains of the velocity field. With cubic splines as the preferred interpolation method, the new algorithm is designed to minimize the need for communication, by transferring between adjacent processes only those spline coefficients determined to be necessary for specific particles. This transfer is implemented very efficiently as a one-sided communication, using Co-Array Fortran (CAF) features which facilitate small data movements between different local partitions of a large global array. The cost of monitoring transfer of particle properties between adjacent processes for particles migrating across sub-domain boundaries is found to be small. Detailed benchmarks are obtained on the Cray petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign. For operations on the particles in a 81923 simulation (0.55 trillion grid points) on 262,144 Cray XE6 cores, the new algorithm is found to be orders of magnitude faster relative to a prior algorithm in which each particle is tracked by the same parallel process at all times. This large speedup reduces the additional cost of tracking of order 300 million particles to just over 50% of the cost of computing the Eulerian velocity field at this scale. Improving support of PGAS models on major compilers suggests that this algorithm will be of wider applicability on most upcoming supercomputers.
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is an actual key tool in the area of geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extend of the investigated domain might strongly vary in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and permit therefore very high-resolution simulations. We propose an efficient approach to solve memory bounded real-world applications on modern supercomputers architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problematics: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed rocks porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks and the coupled poromechanical solver permits to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
A unified approach to VLSI layout automation and algorithm mapping on processor arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.
1993-01-01
Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.
RELIABILITY, AVAILABILITY, AND SERVICEABILITY FOR PETASCALE HIGH-END COMPUTING AND BEYOND
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chokchai "Box" Leangsuksun
2011-05-31
Our project is a multi-institutional research effort that adopts interplay of RELIABILITY, AVAILABILITY, and SERVICEABILITY (RAS) aspects for solving resilience issues in highend scientific computing in the next generation of supercomputers. results lie in the following tracks: Failure prediction in a large scale HPC; Investigate reliability issues and mitigation techniques including in GPGPU-based HPC system; HPC resilience runtime & tools.
NASA Technical Reports Server (NTRS)
Shannon, Robert V., Jr.
1989-01-01
The model generation and structural analysis performed for the High Pressure Oxidizer Turbopump (HPOTP) preburner pump volute housing located on the main pump end of the HPOTP in the space shuttle main engine are summarized. An ANSYS finite element model of the volute housing was built and executed. A static structural analysis was performed on the Engineering Analysis and Data System (EADS) Cray-XMP supercomputer
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects are far beyond the capacity of typical computing resources available with most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
NASA Technical Reports Server (NTRS)
1986-01-01
Overview descriptions of on-line environmental data systems, supercomputer facilities, and networks are presented. Each description addresses the concepts of content, capability, and user access relevant to the point of view of potential utilization by the Earth and environmental science community. The information on similar systems or facilities is presented in parallel fashion to encourage and facilitate intercomparison. In addition, summary sheets are given for each description, and a summary table precedes each section.
A Long History of Supercomputing
Grider, Gary
2018-06-13
As part of its national security science mission, Los Alamos National Laboratory and HPC have a long, entwined history dating back to the earliest days of computing. From bringing the first problem to the nationâs first computer to building the first machine to break the petaflop barrier, Los Alamos holds many âfirstsâ in HPC breakthroughs. Today, supercomputers are integral to stockpile stewardship and the Laboratory continues to work with vendors in developing the future of HPC.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Science and Technology Review June 2000
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Pruneda, J.H.
2000-06-01
This issue contains the following articles: (1) ''Accelerating on the ASCI Challenge''. (2) ''New Day Daws in Supercomputing'' When the ASCI White supercomputer comes online this summer, DOE's Stockpile Stewardship Program will make another significant advanced toward helping to ensure the safety, reliability, and performance of the nation's nuclear weapons. (3) ''Uncovering the Secrets of Actinides'' Researchers are obtaining fundamental information about the actinides, a group of elements with a key role in nuclear weapons and fuels. (4) ''A Predictable Structure for Aerogels''. (5) ''Tibet--Where Continents Collide''.
Heart Fibrillation and Parallel Supercomputers
NASA Technical Reports Server (NTRS)
Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.
1997-01-01
The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY - T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium and dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.
NASA Technical Reports Server (NTRS)
Guruswamy, Guru
2004-01-01
A procedure to accurately generate AIC using the Navier-Stokes solver including grid deformation is presented. Preliminary results show good comparisons between experiment and computed flutter boundaries for a rectangular wing. A full wing body configuration of an orbital space plane is selected for demonstration on a large number of processors. In the final paper the AIC of full wing body configuration will be computed. The scalability of the procedure on supercomputer will be demonstrated.
2017-12-08
Two rows of the “Discover” supercomputer at the NASA Center for Climate Simulation (NCCS) contain more than 4,000 computer processors. Discover has a total of nearly 15,000 processors. Credit: NASA/Pat Izzo To learn more about NCCS go to: www.nasa.gov/topics/earth/features/climate-sim-center.html NASA Goddard Space Flight Center is home to the nation's largest organization of combined scientists, engineers and technologists that build spacecraft, instruments and new technology to study the Earth, the sun, our solar system, and the universe.
2017-12-08
This close-up view highlights one row—approximately 2,000 computer processors—of the “Discover” supercomputer at the NASA Center for Climate Simulation (NCCS). Discover has a total of nearly 15,000 processors. Credit: NASA/Pat Izzo To learn more about NCCS go to: www.nasa.gov/topics/earth/features/climate-sim-center.html NASA Goddard Space Flight Center is home to the nation's largest organization of combined scientists, engineers and technologists that build spacecraft, instruments and new technology to study the Earth, the sun, our solar system, and the universe.
High-performance computing-based exploration of flow control with micro devices.
Fujii, Kozo
2014-08-13
The dielectric barrier discharge (DBD) plasma actuator that controls flow separation is one of the promising technologies to realize energy savings and noise reduction of fluid dynamic systems. However, the mechanism for controlling flow separation is not clearly defined, and this lack of knowledge prevents practical use of this technology. Therefore, large-scale computations for the study of the DBD plasma actuator have been conducted using the Japanese Petaflops supercomputer 'K' for three different Reynolds numbers. Numbers of new findings on the control of flow separation by the DBD plasma actuator have been obtained from the simulations, and some of them are presented in this study. Knowledge of suitable device parameters is also obtained. The DBD plasma actuator is clearly shown to be very effective for controlling flow separation at a Reynolds number of around 10(5), and several times larger lift-to-drag ratio can be achieved at higher angles of attack after stall. For higher Reynolds numbers, separated flow is partially controlled. Flow analysis shows key features towards better control. DBD plasma actuators are a promising technology, which could reduce fuel consumption and contribute to a green environment by achieving high aerodynamic performance. The knowledge described above can be obtained only with high-end computers such as the supercomputer 'K'. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
The Centre of High-Performance Scientific Computing, Geoverbund, ABC/J - Geosciences enabled by HPSC
NASA Astrophysics Data System (ADS)
Kollet, Stefan; Görgen, Klaus; Vereecken, Harry; Gasper, Fabian; Hendricks-Franssen, Harrie-Jan; Keune, Jessica; Kulkarni, Ketan; Kurtz, Wolfgang; Sharples, Wendy; Shrestha, Prabhakar; Simmer, Clemens; Sulis, Mauro; Vanderborght, Jan
2016-04-01
The Centre of High-Performance Scientific Computing (HPSC TerrSys) was founded 2011 to establish a centre of competence in high-performance scientific computing in terrestrial systems and the geosciences enabling fundamental and applied geoscientific research in the Geoverbund ABC/J (geoscientfic research alliance of the Universities of Aachen, Cologne, Bonn and the Research Centre Jülich, Germany). The specific goals of HPSC TerrSys are to achieve relevance at the national and international level in (i) the development and application of HPSC technologies in the geoscientific community; (ii) student education; (iii) HPSC services and support also to the wider geoscientific community; and in (iv) the industry and public sectors via e.g., useful applications and data products. A key feature of HPSC TerrSys is the Simulation Laboratory Terrestrial Systems, which is located at the Jülich Supercomputing Centre (JSC) and provides extensive capabilities with respect to porting, profiling, tuning and performance monitoring of geoscientific software in JSC's supercomputing environment. We will present a summary of success stories of HPSC applications including integrated terrestrial model development, parallel profiling and its application from watersheds to the continent; massively parallel data assimilation using physics-based models and ensemble methods; quasi-operational terrestrial water and energy monitoring; and convection permitting climate simulations over Europe. The success stories stress the need for a formalized education of students in the application of HPSC technologies in future.
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL.
Stone, John E; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-05-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications.
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL
Stone, John E.; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-01-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications. PMID:27747137
Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide
Tang, William; Wang, Bei; Ethier, Stephane; ...
2016-11-01
The goal of the extreme scale plasma turbulence studies described in this paper is to expedite the delivery of reliable predictions on confinement physics in large magnetic fusion systems by using world-class supercomputers to carry out simulations with unprecedented resolution and temporal duration. This has involved architecture-dependent optimizations of performance scaling and addressing code portability and energy issues, with the metrics for multi-platform comparisons being 'time-to-solution' and 'energy-to-solution'. Realistic results addressing how confinement losses caused by plasma turbulence scale from present-day devices to the much larger $25 billion international ITER fusion facility have been enabled by innovative advances in themore » GTC-P code including (i) implementation of one-sided communication from MPI 3.0 standard; (ii) creative optimization techniques on Xeon Phi processors; and (iii) development of a novel performance model for the key kernels of the PIC code. Our results show that modeling data movement is sufficient to predict performance on modern supercomputer platforms.« less
NASA Astrophysics Data System (ADS)
Landgrebe, Anton J.
1987-03-01
An overview of research activities at the United Technologies Research Center (UTRC) in the area of Computational Fluid Dynamics (CFD) is presented. The requirement and use of various levels of computers, including supercomputers, for the CFD activities is described. Examples of CFD directed toward applications to helicopters, turbomachinery, heat exchangers, and the National Aerospace Plane are included. Helicopter rotor codes for the prediction of rotor and fuselage flow fields and airloads were developed with emphasis on rotor wake modeling. Airflow and airload predictions and comparisons with experimental data are presented. Examples are presented of recent parabolized Navier-Stokes and full Navier-Stokes solutions for hypersonic shock-wave/boundary layer interaction, and hydrogen/air supersonic combustion. In addition, other examples of CFD efforts in turbomachinery Navier-Stokes methodology and separated flow modeling are presented. A brief discussion of the 3-tier scientific computing environment is also presented, in which the researcher has access to workstations, mid-size computers, and supercomputers.
NASA Technical Reports Server (NTRS)
Landgrebe, Anton J.
1987-01-01
An overview of research activities at the United Technologies Research Center (UTRC) in the area of Computational Fluid Dynamics (CFD) is presented. The requirement and use of various levels of computers, including supercomputers, for the CFD activities is described. Examples of CFD directed toward applications to helicopters, turbomachinery, heat exchangers, and the National Aerospace Plane are included. Helicopter rotor codes for the prediction of rotor and fuselage flow fields and airloads were developed with emphasis on rotor wake modeling. Airflow and airload predictions and comparisons with experimental data are presented. Examples are presented of recent parabolized Navier-Stokes and full Navier-Stokes solutions for hypersonic shock-wave/boundary layer interaction, and hydrogen/air supersonic combustion. In addition, other examples of CFD efforts in turbomachinery Navier-Stokes methodology and separated flow modeling are presented. A brief discussion of the 3-tier scientific computing environment is also presented, in which the researcher has access to workstations, mid-size computers, and supercomputers.
Antenna pattern control using impedance surfaces
NASA Technical Reports Server (NTRS)
Balanis, Constantine A.; Liu, Kefeng
1992-01-01
During this research period, we have effectively transferred existing computer codes from CRAY supercomputer to work station based systems. The work station based version of our code preserved the accuracy of the numerical computations while giving a much better turn-around time than the CRAY supercomputer. Such a task relieved us of the heavy dependence of the supercomputer account budget and made codes developed in this research project more feasible for applications. The analysis of pyramidal horns with impedance surfaces was our major focus during this research period. Three different modeling algorithms in analyzing lossy impedance surfaces were investigated and compared with measured data. Through this investigation, we discovered that a hybrid Fourier transform technique, which uses the eigen mode in the stepped waveguide section and the Fourier transformed field distributions across the stepped discontinuities for lossy impedances coating, gives a better accuracy in analyzing lossy coatings. After a further refinement of the present technique, we will perform an accurate radiation pattern synthesis in the coming reporting period.
Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization
NASA Technical Reports Server (NTRS)
Jones, James Patton; Nitzberg, Bill
1999-01-01
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FIFO first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet, utilization was affected little. In particular these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
Data communication requirements for the advanced NAS network
NASA Technical Reports Server (NTRS)
Levin, Eugene; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of the Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations, and by remote communications to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. In the 1987/1988 time period it is anticipated that a computer with 4 times the processing speed of a Cray 2 will be obtained and by 1990 an additional supercomputer with 16 times the speed of the Cray 2. The implications of this 20-fold increase in processing power on the data communications requirements are described. The analysis was based on models of the projected workload and system architecture. The results are presented together with the estimates of their sensitivity to assumptions inherent in the models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gill, D.H.
During the 1994 summer institute NTEP teachers worked in coordination with LANL and the Los Alamos Middle School and Mountain Elementary School to gain experience in communicating on-line, to gain further information from the Internet and in using electronic Bulletin Board Systems (BBSs) to exchange ideas with other teachers. To build on their telecommunications skills, NTEP teachers participated in the International Telecommunications In Education Conference (Tel*ED `94) at the Albuquerque Convention Center on November 11 & 12, 1994. They attended the multimedia keynote address, various workshops highlighting many aspects of educational telecommunications skills, and the Telecomm Rodeo sponsored by Losmore » Alamos National Laboratory. The Rodeo featured many presentations by Laboratory personnel and educational institutions on ways in which telecommunications technologies can be use din the classroom. Many were of the `hands-on` type, so that teachers were able to try out methods and equipment and evaluate their usefulness in their own schools and classrooms. Some of the presentations featured were the Geonet educational BBS system, the Supercomputing Challenge, and the Sunrise Project, all sponsored by LANL; the `CU-seeMe` live video software, various simulation software packages, networking help, and many other interesting and useful exhibits.« less
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.
Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias
2011-01-01
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
NASA Astrophysics Data System (ADS)
Leutwyler, David; Fuhrer, Oliver; Cumming, Benjamin; Lapillonne, Xavier; Gysi, Tobias; Lüthi, Daniel; Osuna, Carlos; Schär, Christoph
2014-05-01
The representation of moist convection is a major shortcoming of current global and regional climate models. State-of-the-art global models usually operate at grid spacings of 10-300 km, and therefore cannot fully resolve the relevant upscale and downscale energy cascades. Therefore parametrization of the relevant sub-grid scale processes is required. Several studies have shown that this approach entails major uncertainties for precipitation processes, which raises concerns about the model's ability to represent precipitation statistics and associated feedback processes, as well as their sensitivities to large-scale conditions. Further refining the model resolution to the kilometer scale allows representing these processes much closer to first principles and thus should yield an improved representation of the water cycle including the drivers of extreme events. Although cloud-resolving simulations are very useful tools for climate simulations and numerical weather prediction, their high horizontal resolution and consequently the small time steps needed, challenge current supercomputers to model large domains and long time scales. The recent innovations in the domain of hybrid supercomputers have led to mixed node designs with a conventional CPU and an accelerator such as a graphics processing unit (GPU). GPUs relax the necessity for cache coherency and complex memory hierarchies, but have a larger system memory-bandwidth. This is highly beneficial for low compute intensity codes such as atmospheric stencil-based models. However, to efficiently exploit these hybrid architectures, climate models need to be ported and/or redesigned. Within the framework of the Swiss High Performance High Productivity Computing initiative (HP2C) a project to port the COSMO model to hybrid architectures has recently come to and end. The product of these efforts is a version of COSMO with an improved performance on traditional x86-based clusters as well as hybrid architectures with GPUs. We present our redesign and porting approach as well as our experience and lessons learned. Furthermore, we discuss relevant performance benchmarks obtained on the new hybrid Cray XC30 system "Piz Daint" installed at the Swiss National Supercomputing Centre (CSCS), both in terms of time-to-solution as well as energy consumption. We will demonstrate a first set of short cloud-resolving climate simulations at the European-scale using the GPU-enabled COSMO prototype and elaborate our future plans on how to exploit this new model capability.
2009-01-01
University of California, Berkeley. In this session, Dennis Gannon of Indiana University described the use of high performance computing for storm...Software Development (Session Introduction) Dennis Gannon Indiana University Software for Mesoscale Storm Prediction: Using Supercomputers for On...Ho, D. Ierardi, I. Kolossvary, J. Klepeis, T. Layman, C. McLeavey , M. Moraes, R. Mueller, E. Priest, Y. Shan, J. Spengler, M. Theobald, B. Towles
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Churchill, R. Michael
Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGCI. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading, and parameter setup. Here, an unsupervised machine learning algorithm, k-means clustering, is applied to XGCI particle distribution function data, showing that highly turbulent spatial regions do not have common coherent structures, but rather broad, ring-like structures in velocity space.
Multibillion-atom Molecular Dynamics Simulations of Plasticity, Spall, and Ejecta
NASA Astrophysics Data System (ADS)
Germann, Timothy C.
2007-06-01
Modern supercomputing platforms, such as the IBM BlueGene/L at Lawrence Livermore National Laboratory and the Roadrunner hybrid supercomputer being built at Los Alamos National Laboratory, are enabling large-scale classical molecular dynamics simulations of phenomena that were unthinkable just a few years ago. Using either the embedded atom method (EAM) description of simple (close-packed) metals, or modified EAM (MEAM) models of more complex solids and alloys with mixed covalent and metallic character, simulations containing billions to trillions of atoms are now practical, reaching volumes in excess of a cubic micron. In order to obtain any new physical insights, however, it is equally important that the analysis of such systems be tractable. This is in fact possible, in large part due to our highly efficient parallel visualization code, which enables the rendering of atomic spheres, Eulerian cells, and other geometric objects in a matter of minutes, even for tens of thousands of processors and billions of atoms. After briefly describing the BlueGene/L and Roadrunner architectures, and the code optimization strategies that were employed, results obtained thus far on BlueGene/L will be reviewed, including: (1) shock compression and release of a defective EAM Cu sample, illustrating the plastic deformation accompanying void collapse as well as the subsequent void growth and linkup upon release; (2) solid-solid martensitic phase transition in shock-compressed MEAM Ga; and (3) Rayleigh-Taylor fluid instability modeled using large-scale direct simulation Monte Carlo (DSMC) simulations. I will also describe our initial experiences utilizing Cell Broadband Engine processors (developed for the Sony PlayStation 3), and planned simulation studies of ejecta and spall failure in polycrystalline metals that will be carried out when the full Petaflop Opteron/Cell Roadrunner supercomputer is assembled in mid-2008.
Solving global shallow water equations on heterogeneous supercomputers
Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen
2017-01-01
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using a customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement on performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation on the programming paradigm of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture about both the potential performance benefits and the programming efforts involved. PMID:28282428
Multi-threaded ATLAS simulation on Intel Knights Landing processors
NASA Astrophysics Data System (ADS)
Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration
2017-10-01
The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
A novel VLSI processor architecture for supercomputing arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.
1993-01-01
Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
Adapting NBODY4 with a GRAPE-6a Supercomputer for Web Access, Using NBodyLab
NASA Astrophysics Data System (ADS)
Johnson, V.; Aarseth, S.
2006-07-01
A demonstration site has been developed by the authors that enables researchers and students to experiment with the capabilities and performance of NBODY4 running on a GRAPE-6a over the web. NBODY4 is a sophisticated open-source N-body code for high accuracy simulations of dense stellar systems (Aarseth 2003). In 2004, NBODY4 was successfully tested with a GRAPE-6a, yielding an unprecedented low-cost tool for astrophysical research. The GRAPE-6a is a supercomputer card developed by astrophysicists to accelerate high accuracy N-body simulations with a cluster or a desktop PC (Fukushige et al. 2005, Makino & Taiji 1998). The GRAPE-6a card became commercially available in 2004, runs at 125 Gflops peak, has a standard PCI interface and costs less than 10,000. Researchers running the widely used NBODY6 (which does not require GRAPE hardware) can compare their own PC or laptop performance with simulations run on http://www.NbodyLab.org. Such comparisons may help justify acquisition of a GRAPE-6a. For workgroups such as university physics or astronomy departments, the demonstration site may be replicated or serve as a model for a shared computing resource. The site was constructed using an NBodyLab server-side framework.
NASA Technical Reports Server (NTRS)
Shen, B.-W.; Atlas, R.; Reale, O.; Lin, S.-J.; Chern, J.-D.; Chang, J.; Henze, C.
2006-01-01
Hurricane Katrina was the sixth most intense hurricane in the Atlantic. Katrina's forecast poses major challenges, the most important of which is its rapid intensification. Hurricane intensity forecast with General Circulation Models (GCMs) is difficult because of their coarse resolution. In this article, six 5-day simulations with the ultra-high resolution finite-volume GCM are conducted on the NASA Columbia supercomputer to show the effects of increased resolution on the intensity predictions of Katrina. It is found that the 0.125 degree runs give comparable tracks to the 0.25 degree, but provide better intensity forecasts, bringing the center pressure much closer to observations with differences of only plus or minus 12 hPa. In the runs initialized at 1200 UTC 25 AUG, the 0.125 degree simulates a more realistic intensification rate and better near-eye wind distributions. Moreover, the first global 0.125 degree simulation without convection parameterization (CP) produces even better intensity evolution and near-eye winds than the control run with CP.
TOP500 Sublist for November 2001
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack J.
2001-11-09
18th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, GERMANY; KNOXVILLE, TENN.; BERKELEY, CALIF. In what has become a much-anticipated event in the world of high-performance computing, the 18th edition of the TOP500 list of the world's fastest supercomputers was released today (November 9, 2001). The latest edition of the twice-yearly ranking finds IBM as the leader in the field, with 32 percent in terms of installed systems and 37 percent in terms of total performance of all the installed systems. In a surprise move Hewlett-Packard captured the second place with 30 percent of the systems. Most ofmore » these systems are smaller in size and as a consequence HP's share of installed performance is smaller with 15 percent. This is still enough for second place in this category. SGI, Cray and Sun follow in the number of TOP500 systems with 41 (8 percent), 39 (8 percent), and 31 (6 percent) respectively. In the category of installed performance Cray Inc. keeps the third position with 11 percent ahead of SGI (8 percent) and Compaq (8 percent).« less
SAMSA2: a standalone metatranscriptome analysis pipeline.
Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G
2018-05-21
Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Freud: a software suite for high-throughput simulation analysis
NASA Astrophysics Data System (ADS)
Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon
Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C + + analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
Supercomputer analysis of sedimentary basins.
Bethke, C M; Altaner, S P; Harrison, W J; Upson, C
1988-01-15
Geological processes of fluid transport and chemical reaction in sedimentary basins have formed many of the earth's energy and mineral resources. These processes can be analyzed on natural time and distance scales with the use of supercomputers. Numerical experiments are presented that give insights to the factors controlling subsurface pressures, temperatures, and reactions; the origin of ores; and the distribution and quality of hydrocarbon reservoirs. The results show that numerical analysis combined with stratigraphic, sea level, and plate tectonic histories provides a powerful tool for studying the evolution of sedimentary basins over geologic time.
2017-12-08
The heart of the NASA Center for Climate Simulation (NCCS) is the “Discover” supercomputer. In 2009, NCCS added more than 8,000 computer processors to Discover, for a total of nearly 15,000 processors. Credit: NASA/Pat Izzo To learn more about NCCS go to: www.nasa.gov/topics/earth/features/climate-sim-center.html NASA Goddard Space Flight Center is home to the nation's largest organization of combined scientists, engineers and technologists that build spacecraft, instruments and new technology to study the Earth, the sun, our solar system, and the universe.
2017-12-08
The heart of the NASA Center for Climate Simulation (NCCS) is the “Discover” supercomputer. In 2009, NCCS added more than 8,000 computer processors to Discover, for a total of nearly 15,000 processors. Credit: NASA/Pat Izzo To learn more about NCCS go to: www.nasa.gov/topics/earth/features/climate-sim-center.html NASA Goddard Space Flight Center is home to the nation's largest organization of combined scientists, engineers and technologists that build spacecraft, instruments and new technology to study the Earth, the sun, our solar system, and the universe.
2017-12-08
The heart of the NASA Center for Climate Simulation (NCCS) is the “Discover” supercomputer. In 2009, NCCS added more than 8,000 computer processors to Discover, for a total of nearly 15,000 processors. Credit: NASA/Pat Izzo To learn more about NCCS go to: www.nasa.gov/topics/earth/features/climate-sim-center.html NASA Goddard Space Flight Center is home to the nation's largest organization of combined scientists, engineers and technologists that build spacecraft, instruments and new technology to study the Earth, the sun, our solar system, and the universe.
Development of the general interpolants method for the CYBER 200 series of supercomputers
NASA Technical Reports Server (NTRS)
Stalnaker, J. F.; Robinson, M. A.; Spradley, L. W.; Kurzius, S. C.; Thoenes, J.
1988-01-01
The General Interpolants Method (GIM) is a 3-D, time-dependent, hybrid procedure for generating numerical analogs of the conservation laws. This study is directed toward the development and application of the GIM computer code for fluid dynamic research applications as implemented for the Cyber 200 series of supercomputers. An elliptic and quasi-parabolic version of the GIM code are discussed. Turbulence models, algebraic and differential equations, were added to the basic viscous code. An equilibrium reacting chemistry model and an implicit finite difference scheme are also included.
NASA Technical Reports Server (NTRS)
Nosenchuck, D. M.; Littman, M. G.
1986-01-01
The Navier-Stokes computer (NSC) has been developed for solving problems in fluid mechanics involving complex flow simulations that require more speed and capacity than provided by current and proposed Class VI supercomputers. The machine is a parallel processing supercomputer with several new architectural elements which can be programmed to address a wide range of problems meeting the following criteria: (1) the problem is numerically intensive, and (2) the code makes use of long vectors. A simulation of two-dimensional nonsteady viscous flows is presented to illustrate the architecture, programming, and some of the capabilities of the NSC.
Visualizing staggered fields and analyzing electromagnetic data with PerceptEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shasharina, Svetlana
This project resulted in VSimSP: a software for simulating large photonic devices of high-performance computers. It includes: GUI for Photonics Simulations; High-Performance Meshing Algorithm; 2d Order Multimaterials Algorithm; Mode Solver for Waveguides; 2d Order Material Dispersion Algorithm; S Parameters Calculation; High-Performance Workflow at NERSC ; and Large Photonic Devices Simulation Setups We believe we became the only company in the world which can simulate large photonics devices in 3D on modern supercomputers without the need to split them into subparts or do low-fidelity modeling. We started commercial engagement with a manufacturing company.
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Integration, Development and Performance of the 500 TFLOPS Heterogeneous Cluster (Condor)
2012-08-01
PlayStation 3 for High Performance Cluster Computing” LAPACK Working Note 185, 2007. [ 4 ] Feng, W., X. Feng, and R. Ge, “Green Supercomputing Comes of...CONFERENCE PAPER (Post Print) 3. DATES COVERED (From - To) JUN 2010 – MAY 2013 4 . TITLE AND SUBTITLE INTEGRATION, DEVELOPMENT AND PERFORMANCE OF...and streaming processing; the PlayStation 3 uses the IBM Cell BE processor, which adopts the multi-processor, single-instruction-multiple- data (SIMD
ATOM - Accelerating therapeutics through opportunities in medicine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mcmahon, Benjamin Hamilton; Dotson, Paul Jeffrey
Create a new paradigm of drug discovery that would reduce the time from an identified drug target to clinical candidate from the current ~6 years to just 12 months. ATOM will develop, test, and validate a multidisciplinary approach to drug discovery in which modern science, technology and engineering, supercomputing, simulations, data science, and artificial intelligence are highly integrated into a single drug-discovery platform that can ultimately be shared with the drug development community at-large.
AHPCRC (Army High Performance Computing Research Center) Bulletin. Volume 1, Issue 2
2011-01-01
area and the researchers working on these projects. Also inside: news from the AHPCRC consortium partners at Morgan State University and the NASA ...Computing Research Center is provided by the supercomputing and research facilities at Stanford University and at the NASA Ames Research Center at...atomic and molecular level, he said. He noted that “every general would like to have” a Star Trek -like holodeck, where holographic avatars could
RRTMGP: A High-Performance Broadband Radiation Code for the Next Decade
2015-09-30
NOAA ), Robin Hogan (ECMWF), a number of colleagues at the Max-Planck Institute, and Will Sawyer and Marcus Wetzstein (Swiss Supercomputer Center...somewhat out of date, so that the accuracy of our simplified algorithms can not be thoroughly evaluated. RRTMGP_LW_v0 has been provided to our NASA ...support, RRTMGP_LW_v0, has been completed and distributed to selected colleagues at modeling centers, including NOAA , NCAR, and CSCS. Our colleagues
Jiang, Wei; Luo, Yun; Maragliano, Luca; Roux, Benoît
2012-11-13
An extremely scalable computational strategy is described for calculations of the potential of mean force (PMF) in multidimensions on massively distributed supercomputers. The approach involves coupling thousands of umbrella sampling (US) simulation windows distributed to cover the space of order parameters with a Hamiltonian molecular dynamics replica-exchange (H-REMD) algorithm to enhance the sampling of each simulation. In the present application, US/H-REMD is carried out in a two-dimensional (2D) space and exchanges are attempted alternatively along the two axes corresponding to the two order parameters. The US/H-REMD strategy is implemented on the basis of parallel/parallel multiple copy protocol at the MPI level, and therefore can fully exploit computing power of large-scale supercomputers. Here the novel technique is illustrated using the leadership supercomputer IBM Blue Gene/P with an application to a typical biomolecular calculation of general interest, namely the binding of calcium ions to the small protein Calbindin D9k. The free energy landscape associated with two order parameters, the distance between the ion and its binding pocket and the root-mean-square deviation (rmsd) of the binding pocket relative the crystal structure, was calculated using the US/H-REMD method. The results are then used to estimate the absolute binding free energy of calcium ion to Calbindin D9k. The tests demonstrate that the 2D US/H-REMD scheme greatly accelerates the configurational sampling of the binding pocket, thereby improving the convergence of the potential of mean force calculation.
Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanyal, Jibonananda; New, Joshua Ryan; Edwards, Richard
2014-01-01
Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus can often extend in the order of a few thousand parameters which have to be calibrated manually by an expert for realistic energy modeling. This makes it challenging and expensive thereby making building energy modeling unfeasible for smaller projects. In this paper, we describe the Autotune research which employs machine learning algorithms to generate agents for the different kinds of standardmore » reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers which are subsequently used to train machine learning algorithms to generate agents. These agents, once created, can then run in a fraction of the time thereby allowing cost-effective calibration of building models.« less
Two-dimensional nonsteady viscous flow simulation on the Navier-Stokes computer miniNode
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M.; Littman, Michael G.; Flannery, William
1986-01-01
The needs of large-scale scientific computation are outpacing the growth in performance of mainframe supercomputers. In particular, problems in fluid mechanics involving complex flow simulations require far more speed and capacity than that provided by current and proposed Class VI supercomputers. To address this concern, the Navier-Stokes Computer (NSC) was developed. The NSC is a parallel-processing machine, comprised of individual Nodes, each comparable in performance to current supercomputers. The global architecture is that of a hypercube, and a 128-Node NSC has been designed. New architectural features, such as a reconfigurable many-function ALU pipeline and a multifunction memory-ALU switch, have provided the capability to efficiently implement a wide range of algorithms. Efficient algorithms typically involve numerically intensive tasks, which often include conditional operations. These operations may be efficiently implemented on the NSC without, in general, sacrificing vector-processing speed. To illustrate the architecture, programming, and several of the capabilities of the NSC, the simulation of two-dimensional, nonsteady viscous flows on a prototype Node, called the miniNode, is presented.
Long-Term file activity patterns in a UNIX workstation environment
NASA Technical Reports Server (NTRS)
Gibson, Timothy J.; Miller, Ethan L.
1998-01-01
As mass storage technology becomes more affordable for sites smaller than supercomputer centers, understanding their file access patterns becomes crucial for developing systems to store rarely used data on tertiary storage devices such as tapes and optical disks. This paper presents a new way to collect and analyze file system statistics for UNIX-based file systems. The collection system runs in user-space and requires no modification of the operating system kernel. The statistics package provides details about file system operations at the file level: creations, deletions, modifications, etc. The paper analyzes four months of file system activity on a university file system. The results confirm previously published results gathered from supercomputer file systems, but differ in several important areas. Files in this study were considerably smaller than those at supercomputer centers, and they were accessed less frequently. Additionally, the long-term creation rate on workstation file systems is sufficiently low so that all data more than a day old could be cheaply saved on a mass storage device, allowing the integration of time travel into every file system.
Simulating functional magnetic materials on supercomputers.
Gruner, Markus Ernst; Entel, Peter
2009-07-22
The recent passing of the petaflop per second landmark by the Roadrunner project at the Los Alamos National Laboratory marks a preliminary peak of an impressive world-wide development in the high-performance scientific computing sector. Also, purely academic state-of-the-art supercomputers such as the IBM Blue Gene/P at Forschungszentrum Jülich allow us nowadays to investigate large systems of the order of 10(3) spin polarized transition metal atoms by means of density functional theory. Three applications will be presented where large-scale ab initio calculations contribute to the understanding of key properties emerging from a close interrelation between structure and magnetism. The first two examples discuss the size dependent evolution of equilibrium structural motifs in elementary iron and binary Fe-Pt and Co-Pt transition metal nanoparticles, which are currently discussed as promising candidates for ultra-high-density magnetic data storage media. However, the preference for multiply twinned morphologies at smaller cluster sizes counteracts the formation of a single-crystalline L1(0) phase, which alone provides the required hard magnetic properties. The third application is concerned with the magnetic shape memory effect in the Ni-Mn-Ga Heusler alloy, which is a technologically relevant candidate for magnetomechanical actuators and sensors. In this material strains of up to 10% can be induced by external magnetic fields due to the field induced shifting of martensitic twin boundaries, requiring an extremely high mobility of the martensitic twin boundaries, but also the selection of the appropriate martensitic structure from the rich phase diagram.
Designing a connectionist network supercomputer.
Asanović, K; Beck, J; Feldman, J; Morgan, N; Wawrzynek, J
1993-12-01
This paper describes an effort at UC Berkeley and the International Computer Science Institute to develop a supercomputer for artificial neural network applications. Our perspective has been strongly influenced by earlier experiences with the construction and use of a simpler machine. In particular, we have observed Amdahl's Law in action in our designs and those of others. These observations inspire attention to many factors beyond fast multiply-accumulate arithmetic. We describe a number of these factors along with rough expressions for their influence and then give the applications targets, machine goals and the system architecture for the machine we are currently designing.
Building black holes: supercomputer cinema.
Shapiro, S L; Teukolsky, S A
1988-07-22
A new computer code can solve Einstein's equations of general relativity for the dynamical evolution of a relativistic star cluster. The cluster may contain a large number of stars that move in a strong gravitational field at speeds approaching the speed of light. Unstable star clusters undergo catastrophic collapse to black holes. The collapse of an unstable cluster to a supermassive black hole at the center of a galaxy may explain the origin of quasars and active galactic nuclei. By means of a supercomputer simulation and color graphics, the whole process can be viewed in real time on a movie screen.
Supercomputer analysis of purine and pyrimidine metabolism leading to DNA synthesis.
Heinmets, F
1989-06-01
A model-system is established to analyze purine and pyrimidine metabolism leading to DNA synthesis. The principal aim is to explore the flow and regulation of terminal deoxynucleoside triophosphates (dNTPs) in various input and parametric conditions. A series of flow equations are established, which are subsequently converted to differential equations. These are programmed (Fortran) and analyzed on a Cray chi-MP/48 supercomputer. The pool concentrations are presented as a function of time in conditions in which various pertinent parameters of the system are modified. The system is formulated by 100 differential equations.
Performance of the Widely-Used CFD Code OVERFLOW on the Pleides Supercomputer
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2017-01-01
Computational performance studies were made for NASA's widely used Computational Fluid Dynamics code OVERFLOW on the Pleiades Supercomputer. Two test cases were considered: a full launch vehicle with a grid of 286 million points and a full rotorcraft model with a grid of 614 million points. Computations using up to 8000 cores were run on Sandy Bridge and Ivy Bridge nodes. Performance was monitored using times reported in the day files from the Portable Batch System utility. Results for two grid topologies are presented and compared in detail. Observations and suggestions for future work are made.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;
2006-01-01
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray XI, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks are run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
Design of a neural network simulator on a transputer array
NASA Technical Reports Server (NTRS)
Mcintire, Gary; Villarreal, James; Baffes, Paul; Rua, Monica
1987-01-01
A brief summary of neural networks is presented which concentrates on the design constraints imposed. Major design issues are discussed together with analysis methods and the chosen solutions. Although the system will be capable of running on most transputer architectures, it currently is being implemented on a 40-transputer system connected to a toroidal architecture. Predictions show a performance level equivalent to that of a highly optimized simulator running on the SX-2 supercomputer.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for design of the parallel algorithm, and theoretical scalability function of 3-D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and tests up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles) are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
Hahne, Jan; Helias, Moritz; Kunkel, Susanne; Igarashi, Jun; Bolten, Matthias; Frommer, Andreas; Diesmann, Markus
2015-01-01
Contemporary simulators for networks of point and few-compartment model neurons come with a plethora of ready-to-use neuron and synapse models and support complex network topologies. Recent technological advancements have broadened the spectrum of application further to the efficient simulation of brain-scale networks on supercomputers. In distributed network simulations the amount of spike data that accrues per millisecond and process is typically low, such that a common optimization strategy is to communicate spikes at relatively long intervals, where the upper limit is given by the shortest synaptic transmission delay in the network. This approach is well-suited for simulations that employ only chemical synapses but it has so far impeded the incorporation of gap-junction models, which require instantaneous neuronal interactions. Here, we present a numerical algorithm based on a waveform-relaxation technique which allows for network simulations with gap junctions in a way that is compatible with the delayed communication strategy. Using a reference implementation in the NEST simulator, we demonstrate that the algorithm and the required data structures can be smoothly integrated with existing code such that they complement the infrastructure for spiking connections. To show that the unified framework for gap-junction and spiking interactions achieves high performance and delivers high accuracy in the presence of gap junctions, we present benchmarks for workstations, clusters, and supercomputers. Finally, we discuss limitations of the novel technology.
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss top five current RISC microprocessors: The IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer, the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and also in the context of implementing a real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NP13 will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
Hahne, Jan; Helias, Moritz; Kunkel, Susanne; Igarashi, Jun; Bolten, Matthias; Frommer, Andreas; Diesmann, Markus
2015-01-01
Contemporary simulators for networks of point and few-compartment model neurons come with a plethora of ready-to-use neuron and synapse models and support complex network topologies. Recent technological advancements have broadened the spectrum of application further to the efficient simulation of brain-scale networks on supercomputers. In distributed network simulations the amount of spike data that accrues per millisecond and process is typically low, such that a common optimization strategy is to communicate spikes at relatively long intervals, where the upper limit is given by the shortest synaptic transmission delay in the network. This approach is well-suited for simulations that employ only chemical synapses but it has so far impeded the incorporation of gap-junction models, which require instantaneous neuronal interactions. Here, we present a numerical algorithm based on a waveform-relaxation technique which allows for network simulations with gap junctions in a way that is compatible with the delayed communication strategy. Using a reference implementation in the NEST simulator, we demonstrate that the algorithm and the required data structures can be smoothly integrated with existing code such that they complement the infrastructure for spiking connections. To show that the unified framework for gap-junction and spiking interactions achieves high performance and delivers high accuracy in the presence of gap junctions, we present benchmarks for workstations, clusters, and supercomputers. Finally, we discuss limitations of the novel technology. PMID:26441628
Extensible Computational Chemistry Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-08-09
ECCE provides a sophisticated graphical user interface, scientific visualization tools, and the underlying data management framework enabling scientists to efficiently set up calculations and store, retrieve, and analyze the rapidly growing volumes of data produced by computational chemistry studies. ECCE was conceived as part of the Environmental Molecular Sciences Laboratory construction to solve the problem of researchers being able to effectively utilize complex computational chemistry codes and massively parallel high performance compute resources. Bringing the power of these codes and resources to the desktops of researcher and thus enabling world class research without users needing a detailed understanding of themore » inner workings of either the theoretical codes or the supercomputers needed to run them was a grand challenge problem in the original version of the EMSL. ECCE allows collaboration among researchers using a web-based data repository where the inputs and results for all calculations done within ECCE are organized. ECCE is a first of kind end-to-end problem solving environment for all phases of computational chemistry research: setting up calculations with sophisticated GUI and direct manipulation visualization tools, submitting and monitoring calculations on remote high performance supercomputers without having to be familiar with the details of using these compute resources, and performing results visualization and analysis including creating publication quality images. ECCE is a suite of tightly integrated applications that are employed as the user moves through the modeling process.« less
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
Spatiotemporal modeling of node temperatures in supercomputers
Storlie, Curtis Byron; Reich, Brian James; Rust, William Newton; ...
2016-06-10
Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently a project was initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines that were housed there. Coupled with this goal was the aim to develop a general good-practice for characterizing the effect of cooling changes and monitoring machine node temperatures in this andmore » other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Since extremes temperatures are important, a Normal distribution plus generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, this same approach can easily be applied to monitor and investigate cooling systems at other data centers, as well.« less
Integration of PanDA workload management system with Titan supercomputer at OLCF
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the future LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multicore worker nodes. It also gives PanDA new capability to collect, in real time, information about unused worker nodes on Titan, which allows precise definition of the size and duration of jobs submitted to Titan according to available free resources. This capability significantly reduces PanDA job wait time while improving Titan's utilization efficiency. This implementation was tested with a variety of Monte-Carlo workloads on Titan and is being tested on several other supercomputing platforms. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Production experience with the ATLAS Event Service
NASA Astrophysics Data System (ADS)
Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained workloads to running payload applications which process dispatched events or event ranges and immediately stream the outputs to highly scalable Object Stores. Thanks to its agile and flexible architecture the AES is currently being used by grid sites for assigning low priority workloads to otherwise idle computing resources; similarly harvesting HPC resources in an efficient back-fill mode; and massively scaling out to the 50-100k concurrent core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC) and the Google Compute Engine, and a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we will report the status and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES application beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.
Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...
2017-04-05
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found inmore » many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for a multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Basu, Protonu; Williams, Samuel; Van Straalen, Brian
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found inmore » many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for a multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.« less
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
PFLOTRAN: Reactive Flow & Transport Code for Use on Laptops to Leadership-Class Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammond, Glenn E.; Lichtner, Peter C.; Lu, Chuan
PFLOTRAN, a next-generation reactive flow and transport code for modeling subsurface processes, has been designed from the ground up to run efficiently on machines ranging from leadership-class supercomputers to laptops. Based on an object-oriented design, the code is easily extensible to incorporate additional processes. It can interface seamlessly with Fortran 9X, C and C++ codes. Domain decomposition parallelism is employed, with the PETSc parallel framework used to manage parallel solvers, data structures and communication. Features of the code include a modular input file, implementation of high-performance I/O using parallel HDF5, ability to perform multiple realization simulations with multiple processors permore » realization in a seamless manner, and multiple modes for multiphase flow and multicomponent geochemical transport. Chemical reactions currently implemented in the code include homogeneous aqueous complexing reactions and heterogeneous mineral precipitation/dissolution, ion exchange, surface complexation and a multirate kinetic sorption model. PFLOTRAN has demonstrated petascale performance using 2{sup 17} processor cores with over 2 billion degrees of freedom. Accomplishments achieved to date include applications to the Hanford 300 Area and modeling CO{sub 2} sequestration in deep geologic formations.« less
Performance Assessment Institute-NV
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lombardo, Joesph
2012-12-31
The National Supercomputing Center for Energy and the Environment’s intention is to purchase a multi-purpose computer cluster in support of the Performance Assessment Institute (PA Institute). The PA Institute will serve as a research consortium located in Las Vegas Nevada with membership that includes: national laboratories, universities, industry partners, and domestic and international governments. This center will provide a one-of-a-kind centralized facility for the accumulation of information for use by Institutions of Higher Learning, the U.S. Government, and Regulatory Agencies and approved users. This initiative will enhance and extend High Performance Computing (HPC) resources in Nevada to support critical nationalmore » and international needs in "scientific confirmation". The PA Institute will be promoted as the leading Modeling, Learning and Research Center worldwide. The program proposes to utilize the existing supercomputing capabilities and alliances of the University of Nevada Las Vegas as a base, and to extend these resource and capabilities through a collaborative relationship with its membership. The PA Institute will provide an academic setting for interactive sharing, learning, mentoring and monitoring of multi-disciplinary performance assessment and performance confirmation information. The role of the PA Institute is to facilitate research, knowledge-increase, and knowledge-sharing among users.« less
An efficient framework for Java data processing systems in HPC environments
NASA Astrophysics Data System (ADS)
Fries, Aidan; Castañeda, Javier; Isasi, Yago; Taboada, Guillermo L.; Portell de Mora, Jordi; Sirvent, Raül
2011-11-01
Java is a commonly used programming language, although its use in High Performance Computing (HPC) remains relatively low. One of the reasons is a lack of libraries offering specific HPC functions to Java applications. In this paper we present a Java-based framework, called DpcbTools, designed to provide a set of functions that fill this gap. It includes a set of efficient data communication functions based on message-passing, thus providing, when a low latency network such as Myrinet is available, higher throughputs and lower latencies than standard solutions used by Java. DpcbTools also includes routines for the launching, monitoring and management of Java applications on several computing nodes by making use of JMX to communicate with remote Java VMs. The Gaia Data Processing and Analysis Consortium (DPAC) is a real case where scientific data from the ESA Gaia astrometric satellite will be entirely processed using Java. In this paper we describe the main elements of DPAC and its usage of the DpcbTools framework. We also assess the usefulness and performance of DpcbTools through its performance evaluation and the analysis of its impact on some DPAC systems deployed in the MareNostrum supercomputer (Barcelona Supercomputing Center).
Baum, K. G.; Menezes, G.; Helguera, M.
2011-01-01
Medical imaging system simulators are tools that provide a means to evaluate system architecture and create artificial image sets that are appropriate for specific applications. We have modified SIMRI, a Bloch equation-based magnetic resonance image simulator, in order to successfully generate high-resolution 3D MR images of the Montreal brain phantom using Blue Gene/L systems. Results show that redistribution of the workload allows an anatomically accurate 256 3 voxel spin-echo simulation in less than 5 hours when executed on an 8192-node partition of a Blue Gene/L system.
Baum, K G; Menezes, G; Helguera, M
2011-01-01
Medical imaging system simulators are tools that provide a means to evaluate system architecture and create artificial image sets that are appropriate for specific applications. We have modified SIMRI, a Bloch equation-based magnetic resonance image simulator, in order to successfully generate high-resolution 3D MR images of the Montreal brain phantom using Blue Gene/L systems. Results show that redistribution of the workload allows an anatomically accurate 256(3) voxel spin-echo simulation in less than 5 hours when executed on an 8192-node partition of a Blue Gene/L system.
High Efficiency Photonic Switch for Data Centers
DOE Office of Scientific and Technical Information (OSTI.GOV)
LaComb, Lloyd J.; Bablumyan, Arkady; Ordyan, Armen
2016-12-06
The worldwide demand for instant access to information is driving internet growth rates above 50% annually. This rapid growth is straining the resources and architectures of existing data centers, metro networks and high performance computer centers. If the current business as usual model continues, data centers alone will require 400TWhr of electricity by 2020. In order to meet the challenges of a faster and more cost effective data centers, metro networks and supercomputing facilities, we have demonstrated a new type of optical switch that will support transmissions speeds up to 1Tb/s, and requires significantly less energy per bit than
Monitoring Object Library Usage and Changes
NASA Technical Reports Server (NTRS)
Owen, R. K.; Craw, James M. (Technical Monitor)
1995-01-01
The NASA Ames Numerical Aerodynamic Simulation program Aeronautics Consolidated Supercomputing Facility (NAS/ACSF) supercomputing center services over 1600 users, and has numerous analysts with root access. Several tools have been developed to monitor object library usage and changes. Some of the tools do "noninvasive" monitoring and other tools implement run-time logging even for object-only libraries. The run-time logging identifies who, when, and what is being used. The benefits are that real usage can be measured, unused libraries can be discontinued, training and optimization efforts can be focused at those numerical methods that are actually used. An overview of the tools will be given and the results will be discussed.
Watson will see you now: a supercomputer to help clinicians make informed treatment decisions.
Doyle-Lindrud, Susan
2015-02-01
IBM has collaborated with several cancer care providers to develop and train the IBM supercomputer Watson to help clinicians make informed treatment decisions. When a patient is seen in clinic, the oncologist can input all of the clinical information into the computer system. Watson will then review all of the data and recommend treatment options based on the latest evidence and guidelines. Once the oncologist makes the treatment decision, this information can be sent directly to the insurance company for approval. Watson has the ability to standardize care and accelerate the approval process, a benefit to the healthcare provider and the patient.
The transition of a real-time single-rotor helicopter simulation program to a supercomputer
NASA Technical Reports Server (NTRS)
Martinez, Debbie
1995-01-01
This report presents the conversion effort and results of a real-time flight simulation application transition to a CONVEX supercomputer. Enclosed is a detailed description of the conversion process and a brief description of the Langley Research Center's (LaRC) flight simulation application program structure. Currently, this simulation program may be configured to represent Sikorsky S-61 helicopter (a five-blade, single-rotor, commercial passenger-type helicopter) or an Army Cobra helicopter (either the AH-1 G or AH-1 S model). This report refers to the Sikorsky S-61 simulation program since it is the most frequently used configuration.
Sequence search on a supercomputer.
Gotoh, O; Tagashira, Y
1986-01-10
A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.
Science & Technology Review November 2006
DOE Office of Scientific and Technical Information (OSTI.GOV)
Radousky, H
This months issue has the following articles: (1) Expanded Supercomputing Maximizes Scientific Discovery--Commentary by Dona Crawford; (2) Thunder's Power Delivers Breakthrough Science--Livermore's Thunder supercomputer allows researchers to model systems at scales never before possible. (3) Extracting Key Content from Images--A new system called the Image Content Engine is helping analysts find significant but hard-to-recognize details in overhead images. (4) Got Oxygen?--Oxygen, especially oxygen metabolism, was key to evolution, and a Livermore project helps find out why. (5) A Shocking New Form of Laserlike Light--According to research at Livermore, smashing a crystal with a shock wave can result in coherent light.
Optimal Full Information Synthesis for Flexible Structures Implemented on Cray Supercomputers
NASA Technical Reports Server (NTRS)
Lind, Rick; Balas, Gary J.
1995-01-01
This paper considers an algorithm for synthesis of optimal controllers for full information feedback. The synthesis procedure reduces to a single linear matrix inequality which may be solved via established convex optimization algorithms. The computational cost of the optimization is investigated. It is demonstrated the problem dimension and corresponding matrices can become large for practical engineering problems. This algorithm represents a process that is impractical for standard workstations for large order systems. A flexible structure is presented as a design example. Control synthesis requires several days on a workstation but may be solved in a reasonable amount of time using a Cray supercomputer.
SiGN-SSM: open source parallel software for estimating gene networks with state space models.
Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru
2011-04-15
SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), that is a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint effective to stabilize the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM is available on our web site. tamada@ims.u-tokyo.ac.jp.
Transferring ecosystem simulation codes to supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1995-01-01
Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectoring and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moniz, Ernest; Carr, Alan; Bethe, Hans
The Trinity Test of July 16, 1945 was the first full-scale, real-world test of a nuclear weapon; with the new Trinity supercomputer Los Alamos National Laboratory's goal is to do this virtually, in 3D. Trinity was the culmination of a fantastic effort of groundbreaking science and engineering by hundreds of men and women at Los Alamos and other Manhattan Project sites. It took them less than two years to change the world. The Laboratory is marking the 70th anniversary of the Trinity Test because it not only ushered in the Nuclear Age, but with it the origin of today’s advancedmore » supercomputing. We live in the Age of Supercomputers due in large part to nuclear weapons science here at Los Alamos. National security science, and nuclear weapons science in particular, at Los Alamos National Laboratory have provided a key motivation for the evolution of large-scale scientific computing. Beginning with the Manhattan Project there has been a constant stream of increasingly significant, complex problems in nuclear weapons science whose timely solutions demand larger and faster computers. The relationship between national security science at Los Alamos and the evolution of computing is one of interdependence.« less
Moniz, Ernest; Carr, Alan; Bethe, Hans; Morrison, Phillip; Ramsay, Norman; Teller, Edward; Brixner, Berlyn; Archer, Bill; Agnew, Harold; Morrison, John
2018-01-16
The Trinity Test of July 16, 1945 was the first full-scale, real-world test of a nuclear weapon; with the new Trinity supercomputer Los Alamos National Laboratory's goal is to do this virtually, in 3D. Trinity was the culmination of a fantastic effort of groundbreaking science and engineering by hundreds of men and women at Los Alamos and other Manhattan Project sites. It took them less than two years to change the world. The Laboratory is marking the 70th anniversary of the Trinity Test because it not only ushered in the Nuclear Age, but with it the origin of todayâs advanced supercomputing. We live in the Age of Supercomputers due in large part to nuclear weapons science here at Los Alamos. National security science, and nuclear weapons science in particular, at Los Alamos National Laboratory have provided a key motivation for the evolution of large-scale scientific computing. Beginning with the Manhattan Project there has been a constant stream of increasingly significant, complex problems in nuclear weapons science whose timely solutions demand larger and faster computers. The relationship between national security science at Los Alamos and the evolution of computing is one of interdependence.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lingerfelt, Eric J; Messer, II, Otis E
2017-01-02
The Bellerophon software system supports CHIMERA, a production-level HPC application that simulates the evolution of core-collapse supernovae. Bellerophon enables CHIMERA's geographically dispersed team of collaborators to perform job monitoring and real-time data analysis from multiple supercomputing resources, including platforms at OLCF, NERSC, and NICS. Its multi-tier architecture provides an encapsulated, end-to-end software solution that enables the CHIMERA team to quickly and easily access highly customizable animated and static views of results from anywhere in the world via a cross-platform desktop application.
Evaluation of SuperLU on multicore architectures
NASA Astrophysics Data System (ADS)
Li, X. S.
2008-07-01
The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We included both Pthreads and MPI implementations in this study and found that the Pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
Remote visualization and scale analysis of large turbulence datatsets
NASA Astrophysics Data System (ADS)
Livescu, D.; Pulido, J.; Burns, R.; Canada, C.; Ahrens, J.; Hamann, B.
2015-12-01
Accurate simulations of turbulent flows require solving all the dynamically relevant scales of motions. This technique, called Direct Numerical Simulation, has been successfully applied to a variety of simple flows; however, the large-scale flows encountered in Geophysical Fluid Dynamics (GFD) would require meshes outside the range of the most powerful supercomputers for the foreseeable future. Nevertheless, the current generation of petascale computers has enabled unprecedented simulations of many types of turbulent flows which focus on various GFD aspects, from the idealized configurations extensively studied in the past to more complex flows closer to the practical applications. The pace at which such simulations are performed only continues to increase; however, the simulations themselves are restricted to a small number of groups with access to large computational platforms. Yet the petabytes of turbulence data offer almost limitless information on many different aspects of the flow, from the hierarchy of turbulence moments, spectra and correlations, to structure-functions, geometrical properties, etc. The ability to share such datasets with other groups can significantly reduce the time to analyze the data, help the creative process and increase the pace of discovery. Using the largest DOE supercomputing platforms, we have performed some of the biggest turbulence simulations to date, in various configurations, addressing specific aspects of turbulence production and mixing mechanisms. Until recently, the visualization and analysis of such datasets was restricted by access to large supercomputers. The public Johns Hopkins Turbulence database simplifies the access to multi-Terabyte turbulence datasets and facilitates turbulence analysis through the use of commodity hardware. First, one of our datasets, which is part of the database, will be described and then a framework that adds high-speed visualization and wavelet support for multi-resolution analysis of turbulence will be highlighted. The addition of wavelet support reduces the latency and bandwidth requirements for visualization, allowing for many concurrent users, and enables new types of analyses, including scale decomposition and coherent feature extraction.
The UPSCALE project: a large simulation campaign
NASA Astrophysics Data System (ADS)
Mizielinski, Matthew; Roberts, Malcolm; Vidale, Pier Luigi; Schiemann, Reinhard; Demory, Marie-Estelle; Strachan, Jane
2014-05-01
The development of a traceable hierarchy of HadGEM3 global climate models, based upon the Met Office Unified Model, at resolutions from 135 km to 25 km, now allows the impact of resolution on the mean state, variability and extremes of climate to be studied in a robust fashion. In 2011 we successfully obtained a single-year grant of 144 million core hours of supercomputing time from the PRACE organization to run ensembles of 27 year atmosphere-only (HadGEM3-A GA3.0) climate simulations at 25km resolution, as used in present global weather forecasting, on HERMIT at HLRS. Through 2012 the UPSCALE project (UK on PRACE: weather-resolving Simulations of Climate for globAL Environmental risk) ran over 650 years of simulation at resolutions of 25 km (N512), 60 km (N216) and 135 km (N96) to look at the value of high resolution climate models in the study of both present climate and a potential future climate scenario based on RCP8.5. Over 400 TB of data was produced using HERMIT, with additional simulations run on HECToR (UK supercomputer) and MONSooN (Met Office NERC Supercomputing Node). The data generated was transferred to the JASMIN super-data cluster, hosted by STFC CEDA in the UK, where analysis facilities are allowing rapid scientific exploitation of the data set. Many groups across the UK and Europe are already taking advantage of these facilities and we welcome approaches from other interested scientists. This presentation will briefly cover the following points; Purpose and requirements of the UPSCALE project and facilities used. Technical implementation and hurdles (model porting and optimisation, automation, numerical failures, data transfer). Ensemble specification. Current analysis projects and access to the data set. A full description of UPSCALE and the data set generated has been submitted to Geoscientific Model development, with overview information available from http://proj.badc.rl.ac.uk/upscale .
Status report of the end-to-end ASKAP software system: towards early science operations
NASA Astrophysics Data System (ADS)
Guzman, Juan Carlos; Chapman, Jessica; Marquarding, Malte; Whiting, Matthew
2016-08-01
The Australian SKA Pathfinder (ASKAP) is a novel centimetre radio synthesis telescope currently in the commissioning phase and located in the midwest region of Western Australia. It comprises of 36 x 12 m diameter reflector antennas each equipped with state-of-the-art and award winning Phased Array Feeds (PAF) technology. The PAFs provide a wide, 30 square degree field-of-view by forming up to 36 separate dual-polarisation beams at once. This results in a high data rate: 70 TB of correlated visibilities in an 8-hour observation, requiring custom-written, high-performance software running in dedicated High Performance Computing (HPC) facilities. The first six antennas equipped with first-generation PAF technology (Mark I), named the Boolardy Engineering Test Array (BETA) have been in use since 2014 as a platform to test PAF calibration and imaging techniques, and along the way it has been producing some great science results. Commissioning of the ASKAP Array Release 1, that is the first six antennas with second-generation PAFs (Mark II) is currently under way. An integral part of the instrument is the Central Processor platform hosted at the Pawsey Supercomputing Centre in Perth, which executes custom-written software pipelines, designed specifically to meet the ASKAP imaging requirements of wide field of view and high dynamic range. There are three key hardware components of the Central Processor: The ingest nodes (16 x node cluster), the fast temporary storage (1 PB Lustre file system) and the processing supercomputer (200 TFlop system). This High-Performance Computing (HPC) platform is managed and supported by the Pawsey support team. Due to the limited amount of data generated by BETA and the first ASKAP Array Release, the Central Processor platform has been running in a more "traditional" or user-interactive mode. But this is about to change: integration and verification of the online ingest pipeline starts in early 2016, which is required to support the full 300 MHz bandwidth for Array Release 1; followed by the deployment of the real-time data processing components. In addition to the Central Processor, the first production release of the CSIRO ASKAP Science Data Archive (CASDA) has also been deployed in one of the Pawsey Supercomputing Centre facilities and it is integrated to the end-to-end ASKAP data flow system. This paper describes the current status of the "end-to-end" data flow software system from preparing observations to data acquisition, processing and archiving; and the challenges of integrating an HPC facility as a key part of the instrument. It also shares some lessons learned since the start of integration activities and the challenges ahead in preparation for the start of the Early Science program.
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter D.; Dawson, Andrew
2017-03-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelization to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. In this paper, we present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform model simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13 % for the shallow water model.
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter; Dawson, Andrew
2017-04-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelisation to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. We present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13% for the shallow water model.
Enhancing Environmental HPC Applications: The EnCompAS approach
NASA Astrophysics Data System (ADS)
Frank, Anton; Donners, John; Pursula, Antti; Seinstra, Frank; Kranzlmüller, Dieter
2015-04-01
Many HPC applications in geoscience are of very high scientific quality and highly optimized for supercomputers. However, some of these codes lack the uptake by other adjacent scientific communities or industry due to deficiencies in usability, quality, and availability. Since enhancing software by, e.g., adding a graphical user interface, respecting data standards, setting up a support structure, or writing an extensive documentation is not of direct and immediate scientific relevance, most developers are not willing to invest any additional effort in these issues. Furthermore, if scientists, who are not directly involved in the development of some scientific software, could make benefit from additional features or interfaces, respective requests are often turned down due to the lack of time and resources. On the other hand, such enhancements are crucial for the sustainability of the scientific assets as well as the widespread or even worldwide distribution of European environmental software. Closely collaborating with environmental scientists the national supercomputing and eScience centres in Helsinki, Amsterdam, and Munich have identified that an enhancement of HPC and data analysis software must be provided as a service to the scientists developing such software. Therefore, first steps have been taken to establish respective services at these centres. In this talk we will present the already existing and envisioned service portfolio, some first success stories, and the approach to harmonize the current status aiming to turn this local effort into a pan-European service offering for environmental science.
The TESS science processing operations center
NASA Astrophysics Data System (ADS)
Jenkins, Jon M.; Twicken, Joseph D.; McCauliff, Sean; Campbell, Jennifer; Sanderfer, Dwight; Lung, David; Mansouri-Samani, Masoud; Girouard, Forrest; Tenenbaum, Peter; Klaus, Todd; Smith, Jeffrey C.; Caldwell, Douglas A.; Chacon, A. D.; Henze, Christopher; Heiges, Cory; Latham, David W.; Morgan, Edward; Swade, Daryl; Rinehart, Stephen; Vanderspek, Roland
2016-08-01
The Transiting Exoplanet Survey Satellite (TESS) will conduct a search for Earth's closest cousins starting in early 2018 and is expected to discover 1,000 small planets with Rp < 4 R⊕ and measure the masses of at least 50 of these small worlds. The Science Processing Operations Center (SPOC) is being developed at NASA Ames Research Center based on the Kepler science pipeline and will generate calibrated pixels and light curves on the NASA Advanced Supercomputing Division's Pleiades supercomputer. The SPOC will also search for periodic transit events and generate validation products for the transit-like features in the light curves. All TESS SPOC data products will be archived to the Mikulski Archive for Space Telescopes (MAST).
CFD code evaluation for internal flow modeling
NASA Technical Reports Server (NTRS)
Chung, T. J.
1990-01-01
Research on the computational fluid dynamics (CFD) code evaluation with emphasis on supercomputing in reacting flows is discussed. Advantages of unstructured grids, multigrids, adaptive methods, improved flow solvers, vector processing, parallel processing, and reduction of memory requirements are discussed. As examples, researchers include applications of supercomputing to reacting flow Navier-Stokes equations including shock waves and turbulence and combustion instability problems associated with solid and liquid propellants. Evaluation of codes developed by other organizations are not included. Instead, the basic criteria for accuracy and efficiency have been established, and some applications on rocket combustion have been made. Research toward an ultimate goal, the most accurate and efficient CFD code, is in progress and will continue for years to come.
Internal computational fluid mechanics on supercomputers for aerospace propulsion systems
NASA Technical Reports Server (NTRS)
Andersen, Bernhard H.; Benson, Thomas J.
1987-01-01
The accurate calculation of three-dimensional internal flowfields for application towards aerospace propulsion systems requires computational resources available only on supercomputers. A survey is presented of three-dimensional calculations of hypersonic, transonic, and subsonic internal flowfields conducted at the Lewis Research Center. A steady state Parabolized Navier-Stokes (PNS) solution of flow in a Mach 5.0, mixed compression inlet, a Navier-Stokes solution of flow in the vicinity of a terminal shock, and a PNS solution of flow in a diffusing S-bend with vortex generators are presented and discussed. All of these calculations were performed on either the NAS Cray-2 or the Lewis Research Center Cray XMP.
Supercomputer modeling of hydrogen combustion in rocket engines
NASA Astrophysics Data System (ADS)
Betelin, V. B.; Nikitin, V. F.; Altukhov, D. I.; Dushin, V. R.; Koo, Jaye
2013-08-01
Hydrogen being an ecological fuel is very attractive now for rocket engines designers. However, peculiarities of hydrogen combustion kinetics, the presence of zones of inverse dependence of reaction rate on pressure, etc. prevents from using hydrogen engines in all stages not being supported by other types of engines, which often brings the ecological gains back to zero from using hydrogen. Computer aided design of new effective and clean hydrogen engines needs mathematical tools for supercomputer modeling of hydrogen-oxygen components mixing and combustion in rocket engines. The paper presents the results of developing verification and validation of mathematical model making it possible to simulate unsteady processes of ignition and combustion in rocket engines.
Optimal wavelength-space crossbar switches for supercomputer optical interconnects.
Roudas, Ioannis; Hemenway, B Roe; Grzybowski, Richard R; Karinou, Fotini
2012-08-27
We propose a most economical design of the Optical Shared MemOry Supercomputer Interconnect System (OSMOSIS) all-optical, wavelength-space crossbar switch fabric. It is shown, by analysis and simulation, that the total number of on-off gates required for the proposed N × N switch fabric can scale asymptotically as N ln N if the number of input/output ports N can be factored into a product of small primes. This is of the same order of magnitude as Shannon's lower bound for switch complexity, according to which the minimum number of two-state switches required for the construction of a N × N permutation switch is log2 (N!).
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.
1993-01-01
The use of the CONVEX computers that are an integral part of the Supercomputing Network Subsystems (SNS) of the Central Scientific Computing Complex of LaRC is briefly described. Features of the CONVEX computers that are significantly different than the CRAY supercomputers are covered, including: FORTRAN, C, architecture of the CONVEX computers, the CONVEX environment, batch job submittal, debugging, performance analysis, utilities unique to CONVEX, and documentation. This revision reflects the addition of the Applications Compiler and X-based debugger, CXdb. The document id intended for all CONVEX users as a ready reference to frequently asked questions and to more detailed information contained with the vendor manuals. It is appropriate for both the novice and the experienced user.
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; McCann, Karen M.; Biswas, Rupak; VanderWijngaart, Rob; Yan, Jerry C. (Technical Monitor)
2000-01-01
The creation of parameter study suites has recently become a more challenging problem as the parameter studies have now become multi-tiered and the computational environment has become a supercomputer grid. The parameter spaces are vast, the individual problem sizes are getting larger, and researchers are now seeking to combine several successive stages of parameterization and computation. Simultaneously, grid-based computing offers great resource opportunity but at the expense of great difficulty of use. We present an approach to this problem which stresses intuitive visual design tools for parameter study creation and complex process specification, and also offers programming-free access to grid-based supercomputer resources and process automation.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
A special purpose silicon compiler for designing supercomputing VLSI systems
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.
1991-01-01
Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should get integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate get reduced over the silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; D'Azevedo, Eduardo; Philip, Bobby
On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phasemore » of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.« less
Real science at the petascale.
Saksena, Radhika S; Boghosian, Bruce; Fazendeiro, Luis; Kenway, Owain A; Manos, Steven; Mazzeo, Marco D; Sadiq, S Kashif; Suter, James L; Wright, David; Coveney, Peter V
2009-06-28
We describe computational science research that uses petascale resources to achieve scientific results at unprecedented scales and resolution. The applications span a wide range of domains, from investigation of fundamental problems in turbulence through computational materials science research to biomedical applications at the forefront of HIV/AIDS research and cerebrovascular haemodynamics. This work was mainly performed on the US TeraGrid 'petascale' resource, Ranger, at Texas Advanced Computing Center, in the first half of 2008 when it was the largest computing system in the world available for open scientific research. We have sought to use this petascale supercomputer optimally across application domains and scales, exploiting the excellent parallel scaling performance found on up to at least 32 768 cores for certain of our codes in the so-called 'capability computing' category as well as high-throughput intermediate-scale jobs for ensemble simulations in the 32-512 core range. Furthermore, this activity provides evidence that conventional parallel programming with MPI should be successful at the petascale in the short to medium term. We also report on the parallel performance of some of our codes on up to 65 636 cores on the IBM Blue Gene/P system at the Argonne Leadership Computing Facility, which has recently been named the fastest supercomputer in the world for open science.
Requirements for a network storage service
NASA Technical Reports Server (NTRS)
Kelly, Suzanne M.; Haynes, Rena A.
1991-01-01
Sandia National Laboratories provides a high performance classified computer network as a core capability in support of its mission of nuclear weapons design and engineering, physical sciences research, and energy research and development. The network, locally known as the Internal Secure Network (ISN), comprises multiple distributed local area networks (LAN's) residing in New Mexico and California. The TCP/IP protocol suite is used for inter-node communications. Scientific workstations and mid-range computers, running UNIX-based operating systems, compose most LAN's. One LAN, operated by the Sandia Corporate Computing Computing Directorate, is a general purpose resource providing a supercomputer and a file server to the entire ISN. The current file server on the supercomputer LAN is an implementation of the Common File Server (CFS). Subsequent to the design of the ISN, Sandia reviewed its mass storage requirements and chose to enter into a competitive procurement to replace the existing file server with one more adaptable to a UNIX/TCP/IP environment. The requirements study for the network was the starting point for the requirements study for the new file server. The file server is called the Network Storage Service (NSS) and its requirements are described. An application or functional description of the NSS is given. The final section adds performance, capacity, and access constraints to the requirements.
Detecting Silent Data Corruption for Extreme-Scale Applications through Data Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bautista-Gomez, Leonardo; Cappello, Franck
Supercomputers allow scientists to study natural phenomena by means of computer simulations. Next-generation machines are expected to have more components and, at the same time, consume several times less energy per operation. These trends are pushing supercomputer construction to the limits of miniaturization and energy-saving strategies. Consequently, the number of soft errors is expected to increase dramatically in the coming years. While mechanisms are in place to correct or at least detect some soft errors, a significant percentage of those errors pass unnoticed by the hardware. Such silent errors are extremely damaging because they can make applications silently produce wrongmore » results. In this work we propose a technique that leverages certain properties of high-performance computing applications in order to detect silent errors at the application level. Our technique detects corruption solely based on the behavior of the application datasets and is completely application-agnostic. We propose multiple corruption detectors, and we couple them to work together in a fashion transparent to the user. We demonstrate that this strategy can detect the majority of the corruptions, while incurring negligible overhead. We show that with the help of these detectors, applications can have up to 80% of coverage against data corruption.« less
Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, Noah; Carothers, Christopher; Mubarak, Misbah
As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly network topology, a new lowdiameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity Slim Fly it-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model with the Kathareios et al. Slim Fly model results provided at moderately sized network scales. We further scale the modelmore » size up to n unprecedented 1 million compute nodes; and through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we get an insight into the network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows that a million-node Slim Fly model simulation can execute in 198 seconds on the Intel cluster.« less
Kenneth Wilson and Lattice QCD
NASA Astrophysics Data System (ADS)
Ukawa, Akira
2015-09-01
We discuss the physics and computation of lattice QCD, a space-time lattice formulation of quantum chromodynamics, and Kenneth Wilson's seminal role in its development. We start with the fundamental issue of confinement of quarks in the theory of the strong interactions, and discuss how lattice QCD provides a framework for understanding this phenomenon. A conceptual issue with lattice QCD is a conflict of space-time lattice with chiral symmetry of quarks. We discuss how this problem is resolved. Since lattice QCD is a non-linear quantum dynamical system with infinite degrees of freedom, quantities which are analytically calculable are limited. On the other hand, it provides an ideal case of massively parallel numerical computations. We review the long and distinguished history of parallel-architecture supercomputers designed and built for lattice QCD. We discuss algorithmic developments, in particular the difficulties posed by the fermionic nature of quarks, and their resolution. The triad of efforts toward better understanding of physics, better algorithms, and more powerful supercomputers have produced major breakthroughs in our understanding of the strong interactions. We review the salient results of this effort in understanding the hadron spectrum, the Cabibbo-Kobayashi-Maskawa matrix elements and CP violation, and quark-gluon plasma at high temperatures. We conclude with a brief summary and a future perspective.
PENTACLE: Parallelized particle-particle particle-tree code for planet formation
NASA Astrophysics Data System (ADS)
Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori
2017-10-01
We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of \\tilde{R}_cut and a time-step Δt. It turns out that the accuracy of a hybrid N-body simulation is controlled through Δ t / \\tilde{R}_cut and Δ t / \\tilde{R}_cut ˜ 0.1 is necessary to simulate accurately the accretion process of a planet for ≥106 yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.
An evaluation of the state of time synchronization on leadership class supercomputers
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.; ...
2017-10-09
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
2017-12-01
For instance, when Unit 121 created a supercomputer for decryption and encryption efforts by combining imported high -end qualified computers, Kim...developed and undeveloped countries, but the cyber-attacks by North Korea can double the effect by inflaming psychological fears when combined with other...lessons that the ROK and U.S. alliance can glean from NATO’s cyber cooperation. To maintain the balance of power with the ROK, North Korea has
Special issue of Computers and Fluids in honor of Cecil E. (Chuck) Leith
Zhou, Ye; Herring, Jackson
2017-05-12
Here, this special issue of Computers and Fluids is dedicated to Cecil E. (Chuck) Leith in honor of his research contributions, leadership in the areas of statistical fluid mechanics, computational fluid dynamics, and climate theory. Leith's contribution to these fields emerged from his interest in solving complex fluid flow problems--even those at high Mach numbers--in an era well before large scale supercomputing became the dominant mode of inquiry into these fields. Yet the issues raised and solved by his research effort are still of vital interest today.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective
NASA Technical Reports Server (NTRS)
Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)
2002-01-01
Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
Systolic Processor Array For Recognition Of Spectra
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Peterson, John C.
1995-01-01
Spectral signatures of materials detected and identified quickly. Spectral Analysis Systolic Processor Array (SPA2) relatively inexpensive and satisfies need to analyze large, complex volume of multispectral data generated by imaging spectrometers to extract desired information: computational performance needed to do this in real time exceeds that of current supercomputers. Locates highly similar segments or contiguous subsegments in two different spectra at time. Compares sampled spectra from instruments with data base of spectral signatures of known materials. Computes and reports scores that express degrees of similarity between sampled and data-base spectra.
NASA Astrophysics Data System (ADS)
Wu, Linghui; Bihari, Bipin; Gan, Jianhua; Chen, Ray T.; Tang, Suning
1998-08-01
Si-CMOS compatible polymer-based waveguides for optoelectronic interconnects and packaging have been fabricated and characterized. A 1-to-48 fanout optoelectronic interconnection layer (OIL) structure based on Ultradel 9120/9020 for the high-speed massive clock signal distribution for a Cray T-90 supercomputer board has been constructed. The OIL employs multimode polymeric channel waveguides in conjunction with surface-normal waveguide output coupler and 1-to-2 splitter. A total insertion loss of 7.98 dB at 850 nm was measured experimentally.
Special issue of Computers and Fluids in honor of Cecil E. (Chuck) Leith
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ye; Herring, Jackson
Here, this special issue of Computers and Fluids is dedicated to Cecil E. (Chuck) Leith in honor of his research contributions, leadership in the areas of statistical fluid mechanics, computational fluid dynamics, and climate theory. Leith's contribution to these fields emerged from his interest in solving complex fluid flow problems--even those at high Mach numbers--in an era well before large scale supercomputing became the dominant mode of inquiry into these fields. Yet the issues raised and solved by his research effort are still of vital interest today.
An evaluation of the state of time synchronization on leadership class supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
Superconductor Digital Electronics: -- Current Status, Future Prospects
NASA Astrophysics Data System (ADS)
Mukhanov, Oleg
2011-03-01
Two major applications of superconductor electronics: communications and supercomputing will be presented. These areas hold a significant promise of a large impact on electronics state-of-the-art for the defense and commercial markets stemming from the fundamental advantages of superconductivity: simultaneous high speed and low power, lossless interconnect, natural quantization, and high sensitivity. The availability of relatively small cryocoolers lowered the foremost market barrier for cryogenically-cooled superconductor electronic systems. These fundamental advantages enabled a novel Digital-RF architecture - a disruptive technological approach changing wireless communications, radar, and surveillance system architectures dramatically. Practical results were achieved for Digital-RF systems in which wide-band, multi-band radio frequency signals are directly digitized and digital domain is expanded throughout the entire system. Digital-RF systems combine digital and mixed signal integrated circuits based on Rapid Single Flux Quantum (RSFQ) technology, superconductor analog filter circuits, and semiconductor post-processing circuits. The demonstrated cryocooled Digital-RF systems are the world's first and fastest directly digitizing receivers operating with live satellite signals, enabling multi-net data links, and performing signal acquisition from HF to L-band with 30 GHz clock frequencies. In supercomputing, superconductivity leads to the highest energy efficiencies per operation. Superconductor technology based on manipulation and ballistic transfer of magnetic flux quanta provides a superior low-power alternative to CMOS and other charge-transfer based device technologies. The fundamental energy consumption in SFQ circuits defined by flux quanta energy 2 x 10-19 J. Recently, a novel energy-efficient zero-static-power SFQ technology, eSFQ/ERSFQ was invented, which retains all advantages of standard RSFQ circuits: high-speed, dc power, internal memory. The voltage bias regulation, determined by SFQ clock, enables the zero-power at zero-activity regimes, indispensable for sensor and quantum bit readout.
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on heterogeneous geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate the supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
NASA Technical Reports Server (NTRS)
Stevens, Grady H.
1992-01-01
The Data Distribution Satellite (DDS), operating in conjunction with the planned space network, the National Research and Education Network and its commercial derivatives, would play a key role in networking the emerging supercomputing facilities, national archives, academic, industrial, and government institutions. Centrally located over the United States in geostationary orbit, DDS would carry sophisticated on-board switching and make use of advanced antennas to provide an array of special services. Institutions needing continuous high data rate service would be networked together by use of a microwave switching matrix and electronically steered hopping beams. Simultaneously, DDS would use other beams and on board processing to interconnect other institutions with lesser, low rate, intermittent needs. Dedicated links to White Sands and other facilities would enable direct access to space payloads and sensor data. Intersatellite links to a second generation ATDRS, called Advanced Space Data Acquisition and Communications System (ASDACS), would eliminate one satellite hop and enhance controllability of experimental payloads by reducing path delay. Similarly, direct access would be available to the supercomputing facilities and national data archives. Economies with DDS would be derived from its ability to switch high rate facilities amongst users needed. At the same time, having a CONUS view, DDS would interconnect with any institution regardless of how remote. Whether one needed high rate service or low rate service would be immaterial. With the capability to assign resources on demand, DDS will need only carry a portion of the resources needed if dedicated facilities were used. Efficiently switching resources to users as needed, DDS would become a very feasible spacecraft, even though it would tie together the space network, the terrestrial network, remote sites, 1000's of small users, and those few who need very large data links intermittently.
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, D. H.
1985-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
Will Your Next Supercomputer Come from Costco?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farber, Rob
2007-04-15
A fun topic for April, one that is not an April fool’s joke, is that you can purchase a commodity 200+ Gflop (single-precision) Linux supercomputer for around $600 from your favorite electronic vendor. Yes, it’s true. Just walk in and ask for a Sony Playstation 3 (PS3), take it home and install Linux on it. IBM has provided an excellent tutorial for installing Linux and building applications at http://www-128.ibm.com/developerworks/power/library/pa-linuxps3-1. If you want to raise some eyebrows at work, then submit a purchase request for a Sony PS3 game console and watch the reactions as your paperwork wends its way throughmore » the procurement process.« less
Interactive 3D visualization speeds well, reservoir planning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petzet, G.A.
1997-11-24
Texaco Exploration and Production has begun making expeditious analyses and drilling decisions that result from interactive, large screen visualization of seismic and other three dimensional data. A pumpkin shaped room or pod inside a 3,500 sq ft, state-of-the-art facility in Southwest Houston houses a supercomputer and projection equipment Texaco said will help its people sharply reduce 3D seismic project cycle time, boost production from existing fields, and find more reserves. Oil and gas related applications of the visualization center include reservoir engineering, plant walkthrough simulation for facilities/piping design, and new field exploration. The center houses a Silicon Graphics Onyx2 infinitemore » reality supercomputer configured with 8 processors, 3 graphics pipelines, and 6 gigabytes of main memory.« less
NASA Astrophysics Data System (ADS)
Xue, Wenhua; Dang, Hongli; Liu, Yingdi; Jentoft, Friederike; Resasco, Daniel; Wang, Sanwu
2014-03-01
In the study of catalytic reactions of biomass, furfural conversion over metal catalysts with the presence of hydrogen has attracted wide attention. We report ab initio molecular dynamics simulations for furfural and hydrogen on the Pd(111) surface at finite temperatures. The simulations demonstrate that the presence of hydrogen is important in promoting furfural conversion. In particular, hydrogen molecules dissociate rapidly on the Pd(111) surface. As a result of such dissociation, atomic hydrogen participates in the reactions with furfural. The simulations also provide detailed information about the possible reactions of hydrogen with furfural. Supported by DOE (DE-SC0004600). This research used the supercomputer resources of the XSEDE, the NERSC Center, and the Tandy Supercomputing Center.
NASA Astrophysics Data System (ADS)
Dang, Hongli; Xue, Wenhua; Liu, Yingdi; Jentoft, Friederike; Resasco, Daniel; Wang, Sanwu
2014-03-01
We report first-principles density-functional calculations and ab initio molecular dynamics (MD) simulations for the reactions involving furfural, which is an important intermediate in biomass conversion, at the catalytic liquid-solid interfaces. The different dynamic processes of furfural at the water-Cu(111) and water-Pd(111) interfaces suggest different catalytic reaction mechanisms for the conversion of furfural. Simulations for the dynamic processes with and without hydrogen demonstrate the importance of the liquid-solid interface as well as the presence of hydrogen in possible catalytic reactions including hydrogenation and decarbonylation of furfural. Supported by DOE (DE-SC0004600). This research used the supercomputer resources of the XSEDE, the NERSC Center, and the Tandy Supercomputing Center.
The TESS Science Processing Operations Center
NASA Technical Reports Server (NTRS)
Jenkins, Jon M.; Twicken, Joseph D.; McCauliff, Sean; Campbell, Jennifer; Sanderfer, Dwight; Lung, David; Mansouri-Samani, Masoud; Girouard, Forrest; Tenenbaum, Peter; Klaus, Todd;
2016-01-01
The Transiting Exoplanet Survey Satellite (TESS) will conduct a search for Earth's closest cousins starting in early 2018 and is expected to discover approximately 1,000 small planets with R(sub p) less than 4 (solar radius) and measure the masses of at least 50 of these small worlds. The Science Processing Operations Center (SPOC) is being developed at NASA Ames Research Center based on the Kepler science pipeline and will generate calibrated pixels and light curves on the NASA Advanced Supercomputing Division's Pleiades supercomputer. The SPOC will also search for periodic transit events and generate validation products for the transit-like features in the light curves. All TESS SPOC data products will be archived to the Mikulski Archive for Space Telescopes (MAST).
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, David H.
1987-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
An analysis of file migration in a UNIX supercomputing environment
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1992-01-01
The super computer center at the National Center for Atmospheric Research (NCAR) migrates large numbers of files to and from its mass storage system (MSS) because there is insufficient space to store them on the Cray supercomputer's local disks. This paper presents an analysis of file migration data collected over two years. The analysis shows that requests to the MSS are periodic, with one day and one week periods. Read requests to the MSS account for the majority of the periodicity; as write requests are relatively constant over the course of a week. Additionally, reads show a far greater fluctuation than writes over a day and week since reads are driven by human users while writes are machine-driven.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pluharova, Eva; Baer, Marcel D.; Mundy, Christopher J.
2014-07-03
Understanding specific ion effects on proteins remains a considerable challenge. N-methylacetamide serves as a useful proxy for the protein backbone that can be well characterized both experimentally and theoretically. The spectroscopic signatures in the amide I band reflecting the strength of the interaction of alkali cations and alkali earth dications with the carbonyl group remain difficult to assign and controversial to interpret. Herein, we directly compute the IR shifts corresponding to the binding of either sodium or calcium to aqueous N-methylacetamide using ab initio molecular dynamics simulations. We show that the two cations interact with aqueous N-methylacetamide with different affinitiesmore » and in different geometries. Since sodium exhibits a weak interaction with the carbonyl group, the resulting amide I band is similar to an unperturbed carbonyl group undergoing aqueous solvation. In contrast, the stronger calcium binding results in a clear IR shift with respect to N-methylacetamide in pure water. Support from the Czech Ministry of Education (grant LH12001) is gratefully acknowledged. EP thanks the International Max-Planck Research School for support and the Alternative Sponsored Fellowship program at Pacific Northwest National Laboratory (PNNL). PJ acknowledges the Praemium Academie award from the Academy of Sciences. Calculations of the free energy profiles were made possible through generous allocation of computer time from the North-German Supercomputing Alliance (HLRN). Calculations of vibrational spectra were performed in part using the computational resources in the National Energy Research Supercomputing Center (NERSC) at Lawrence Berkeley National Laboratory. This work was supported by National Science Foundation grant CHE-0431312. CJM is supported by the U.S. Department of Energy`s (DOE) Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences and Biosciences. PNNL is operated for the Department of Energy by Battelle. MDB is grateful for the support of the Linus Pauling Distinguished Postdoctoral Fellowship Program at PNNL.« less
2014 Annual Report - Argonne Leadership Computing Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, James R.; Papka, Michael E.; Cerny, Beth A.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines.
2015 Annual Report - Argonne Leadership Computing Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, James R.; Papka, Michael E.; Cerny, Beth A.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines.
NASA Astrophysics Data System (ADS)
Wolff, J.; Jankov, I.; Beck, J.; Carson, L.; Frimel, J.; Harrold, M.; Jiang, H.
2016-12-01
It is well known that global and regional numerical weather prediction ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system for addressing the deficiencies in ensemble modeling is the use of stochastic physics to represent model-related uncertainty. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), Stochastic Perturbation of Physics Tendencies (SPPT), or some combination of all three. The focus of this study is to assess the model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) when using stochastic approaches. For this purpose, the test utilized a single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model, with ensemble members produced by employing stochastic methods. Parameter perturbations were employed in the Rapid Update Cycle (RUC) land surface model and Mellor-Yamada-Nakanishi-Niino (MYNN) planetary boundary layer scheme. Results will be presented in terms of bias, error, spread, skill, accuracy, reliability, and sharpness using the Model Evaluation Tools (MET) verification package. Due to the high level of complexity of running a frequently updating (hourly), high spatial resolution (3 km), large domain (CONUS) ensemble system, extensive high performance computing (HPC) resources were needed to meet this objective. Supercomputing resources were provided through the National Center for Atmospheric Research (NCAR) Strategic Capability (NSC) project support, allowing for a more extensive set of tests over multiple seasons, consequently leading to more robust results. Through the use of these stochastic innovations and powerful supercomputing at NCAR, further insights and advancements in ensemble forecasting at convection-permitting scales will be possible.
On The Export Control Of High Speed Imaging For Nuclear Weapons Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Watson, Scott Avery; Altherr, Michael Robert
Since the Manhattan Project, the use of high-speed photography, and its cousins flash radiography1 and schieleren photography have been a technological proliferation concern. Indeed, like the supercomputer, the development of high-speed photography as we now know it essentially grew out of the nuclear weapons program at Los Alamos2,3,4. Naturally, during the course of the last 75 years the technology associated with computers and cameras has been export controlled by the United States and others to prevent both proliferation among non-P5-nations and technological parity among potential adversaries among P5 nations. Here we revisit these issues as they relate to high-speed photographicmore » technologies and make recommendations about how future restrictions, if any, should be guided.« less
High Performance Computing Software Applications for Space Situational Awareness
NASA Astrophysics Data System (ADS)
Giuliano, C.; Schumacher, P.; Matson, C.; Chun, F.; Duncan, B.; Borelli, K.; Desonia, R.; Gusciora, G.; Roe, K.
The High Performance Computing Software Applications Institute for Space Situational Awareness (HSAI-SSA) has completed its first full year of applications development. The emphasis of our work in this first year was in improving space surveillance sensor models and image enhancement software. These applications are the Space Surveillance Network Analysis Model (SSNAM), the Air Force Space Fence simulation (SimFence), and physically constrained iterative de-convolution (PCID) image enhancement software tool. Specifically, we have demonstrated order of magnitude speed-up in those codes running on the latest Cray XD-1 Linux supercomputer (Hoku) at the Maui High Performance Computing Center. The software applications improvements that HSAI-SSA has made, has had significant impact to the warfighter and has fundamentally changed the role of high performance computing in SSA.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it maymore » be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.« less
NASA Astrophysics Data System (ADS)
Wollherr, Stephanie; Gabriel, Alice-Agnes; Igel, Heiner
2015-04-01
In dynamic rupture models, high stress concentrations at rupture fronts have to to be accommodated by off-fault inelastic processes such as plastic deformation. As presented in (Roten et al., 2014), incorporating plastic yielding can significantly reduce earlier predictions of ground motions in the Los Angeles Basin. Further, an inelastic response of materials surrounding a fault potentially has a strong impact on surface displacement and is therefore a key aspect in understanding the triggering of tsunamis through floor uplifting. We present an implementation of off-fault-plasticity and its verification for the software package SeisSol, an arbitrary high-order derivative discontinuous Galerkin (ADER-DG) method. The software recently reached multi-petaflop/s performance on some of the largest supercomputers worldwide and was a Gordon Bell prize finalist application in 2014 (Heinecke et al., 2014). For the nonelastic calculations we impose a Drucker-Prager yield criterion in shear stress with a viscous regularization following (Andrews, 2005). It permits the smooth relaxation of high stress concentrations induced in the dynamic rupture process. We verify the implementation by comparison to the SCEC/USGS Spontaneous Rupture Code Verification Benchmarks. The results of test problem TPV13 with a 60-degree dipping normal fault show that SeisSol is in good accordance with other codes. Additionally we aim to explore the numerical characteristics of the off-fault plasticity implementation by performing convergence tests for the 2D code. The ADER-DG method is especially suited for complex geometries by using unstructured tetrahedral meshes. Local adaptation of the mesh resolution enables a fine sampling of the cohesive zone on the fault while simultaneously satisfying the dispersion requirements of wave propagation away from the fault. In this context we will investigate the influence of off-fault-plasticity on geometrically complex fault zone structures like subduction zones or branched faults. Studying the interplay of stress conditions and angle dependence of neighbouring branches including inelastic material behaviour and its effects on rupture jumps and seismic activation helps to advance our understanding of earthquake source processes. An application is the simulation of a real large-scale subduction zone scenario including plasticity to validate the coupling of our dynamic rupture calculations to a tsunami model in the framework of the ASCETE project (http://www.ascete.de/). Andrews, D. J. (2005): Rupture dynamics with energy loss outside the slip zone, J. Geophys. Res., 110, B01307. Heinecke, A. (2014), A. Breuer, S. Rettenberger, M. Bader, A.-A. Gabriel, C. Pelties, A. Bode, W. Barth, K. Vaidyanathan, M. Smelyanskiy and P. Dubey: Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers. In Supercomputing 2014, The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, LA, USA, November 2014. Roten, D. (2014), K. B. Olsen, S.M. Day, Y. Cui, and D. Fäh: Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophys. Res. Lett., 41, 2769-2777.
NASA Astrophysics Data System (ADS)
Nassiri, Ali; Vivek, Anupam; Abke, Tim; Liu, Bert; Lee, Taeseon; Daehn, Glenn
2017-06-01
Numerical simulations of high-velocity impact welding are extremely challenging due to the coupled physics and highly dynamic nature of the process. Thus, conventional mesh-based numerical methodologies are not able to accurately model the process owing to the excessive mesh distortion close to the interface of two welded materials. A simulation platform was developed using smoothed particle hydrodynamics, implemented in a parallel architecture on a supercomputer. Then, the numerical simulations were compared to experimental tests conducted by vaporizing foil actuator welding. The close correspondence of the experiment and modeling in terms of interface characteristics allows the prediction of local temperature and strain distributions, which are not easily measured.
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
The High Time Resolution Universe
NASA Astrophysics Data System (ADS)
Bailes, Matthew; Possenti, Andrea; Johnston, Simon; Kramer, Michael; Burgay, Marta; Bhat, Ramesh; Keith, Michael; Burke-Spolaor, Sarah; van Straten, Willem; Stappers, Benjamin; Bates, Samuel
2008-04-01
The Parkes multibeam surveys heralded a new era in pulsar surveys, more than doubling the number of pulsars known. However, at high time resolution, they were severely limited by the analogue backend system, which limited the volume of sky they could effectively survey to just the local 2-3 kpc. Here we propose to use a new digital backend coupled with Australia's most powerful (16 Tflop) supercomputing cluster to conduct three ambitious surveys for millisecond and relativistic pulsars with the Parkes telescope. We hope to discover over 200 new millisecond and relativistic pulsars that will define the recycled pulsar period distribution, supply pulsars for the timing array and aid in our understanding of binary evolution.
Visualization at Supercomputing Centers: The Tale of Little Big Iron and the Three Skinny Guys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, E. Wes; van Rosendale, John; Southard, Dale
2010-12-01
Supercomputing Centers (SC's) are unique resources that aim to enable scientific knowledge discovery through the use of large computational resources, the Big Iron. Design, acquisition, installation, and management of the Big Iron are activities that are carefully planned and monitored. Since these Big Iron systems produce a tsunami of data, it is natural to co-locate visualization and analysis infrastructure as part of the same facility. This infrastructure consists of hardware (Little Iron) and staff (Skinny Guys). Our collective experience suggests that design, acquisition, installation, and management of the Little Iron and Skinny Guys does not receive the same level ofmore » treatment as that of the Big Iron. The main focus of this article is to explore different aspects of planning, designing, fielding, and maintaining the visualization and analysis infrastructure at supercomputing centers. Some of the questions we explore in this article include:"How should the Little Iron be sized to adequately support visualization and analysis of data coming off the Big Iron?" What sort of capabilities does it need to have?" Related questions concern the size of visualization support staff:"How big should a visualization program be (number of persons) and what should the staff do?" and"How much of the visualization should be provided as a support service, and how much should applications scientists be expected to do on their own?"« less
NASA Astrophysics Data System (ADS)
Hecht, K. T.
2012-12-01
This volume contains the contributions of the speakers of an international conference in honor of Jerry Draayer's 70th birthday, entitled 'Horizons of Innovative Theories, Experiments and Supercomputing in Nuclear Physics'. The list of contributors includes not only international experts in these fields, but also many former collaborators, former graduate students, and former postdoctoral fellows of Jerry Draayer, stressing innovative theories such as special symmetries and supercomputing, both of particular interest to Jerry. The organizers of the conference intended to honor Jerry Draayer not only for his seminal contributions in these fields, but also for his administrative skills at departmental, university, national and international level. Signed: Ted Hecht University of Michigan Conference photograph Scientific Advisory Committee Ani AprahamianUniversity of Notre Dame Baha BalantekinUniversity of Wisconsin Bruce BarrettUniversity of Arizona Umit CatalyurekOhio State Unversity David DeanOak Ridge National Laboratory Jutta Escher (Chair)Lawrence Livermore National Laboratory Jorge HirschUNAM, Mexico David RoweUniversity of Toronto Brad Sherill & Michigan State University Joel TohlineLouisiana State University Edward ZganjarLousiana State University Organizing Committee Jeff BlackmonLouisiana State University Mark CaprioUniversity of Notre Dame Tomas DytrychLouisiana State University Ana GeorgievaINRNE, Bulgaria Kristina Launey (Co-chair)Louisiana State University Gabriella PopaOhio University Zanesville James Vary (Co-chair)Iowa State University Local Organizing Committee Laura LinhardtLouisiana State University Charlie RascoLouisiana State University Karen Richard (Coordinator)Louisiana State University
Pre-Hardware Optimization and Implementation Of Fast Optics Closed Control Loop Algorithms
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Lyon, Richard G.; Herman, Jay R.; Abuhassan, Nader
2004-01-01
One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high performance digital equivalent - the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) within optical systems control. However, timing constraints of a fast optics closed control loop would require a supercomputer to run the software implementation of the FFT2 and its inverse, as well as other image processing representative algorithm, such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations and is not feasible for a night project. However, the computationally intensive algorithms still warrant alternative implementation using reconfigurable computing technologies (RC) such as Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA), which provide low cost compact super-computing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image-processing algorithm, such as FFT2, image folding and phase diversity for the NASA Solar Viewing Interferometer Prototype (SVIP) using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also assures avoidance of a single point of failure, while using commercially available hardware. This, combined with the control algorithms pre-hardware optimization, or the first time allows construction of image-based 800 Hertz (Hz) optics closed control loops on-board a spacecraft, based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar Occultation Imager (EASI) to study greenhouse gases CO2, C2H, H2O, O3, O2, N2O from Lagrange-2 point in space. This paper provides an advanced insight into a new type of science capabilities for future space exploration missions based on on-board image processing for control and for robotics missions using vision sensors. It presents a top-level description of technologies required for the design and construction of SVIP and EASI and to advance the spatial-spectral imaging and large-scale space interferometry science and engineering.
Towards high-resolution mantle convection simulations
NASA Astrophysics Data System (ADS)
Höink, T.; Richards, M. A.; Lenardic, A.
2009-12-01
The motion of tectonic plates at the Earth’s surface, earthquakes, most forms of volcanism, the growth and evolution of continents, and the volatile fluxes that govern the composition and evolution of the oceans and atmosphere are all controlled by the process of solid-state thermal convection in the Earth’s rocky mantle, with perhaps a minor contribution from convection in the iron core. Similar processes govern the evolution of other planetary objects such as Mars, Venus, Titan, and Europa, all of which might conceivably shed light on the origin and evolution of life on Earth. Modeling and understanding this complicated dynamical system is one of the true “grand challenges” of Earth and planetary science. In the past three decades much progress towards understanding the dynamics of mantle convection has been made, with the increasing aid of computational modeling. Numerical sophistication has evolved significantly, and a small number of independent codes have been successfully employed. Computational power continues to increase dramatically, and with it the ability to resolve increasingly finer fluid mechanical structures. Yet, the perhaps most often cited limitation in numerical modeling based publications is still the limitation of computing power, because the ability to resolve thermal boundary layers within the convecting mantle (e.g., lithospheric plates), requires a spatial resolution of ~ 10 km. At present, the largest supercomputing facilities still barely approach the power to resolve this length scale in mantle convection simulations that include the physics necessary to model plate-like behavior. Our goal is to use supercomputing facilities to perform 3D spherical mantle convection simulations that include the ingredients for plate-like behavior, i.e. strongly temperature- and stress-dependent viscosity, at Earth-like convective vigor with a global resolution of order 10 km. In order to qualify to use such facilities, it is also necessary to demonstrate good parallel efficiency. Here we will present two kinds of results: (1) scaling properties of the community code CitcomS on DOE/NERSC's supercomputer Franklin for up to ~ 6000 processors, and (2) preliminary simulations that illustrate the role of a low-viscosity asthenosphere in plate-like behavior in mantle convection.
NASA Astrophysics Data System (ADS)
Xue, Wenhua; Borja, Miguel Gonzalez; Resasco, Daniel E.; Wang, Sanwu
2015-03-01
In the study of catalytic reactions of biomass, furfural conversion over metal catalysts with the presence of water has attracted wide attention. Recent experiments showed that the proportion of alcohol product from catalytic reactions of furfural conversion with palladium in the presence of water is significantly increased, when compared with other solvent including dioxane, decalin, and ethanol. We investigated the microscopic mechanism of the reactions based on first-principles quantum-mechanical calculations. We particularly identified the important role of water and the liquid/solid interface in furfural conversion. Our results provide atomic-scale details for the catalytic reactions. Supported by DOE (DE-SC0004600). This research used the supercomputer resources at NERSC, of XSEDE, at TACC, and at the Tandy Supercomputing Center.
NASA Technical Reports Server (NTRS)
Gentzsch, W.
1982-01-01
Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
FAST: A multi-processed environment for visualization of computational fluid dynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon V.; Merritt, Fergus J.; Plessel, Todd C.; Kelaita, Paul G.; Mccabe, R. Kevin
1991-01-01
Three-dimensional, unsteady, multi-zoned fluid dynamics simulations over full scale aircraft are typical of the problems being investigated at NASA Ames' Numerical Aerodynamic Simulation (NAS) facility on CRAY2 and CRAY-YMP supercomputers. With multiple processor workstations available in the 10-30 Mflop range, we feel that these new developments in scientific computing warrant a new approach to the design and implementation of analysis tools. These larger, more complex problems create a need for new visualization techniques not possible with the existing software or systems available as of this writing. The visualization techniques will change as the supercomputing environment, and hence the scientific methods employed, evolves even further. The Flow Analysis Software Toolkit (FAST), an implementation of a software system for fluid mechanics analysis, is discussed.
Vectorized program architectures for supercomputer-aided circuit design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzoli, V.; Ferlito, M.; Neri, A.
1986-01-01
Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size such as broad-band nonlinear design or statistical design (yield optimization). In order to fully exploit the capabilities of a vector hardware, any program architecture must be structured accordingly. This paper presents a possible approach to the ''semantic'' vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP), with respect to the most powerful scaler computers (CDC 7600), with cost reductions of more than one order of magnitude. Thismore » could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.« less
NASA Astrophysics Data System (ADS)
Bhanota, Gyan; Chen, Dong; Gara, Alan; Vranas, Pavlos
2003-05-01
The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 of such nodes are connected into a 3-d torus with a geometry of 32×32×64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.
CFD Research, Parallel Computation and Aerodynamic Optimization
NASA Technical Reports Server (NTRS)
Ryan, James S.
1995-01-01
During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and work which is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested, so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogenous combinations of these to complete required computations efficiently.
NASA Technical Reports Server (NTRS)
Samba, A. S.
1985-01-01
The problem of solving banded linear systems by direct (non-iterative) techniques on the Vector Processor System (VPS) 32 supercomputer is considered. Two efficient direct methods for solving banded linear systems on the VPS 32 are described. The vector cyclic reduction (VCR) algorithm is discussed in detail. The performance of the VCR on a three parameter model problem is also illustrated. The VCR is an adaptation of the conventional point cyclic reduction algorithm. The second direct method is the Customized Reduction of Augmented Triangles' (CRAT). CRAT has the dominant characteristics of an efficient VPS 32 algorithm. CRAT is tailored to the pipeline architecture of the VPS 32 and as a consequence the algorithm is implicitly vectorizable.
Climate@Home: Crowdsourcing Climate Change Research
NASA Astrophysics Data System (ADS)
Xu, C.; Yang, C.; Li, J.; Sun, M.; Bambacus, M.
2011-12-01
Climate change deeply impacts human wellbeing. Significant amounts of resources have been invested in building super-computers that are capable of running advanced climate models, which help scientists understand climate change mechanisms, and predict its trend. Although climate change influences all human beings, the general public is largely excluded from the research. On the other hand, scientists are eagerly seeking communication mediums for effectively enlightening the public on climate change and its consequences. The Climate@Home project is devoted to connect the two ends with an innovative solution: crowdsourcing climate computing to the general public by harvesting volunteered computing resources from the participants. A distributed web-based computing platform will be built to support climate computing, and the general public can 'plug-in' their personal computers to participate in the research. People contribute the spare computing power of their computers to run a computer model, which is used by scientists to predict climate change. Traditionally, only super-computers could handle such a large computing processing load. By orchestrating massive amounts of personal computers to perform atomized data processing tasks, investments on new super-computers, energy consumed by super-computers, and carbon release from super-computers are reduced. Meanwhile, the platform forms a social network of climate researchers and the general public, which may be leveraged to raise climate awareness among the participants. A portal is to be built as the gateway to the climate@home project. Three types of roles and the corresponding functionalities are designed and supported. The end users include the citizen participants, climate scientists, and project managers. Citizen participants connect their computing resources to the platform by downloading and installing a computing engine on their personal computers. Computer climate models are defined at the server side. Climate scientists configure computer model parameters through the portal user interface. After model configuration, scientists then launch the computing task. Next, data is atomized and distributed to computing engines that are running on citizen participants' computers. Scientists will receive notifications on the completion of computing tasks, and examine modeling results via visualization modules of the portal. Computing tasks, computing resources, and participants are managed by project managers via portal tools. A portal prototype has been built for proof of concept. Three forums have been setup for different groups of users to share information on science aspect, technology aspect, and educational outreach aspect. A facebook account has been setup to distribute messages via the most popular social networking platform. New treads are synchronized from the forums to facebook. A mapping tool displays geographic locations of the participants and the status of tasks on each client node. A group of users have been invited to test functions such as forums, blogs, and computing resource monitoring.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Patchett, John M; Lo, Li - Ta
2011-01-24
This report provides documentation for the completion of the Los Alamos portion of the ASC Level II 'Visualization on the Supercomputing Platform' milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratory and Los Alamos National Laboratory. The milestone text is shown in Figure 1 with the Los Alamos portions highlighted in boldfaced text. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) Performance of interactive rendering, which is the most computationally intensive portion of the visualization process. Formore » terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the perfromance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. In conclusion, we improved CPU-based rendering performance by a a factor of 2-10 times on our tests. In addition, we evaluated CPU and CPU-based rendering performance. We encourage production visualization experts to consider using CPU-based rendering solutions when it is appropriate. For example, on remote supercomputers CPU-based rendering can offer a means of viewing data without having to offload the data or geometry onto a CPU-based visualization system. In terms of comparative performance of the CPU and CPU we believe that further optimizations of the performance of both CPU or CPU-based rendering are possible. The simulation community is currently confronting this reality as they work to port their simulations to different hardware architectures. What is interesting about CPU rendering of massive datasets is that for part two decades CPU performance has significantly outperformed CPU-based systems. Based on our advancements, evaluations and explorations we believe that CPU-based rendering has returned as one viable option for the visualization of massive datasets.« less
Challenges and opportunities of cloud computing for atmospheric sciences
NASA Astrophysics Data System (ADS)
Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.
2016-04-01
Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand it has began to make its way in scientific research. One of the greatest advantages of cloud computing for scientific research is independence of having access to a large cyberinfrastructure to fund or perform a research project. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we expose results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The monetary cost associated is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Y.; Gunasekaran, Raghul; Ma, Xiaosong
2016-01-01
Inter-application I/O contention and performance interference have been recognized as severe problems. In this work, we demonstrate, through measurement from Titan (world s No. 3 supercomputer), that high I/O variance co-exists with the fact that individual storage units remain under-utilized for the majority of the time. This motivates us to propose AID, a system that performs automatic application I/O characterization and I/O-aware job scheduling. AID analyzes existing I/O traffic and batch job history logs, without any prior knowledge on applications or user/developer involvement. It identifies the small set of I/O-intensive candidates among all applications running on a supercomputer and subsequentlymore » mines their I/O patterns, using more detailed per-I/O-node traffic logs. Based on such auto- extracted information, AID provides online I/O-aware scheduling recommendations to steer I/O-intensive applications away from heavy ongoing I/O activities. We evaluate AID on Titan, using both real applications (with extracted I/O patterns validated by contacting users) and our own pseudo-applications. Our results confirm that AID is able to (1) identify I/O-intensive applications and their detailed I/O characteristics, and (2) significantly reduce these applications I/O performance degradation/variance by jointly evaluating out- standing applications I/O pattern and real-time system l/O load.« less
Requirements for a network storage service
NASA Technical Reports Server (NTRS)
Kelly, Suzanne M.; Haynes, Rena A.
1992-01-01
Sandia National Laboratories provides a high performance classified computer network as a core capability in support of its mission of nuclear weapons design and engineering, physical sciences research, and energy research and development. The network, locally known as the Internal Secure Network (ISN), was designed in 1989 and comprises multiple distributed local area networks (LAN's) residing in Albuquerque, New Mexico and Livermore, California. The TCP/IP protocol suite is used for inner-node communications. Scientific workstations and mid-range computers, running UNIX-based operating systems, compose most LAN's. One LAN, operated by the Sandia Corporate Computing Directorate, is a general purpose resource providing a supercomputer and a file server to the entire ISN. The current file server on the supercomputer LAN is an implementation of the Common File System (CFS) developed by Los Alamos National Laboratory. Subsequent to the design of the ISN, Sandia reviewed its mass storage requirements and chose to enter into a competitive procurement to replace the existing file server with one more adaptable to a UNIX/TCP/IP environment. The requirements study for the network was the starting point for the requirements study for the new file server. The file server is called the Network Storage Services (NSS) and is requirements are described in this paper. The next section gives an application or functional description of the NSS. The final section adds performance, capacity, and access constraints to the requirements.
LFRic: Building a new Unified Model
NASA Astrophysics Data System (ADS)
Melvin, Thomas; Mullerworth, Steve; Ford, Rupert; Maynard, Chris; Hobson, Mike
2017-04-01
The LFRic project, named for Lewis Fry Richardson, aims to develop a replacement for the Met Office Unified Model in order to meet the challenges which will be presented by the next generation of exascale supercomputers. This project, a collaboration between the Met Office, STFC Daresbury and the University of Manchester, builds on the earlier GungHo project to redesign the dynamical core, in partnership with NERC. The new atmospheric model aims to retain the performance of the current ENDGame dynamical core and associated subgrid physics, while also enabling a far greater scalability and flexibility to accommodate future supercomputer architectures. Design of the model revolves around a principle of a 'separation of concerns', whereby the natural science aspects of the code can be developed without worrying about the underlying architecture, while machine dependent optimisations can be carried out at a high level. These principles are put into practice through the development of an autogenerated Parallel Systems software layer (known as the PSy layer) using a domain-specific compiler called PSyclone. The prototype model includes a re-write of the dynamical core using a mixed finite element method, in which different function spaces are used to represent the various fields. It is able to run in parallel with MPI and OpenMP and has been tested on over 200,000 cores. In this talk an overview of the both the natural science and computational science implementations of the model will be presented.
Heuristic Scheduling in Grid Environments: Reducing the Operational Energy Demand
NASA Astrophysics Data System (ADS)
Bodenstein, Christian
In a world where more and more businesses seem to trade in an online market, the supply of online services to the ever-growing demand could quickly reach its capacity limits. Online service providers may find themselves maxed out at peak operation levels during high-traffic timeslots but too little demand during low-traffic timeslots, although the latter is becoming less frequent. At this point deciding which user is allocated what level of service becomes essential. The concept of Grid computing could offer a meaningful alternative to conventional super-computing centres. Not only can Grids reach the same computing speeds as some of the fastest supercomputers, but distributed computing harbors a great energy-saving potential. When scheduling projects in such a Grid environment however, simply assigning one process to a system becomes so complex in calculation that schedules are often too late to execute, rendering their optimizations useless. Current schedulers attempt to maximize the utility, given some sort of constraint, often reverting to heuristics. This optimization often comes at the cost of environmental impact, in this case CO 2 emissions. This work proposes an alternate model of energy efficient scheduling while keeping a respectable amount of economic incentives untouched. Using this model, it is possible to reduce the total energy consumed by a Grid environment using 'just-in-time' flowtime management, paired with ranking nodes by efficiency.
Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias
2015-01-01
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate a three-way interaction analysis on 1.1 million SNP GWAS data requiring over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer "Sequoia" at the Lawrence Livermore National Laboratory assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS.
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs
NASA Astrophysics Data System (ADS)
Leutwyler, D.; Fuhrer, O.; Ban, N.; Lapillonne, X.; Lüthi, D.; Schar, C.
2016-12-01
Convection-resolving models have proven to be very useful tools in numerical weather prediction and in climate research. However, due to their extremely demanding computational requirements, they have so far been limited to short simulations and/or small computational domains. Innovations in the supercomputing domain have led to new supercomputer designs that involve conventional multi-core CPUs and accelerators such as graphics processing units (GPUs). One of the first atmospheric models that has been fully ported to GPUs is the Consortium for Small-Scale Modeling weather and climate model COSMO. This new version allows us to expand the size of the simulation domain to areas spanning continents and the time period up to one decade. We present results from a decade-long, convection-resolving climate simulation over Europe using the GPU-enabled COSMO version on a computational domain with 1536x1536x60 gridpoints. The simulation is driven by the ERA-interim reanalysis. The results illustrate how the approach allows for the representation of interactions between synoptic-scale and meso-scale atmospheric circulations at scales ranging from 1000 to 10 km. We discuss some of the advantages and prospects from using GPUs, and focus on the performance of the convection-resolving modeling approach on the European scale. Specifically we investigate the organization of convective clouds and on validate hourly rainfall distributions with various high-resolution data sets.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
Petascale computation of multi-physics seismic simulations
NASA Astrophysics Data System (ADS)
Gabriel, Alice-Agnes; Madden, Elizabeth H.; Ulrich, Thomas; Wollherr, Stephanie; Duru, Kenneth C.
2017-04-01
Capturing the observed complexity of earthquake sources in concurrence with seismic wave propagation simulations is an inherently multi-scale, multi-physics problem. In this presentation, we present simulations of earthquake scenarios resolving high-detail dynamic rupture evolution and high frequency ground motion. The simulations combine a multitude of representations of model complexity; such as non-linear fault friction, thermal and fluid effects, heterogeneous fault stress and fault strength initial conditions, fault curvature and roughness, on- and off-fault non-elastic failure to capture dynamic rupture behavior at the source; and seismic wave attenuation, 3D subsurface structure and bathymetry impacting seismic wave propagation. Performing such scenarios at the necessary spatio-temporal resolution requires highly optimized and massively parallel simulation tools which can efficiently exploit HPC facilities. Our up to multi-PetaFLOP simulations are performed with SeisSol (www.seissol.org), an open-source software package based on an ADER-Discontinuous Galerkin (DG) scheme solving the seismic wave equations in velocity-stress formulation in elastic, viscoelastic, and viscoplastic media with high-order accuracy in time and space. Our flux-based implementation of frictional failure remains free of spurious oscillations. Tetrahedral unstructured meshes allow for complicated model geometry. SeisSol has been optimized on all software levels, including: assembler-level DG kernels which obtain 50% peak performance on some of the largest supercomputers worldwide; an overlapping MPI-OpenMP parallelization shadowing the multiphysics computations; usage of local time stepping; parallel input and output schemes and direct interfaces to community standard data formats. All these factors enable aim to minimise the time-to-solution. The results presented highlight the fact that modern numerical methods and hardware-aware optimization for modern supercomputers are essential to further our understanding of earthquake source physics and complement both physic-based ground motion research and empirical approaches in seismic hazard analysis. Lastly, we will conclude with an outlook on future exascale ADER-DG solvers for seismological applications.
An online mineral dust model within the global/regional NMMB: current progress and plans
NASA Astrophysics Data System (ADS)
Perez, C.; Haustein, K.; Janjic, Z.; Jorba, O.; Baldasano, J. M.; Black, T.; Nickovic, S.
2008-12-01
While mineral dust distribution and effects are important on global scales, they strongly depend on dust emissions that are occurring on small spatial and temporal scales. Indeed, the accuracy of surface wind speed used in dust models is crucial. Due to the high-order power dependency on wind friction velocity and the threshold behaviour of dust emissions, small errors in surface wind speed lead to large dust emission errors. Most global dust models use prescribed wind fields provided by major meteorological centres (e.g., NCEP and ECMWF) and their spatial resolution is currently about 1 degree x 1 degree . Such wind speeds tend to be strongly underestimated over arid and semi-arid areas and do not account for mesoscale systems responsible for a significant fraction of dust emissions regionally and globally. Other significant uncertainties in dust emissions resulting from such approaches are related to the misrepresentation of high subgrid-scale spatial heterogeneity in soil and vegetation boundary conditions, mainly in semi-arid areas. In order to significantly reduce these uncertainties, the Barcelona Supercomputing Center is currently implementing a mineral dust model coupled on-line with the new global/regional NMMB atmospheric model using the ESMF framework under development in NOAA/NCEP/EMC. The NMMB is an evolution of the operational WRF-NMME extending from meso to global scales, and including non-hydrostatic option and improved tracer advection. This model is planned to become the next-generation NCEP mesoscale model for operational weather forecasting in North America. Current implementation is based on the well established regional dust model and forecast system Eta/DREAM (http://www.bsc.es/projects/earthscience/DREAM/). First successful global simulations show the potentials of such an approach and compare well with DREAM regionally. Ongoing developments include improvements in dust size distribution representation, sedimentation, dry deposition, wet scavenging and dust-radiation feedback, as well as the efficient implementation of the model on High Performance Supercomputers for global simulations and forecasts at high resolution.
Compilation of Abstracts for SC12 Conference Proceedings
NASA Technical Reports Server (NTRS)
Morello, Gina Francine (Compiler)
2012-01-01
1 A Breakthrough in Rotorcraft Prediction Accuracy Using Detached Eddy Simulation; 2 Adjoint-Based Design for Complex Aerospace Configurations; 3 Simulating Hypersonic Turbulent Combustion for Future Aircraft; 4 From a Roar to a Whisper: Making Modern Aircraft Quieter; 5 Modeling of Extended Formation Flight on High-Performance Computers; 6 Supersonic Retropropulsion for Mars Entry; 7 Validating Water Spray Simulation Models for the SLS Launch Environment; 8 Simulating Moving Valves for Space Launch System Liquid Engines; 9 Innovative Simulations for Modeling the SLS Solid Rocket Booster Ignition; 10 Solid Rocket Booster Ignition Overpressure Simulations for the Space Launch System; 11 CFD Simulations to Support the Next Generation of Launch Pads; 12 Modeling and Simulation Support for NASA's Next-Generation Space Launch System; 13 Simulating Planetary Entry Environments for Space Exploration Vehicles; 14 NASA Center for Climate Simulation Highlights; 15 Ultrascale Climate Data Visualization and Analysis; 16 NASA Climate Simulations and Observations for the IPCC and Beyond; 17 Next-Generation Climate Data Services: MERRA Analytics; 18 Recent Advances in High-Resolution Global Atmospheric Modeling; 19 Causes and Consequences of Turbulence in the Earths Protective Shield; 20 NASA Earth Exchange (NEX): A Collaborative Supercomputing Platform; 21 Powering Deep Space Missions: Thermoelectric Properties of Complex Materials; 22 Meeting NASA's High-End Computing Goals Through Innovation; 23 Continuous Enhancements to the Pleiades Supercomputer for Maximum Uptime; 24 Live Demonstrations of 100-Gbps File Transfers Across LANs and WANs; 25 Untangling the Computing Landscape for Climate Simulations; 26 Simulating Galaxies and the Universe; 27 The Mysterious Origin of Stellar Masses; 28 Hot-Plasma Geysers on the Sun; 29 Turbulent Life of Kepler Stars; 30 Modeling Weather on the Sun; 31 Weather on Mars: The Meteorology of Gale Crater; 32 Enhancing Performance of NASAs High-End Computing Applications; 33 Designing Curiosity's Perfect Landing on Mars; 34 The Search Continues: Kepler's Quest for Habitable Earth-Sized Planets.
Radio Astronomy at the Centre for High Performance Computing in South Africa
NASA Astrophysics Data System (ADS)
Catherine Cress; UWC Simulation Team
2014-04-01
I will present results on galaxy evolution and cosmology which we obtained using the supercomputing facilities at the CHPC. These include cosmological-scale N-body simulations modelling neutral hydrogen as well as the study of the clustering of radio galaxies to probe the relationship between dark and luminous matter in the universe. I will also discuss the various roles that the CHPC is playing in Astronomy in SA, including the provision of HPC for a variety of Astronomical applications, the provision of storage for radio data, our educational programs and our participation in planning for the SKA.
Beyond Moore’s technologies: operation principles of a superconductor alternative
Klenov, Nikolay V; Bakurskiy, Sergey V; Kupriyanov, Mikhail Yu; Gudkov, Alexander L; Sidorenko, Anatoli S
2017-01-01
The predictions of Moore’s law are considered by experts to be valid until 2020 giving rise to “post-Moore’s” technologies afterwards. Energy efficiency is one of the major challenges in high-performance computing that should be answered. Superconductor digital technology is a promising post-Moore’s alternative for the development of supercomputers. In this paper, we consider operation principles of an energy-efficient superconductor logic and memory circuits with a short retrospective review of their evolution. We analyze their shortcomings in respect to computer circuits design. Possible ways of further research are outlined. PMID:29354341
LBNL Computational ResearchTheory Facility Groundbreaking - Full Press Conference. Feb 1st, 2012
Yelick, Kathy
2018-01-24
Energy Secretary Steven Chu, along with Berkeley Lab and UC leaders, broke ground on the Lab's Computational Research and Theory (CRT) facility yesterday. The CRT will be at the forefront of high-performance supercomputing research and be DOE's most efficient facility of its kind. Joining Secretary Chu as speakers were Lab Director Paul Alivisatos, UC President Mark Yudof, Office of Science Director Bill Brinkman, and UC Berkeley Chancellor Robert Birgeneau. The festivities were emceed by Associate Lab Director for Computing Sciences, Kathy Yelick, and Berkeley Mayor Tom Bates joined in the shovel ceremony.
Using Strassen's algorithm to accelerate the solution of linear systems
NASA Technical Reports Server (NTRS)
Bailey, David H.; Lee, King; Simon, Horst D.
1990-01-01
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for matrices of arbitrary shapes on the CRAY-2 and CRAY Y-MP supercomputers. Several techniques have been used to reduce the scratch space requirement for this algorithm while simultaneously preserving a high level of performance. When the resulting Strassen-based matrix multiply routine is combined with some routines from the new LAPACK library, LU decomposition can be performed with rates significantly higher than those achieved by conventional means. We succeeded in factoring a 2048 x 2048 matrix on the CRAY Y-MP at a rate equivalent to 325 MFLOPS.
NASA Technical Reports Server (NTRS)
Gillian, Ronnie E.; Lotts, Christine G.
1988-01-01
The Computational Structural Mechanics (CSM) Activity at Langley Research Center is developing methods for structural analysis on modern computers. To facilitate that research effort, an applications development environment has been constructed to insulate the researcher from the many computer operating systems of a widely distributed computer network. The CSM Testbed development system was ported to the Numerical Aerodynamic Simulator (NAS) Cray-2, at the Ames Research Center, to provide a high end computational capability. This paper describes the implementation experiences, the resulting capability, and the future directions for the Testbed on supercomputers.
Computer Simulation Performed for Columbia Project Cooling System
NASA Technical Reports Server (NTRS)
Ahmad, Jasim
2005-01-01
This demo shows a high-fidelity simulation of the air flow in the main computer room housing the Columbia (10,024 intel titanium processors) system. The simulation asseses the performance of the cooling system and identified deficiencies, and recommended modifications to eliminate them. It used two in house software packages on NAS supercomputers: Chimera Grid tools to generate a geometric model of the computer room, OVERFLOW-2 code for fluid and thermal simulation. This state-of-the-art technology can be easily extended to provide a general capability for air flow analyses on any modern computer room. Columbia_CFD_black.tiff
LBNL Computational Research and Theory Facility Groundbreaking. February 1st, 2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yelick, Kathy
2012-02-02
Energy Secretary Steven Chu, along with Berkeley Lab and UC leaders, broke ground on the Lab's Computational Research and Theory (CRT) facility yesterday. The CRT will be at the forefront of high-performance supercomputing research and be DOE's most efficient facility of its kind. Joining Secretary Chu as speakers were Lab Director Paul Alivisatos, UC President Mark Yudof, Office of Science Director Bill Brinkman, and UC Berkeley Chancellor Robert Birgeneau. The festivities were emceed by Associate Lab Director for Computing Sciences, Kathy Yelick, and Berkeley Mayor Tom Bates joined in the shovel ceremony.
LBNL Computational Research and Theory Facility Groundbreaking. February 1st, 2012
Yelick, Kathy
2017-12-09
Energy Secretary Steven Chu, along with Berkeley Lab and UC leaders, broke ground on the Lab's Computational Research and Theory (CRT) facility yesterday. The CRT will be at the forefront of high-performance supercomputing research and be DOE's most efficient facility of its kind. Joining Secretary Chu as speakers were Lab Director Paul Alivisatos, UC President Mark Yudof, Office of Science Director Bill Brinkman, and UC Berkeley Chancellor Robert Birgeneau. The festivities were emceed by Associate Lab Director for Computing Sciences, Kathy Yelick, and Berkeley Mayor Tom Bates joined in the shovel ceremony.
Advanced flight computers for planetary exploration
NASA Technical Reports Server (NTRS)
Stephenson, R. Rhoads
1988-01-01
Research concerning flight computers for use on interplanetary probes is reviewed. The history of these computers from the Viking mission to the present is outlined. The differences between ground commercial computers and computers for planetary exploration are listed. The development of a computer for the Mariner Mark II comet rendezvous asteroid flyby mission is described. Various aspects of recently developed computer systems are examined, including the Max real time, embedded computer, a hypercube distributed supercomputer, a SAR data processor, a processor for the High Resolution IR Imaging Spectrometer, and a robotic vision multiresolution pyramid machine for processsing images obtained by a Mars Rover.
An Automated, High-Throughput System for GISAXS and GIWAXS Measurements of Thin Films
NASA Astrophysics Data System (ADS)
Schaible, Eric; Jimenez, Jessica; Church, Matthew; Lim, Eunhee; Stewart, Polite; Hexemer, Alexander
Grazing incidence small-angle X-ray scattering (GISAXS) and grazing incidence wide-angle X-ray scattering (GIWAXS) are important techniques for characterizing thin films. In order to meet rapidly increasing demand, the SAXSWAXS beamline at the Advanced Light Source (beamline 7.3.3) has implemented a fully automated, high-throughput system to conduct SAXS, GISAXS and GIWAXS measurements. An automated robot arm transfers samples from a holding tray to a measurement stage. Intelligent software aligns each sample in turn, and measures each according to user-defined specifications. Users mail in trays of samples on individually barcoded pucks, and can download and view their data remotely. Data will be pipelined to the NERSC supercomputing facility, and will be available to users via a web portal that facilitates highly parallelized analysis.
Final Report for DOE Award ER25756
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kesselman, Carl
2014-11-17
The SciDAC-funded Center for Enabling Distributed Petascale Science (CEDPS) was established to address technical challenges that arise due to the frequent geographic distribution of data producers (in particular, supercomputers and scientific instruments) and data consumers (people and computers) within the DOE laboratory system. Its goal is to produce technical innovations that meet DOE end-user needs for (a) rapid and dependable placement of large quantities of data within a distributed high-performance environment, and (b) the convenient construction of scalable science services that provide for the reliable and high-performance processing of computation and data analysis requests from many remote clients. The Centermore » is also addressing (c) the important problem of troubleshooting these and other related ultra-high-performance distributed activities from the perspective of both performance and functionality« less
Relativistic Collisions of Highly-Charged Ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionescu, Dorin; Belkacem, Ali
1998-11-19
The physics of elementary atomic processes in relativistic collisions between highly-charged ions and atoms or other ions is briefly discussed, and some recent theoretical and experimental results in this field are summarized. They include excitation, capture, ionization, and electron-positron pair creation. The numerical solution of the two-center Dirac equation in momentum space is shown to be a powerful nonperturbative method for describing atomic processes in relativistic collisions involving heavy and highly-charged ions. By propagating negative-energy wave packets in time the evolution of the QED vacuum around heavy ions in relativistic motion is investigated. Recent results obtained from numerical calculations usingmore » massively parallel processing on the Cray-T3E supercomputer of the National Energy Research Scientific Computer Center (NERSC) at Berkeley National Laboratory are presented.« less
OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.
2014-12-01
OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures to provide efficient Web based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. With a growing user base and maturing scientific user community comes new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service based architecture for integrating community-developed software. This "plugable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT hosted data.
What problem are you working on?
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2013-11-21
Superconductors, supercomputers, new materials, clean energy, big science - ORNL researchers' work is multidisciplinary and world-leading. Hear them explain it in their own words in this video first shown at UT-Battelle's 2013 Awards Night.
2009-09-15
Obama Administration launches Cloud Computing Initiative at Ames Research Center. Vivek Kundra, White House Chief Federal Information Officer (right) and Lori Garver, NASA Deputy Administrator (left) get a tour & demo NASAS Supercomputing Center Hyperwall.
What problem are you working on?
None
2018-05-07
Superconductors, supercomputers, new materials, clean energy, big science - ORNL researchers' work is multidisciplinary and world-leading. Hear them explain it in their own words in this video first shown at UT-Battelle's 2013 Awards Night.
Analytical Applications of Monte Carlo Techniques.
ERIC Educational Resources Information Center
Guell, Oscar A.; Holcombe, James A.
1990-01-01
Described are analytical applications of the theory of random processes, in particular solutions obtained by using statistical procedures known as Monte Carlo techniques. Supercomputer simulations, sampling, integration, ensemble, annealing, and explicit simulation are discussed. (CW)
Accessing and visualizing scientific spatiotemporal data
NASA Technical Reports Server (NTRS)
Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia;
2004-01-01
This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.
NASA Astrophysics Data System (ADS)
Boyle, P.; Chen, D.; Christ, N.; Clark, M.; Cohen, S.; Cristian, C.; Dong, Z.; Gara, A.; Joo, B.; Jung, C.; Kim, C.; Levkova, L.; Liao, X.; Liu, G.; Li, S.; Lin, H.; Mawhinney, R.; Ohta, S.; Petrov, K.; Wettig, T.; Yamaguchi, A.
2005-03-01
The QCDOC project has developed a supercomputer optimised for the needs of Lattice QCD simulations. It provides a very competitive price to sustained performance ratio of around $1 USD per sustained Megaflop/s in combination with outstanding scalability. Thus very large systems delivering over 5 TFlop/s of performance on the evolution of a single lattice is possible. Large prototypes have been built and are functioning correctly. The software environment raises the state of the art in such custom supercomputers. It is based on a lean custom node operating system that eliminates many unnecessary overheads that plague other systems. Despite the custom nature, the operating system implements a standards compliant UNIX-like programming environment easing the porting of software from other systems. The SciDAC QMP interface adds internode communication in a fashion that provides a uniform cross-platform programming environment.
NASA Technical Reports Server (NTRS)
Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick;
2001-01-01
A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) Debakey Heart Assist Device: (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.
Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming
2017-06-16
Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limit computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the utility of Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
Application-level regression testing framework using Jenkins
Budiardja, Reuben; Bouvet, Timothy; Arnold, Galen
2017-09-26
Monitoring and testing for regression of large-scale systems such as the NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we came up with to perform those tasks. The goal was to find an automated solution for running user-level regression tests to evaluate system usability and performance. Jenkins, an automation server software, was chosen for its versatility, large user base, and multitude of plugins including collecting data and plotting test results over time. We also describe our Jenkins deployment to launch and monitor jobs on remote HPC system, perform authentication with one-time password, and integratemore » with our LDAP server for its authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user-level system-wide regression testing and monitoring framework for large supercomputer systems.« less
NASA Technical Reports Server (NTRS)
Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John
2016-01-01
We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5- 2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks including subset of HPCC, HPCG and four full-scale scientific and engineering applications. We also present a model to predict the performance of HPCG and Cart3D within 5%, and Overflow within 10% accuracy.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1991-01-01
Little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) to improve turnaround time in computational aerodynamics. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, such improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) is applied through multitasking via a strategy that requires relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-Fortran-Unix interface. The existing code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor.
Solving large sparse eigenvalue problems on supercomputers
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef
1988-01-01
An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
A Computational framework for telemedicine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.
1998-07-01
Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. Highspeed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure formore » telemedicine. The Globus metacomputing toolkit can provide the necessary low level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.« less
Application-level regression testing framework using Jenkins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Budiardja, Reuben; Bouvet, Timothy; Arnold, Galen
Monitoring and testing for regression of large-scale systems such as the NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we came up with to perform those tasks. The goal was to find an automated solution for running user-level regression tests to evaluate system usability and performance. Jenkins, an automation server software, was chosen for its versatility, large user base, and multitude of plugins including collecting data and plotting test results over time. We also describe our Jenkins deployment to launch and monitor jobs on remote HPC system, perform authentication with one-time password, and integratemore » with our LDAP server for its authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user-level system-wide regression testing and monitoring framework for large supercomputer systems.« less
Supercomputing in the Age of Discovering Superearths, Earths and Exoplanet Systems
NASA Technical Reports Server (NTRS)
Jenkins, Jon M.
2015-01-01
NASA's Kepler Mission was launched in March 2009 as NASA's first mission capable of finding Earth-size planets orbiting in the habitable zone of Sun-like stars, that range of distances for which liquid water would pool on the surface of a rocky planet. Kepler has discovered over 1000 planets and over 4600 candidates, many of them as small as the Earth. Today, Kepler's amazing success seems to be a fait accompli to those unfamiliar with her history. But twenty years ago, there were no planets known outside our solar system, and few people believed it was possible to detect tiny Earth-size planets orbiting other stars. Motivating NASA to select Kepler for launch required a confluence of the right detector technology, advances in signal processing and algorithms, and the power of supercomputing.
Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation.
Nagaoka, Tomoaki; Watanabe, Soichi
2011-01-01
Numerical simulation with a numerical human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the numerical human model, we adapt three-dimensional FDTD code to a multi-GPU environment using Compute Unified Device Architecture (CUDA). In this study, we used NVIDIA Tesla C2070 as GPGPU boards. The performance of multi-GPU is evaluated in comparison with that of a single GPU and vector supercomputer. The calculation speed with four GPUs was approximately 3.5 times faster than with a single GPU, and was slightly (approx. 1.3 times) slower than with the supercomputer. Calculation speed of the three-dimensional FDTD method using GPUs can significantly improve with an expanding number of GPUs.
Computational Approaches to Simulation and Optimization of Global Aircraft Trajectories
NASA Technical Reports Server (NTRS)
Ng, Hok Kwan; Sridhar, Banavar
2016-01-01
This study examines three possible approaches to improving the speed in generating wind-optimal routes for air traffic at the national or global level. They are: (a) using the resources of a supercomputer, (b) running the computations on multiple commercially available computers and (c) implementing those same algorithms into NASAs Future ATM Concepts Evaluation Tool (FACET) and compares those to a standard implementation run on a single CPU. Wind-optimal aircraft trajectories are computed using global air traffic schedules. The run time and wait time on the supercomputer for trajectory optimization using various numbers of CPUs ranging from 80 to 10,240 units are compared with the total computational time for running the same computation on a single desktop computer and on multiple commercially available computers for potential computational enhancement through parallel processing on the computer clusters. This study also re-implements the trajectory optimization algorithm for further reduction of computational time through algorithm modifications and integrates that with FACET to facilitate the use of the new features which calculate time-optimal routes between worldwide airport pairs in a wind field for use with existing FACET applications. The implementations of trajectory optimization algorithms use MATLAB, Python, and Java programming languages. The performance evaluations are done by comparing their computational efficiencies and based on the potential application of optimized trajectories. The paper shows that in the absence of special privileges on a supercomputer, a cluster of commercially available computers provides a feasible approach for national and global air traffic system studies.
Operational uses of ACTS technology
NASA Astrophysics Data System (ADS)
Gedney, Richard T.; Wright, David L.; Balombin, Joseph L.; Sohn, Philip Y.; Cashman, William F.; Stern, Alan L.; Golding, Len; Palmer, Larry
1992-03-01
The NASA Advanced Communications Technology Satellite (ACTS) provides the technologies for very high gain hopping spot beam antennas, on-board baseband routing and processing, and wideband (1 GHz) Ka-band transponders. A number of studies have recently been completed using the experience gained in developing the actual ACTS system hardware to quantify how well the ACTS technology can be used in future operational systems. This paper provides a summary of these study results including the spacecraft (S/C) weight per unit circuit for providing services by ACTS technologies as compared to present-day satellites. The uses of the ACTS technology discussed are for providing T1 VSAT mesh networks, aeronautical mobile communications, supervisory control and data acquisition (SCADA) services, and high data rate networks for supercomputer and other applications.
Chemistry Modeling for Aerothermodynamics and TPS
NASA Technical Reports Server (NTRS)
Wang, Dunyou; Stallcop, James R.; Dateo, Christopher e.; Schwenke, David W.; Halicioglu, Timur; Huo, winifred M.
2005-01-01
Recent advances in supercomputers and highly scalable quantum chemistry software render computational chemistry methods a viable means of providing chemistry data for aerothermal analysis at a specific level of confidence. Four examples of first principles quantum chemistry calculations will be presented. Study of the highly nonequilibrium rotational distribution of a nitrogen molecule from the exchange reaction N + N2 illustrates how chemical reactions can influence rotational distribution. The reaction C2H + H2 is one example of a radical reaction that occurs during hypersonic entry into an atmosphere containing methane. A study of the etching of a Si surface illustrates our approach to surface reactions. A recently developed web accessible database and software tool (DDD) that provides the radiation profile of diatomic molecules is also described.
Chemistry Modeling for Aerothermodynamics and TPS
NASA Technical Reports Server (NTRS)
Wang, Dun-You; Stallcop, James R.; Dateo, Christopher E.; Schwenke, David W.; Haliciogiu, Timur; Huo, Winifred
2004-01-01
Recent advances in supercomputers and highly scalable quantum chemistry software render computational chemistry methods a viable means of providing chemistry data for aerothermal analysis at a specific level of confidence. Four examples of first principles quantum chemistry calculations will be presented. The study of the highly nonequilibrium rotational distribution of nitrogen molecule from the exchange reaction N + N2 illustrates how chemical reactions can influence the rotational distribution. The reaction C2H + H2 is one example of a radical reaction that occurs during hypersonic entry into a methane containing atmosphere. A study of the etching of Si surface illustrates our approach to surface reactions. A recently developed web accessible database and software tool (DDD) that provides the radiation profile of diatomic molecules is also described.